Issue
I have a two-dimensional NumPy array in which some rows may contain NaNs. For each row, I want to describe the run of NaNs at its start according to the following rules:
- If a row does not start with a NaN, then the result for that row is -1.
- If a row starts with a NaN, then the result is the index of the last NaN in the unbroken run of NaNs that starts at the beginning of that row.
What is the most efficient way of doing this? In my actual work, I will be dealing with NumPy arrays with millions of rows.
As an example, let's consider the array below:
import numpy as np
arr = np.array([[1, 11, np.nan, 111, 1111],
                [np.nan, np.nan, np.nan, 2, 22],
                [np.nan, np.nan, 3, 33, np.nan],
                [4, np.nan, np.nan, 44, 444],
                [np.nan, 5, 55, np.nan, 555],
                [np.nan, np.nan, np.nan, np.nan, np.nan]])
Here the expected result is result = [-1, 2, 1, -1, 0, 4].
Below is working code that I have tried, but I would like a more efficient solution.
result = []
for i in range(arr.shape[0]):
    if np.isnan(arr[i])[0] == False:
        result += [-1]
    elif np.all(np.isnan(arr[i])):
        result += [arr.shape[1]-1]
    else:
        result += [np.where(np.isnan(arr[i]) == False)[0][0] - 1]
Solution
You can append a column of non-NaN values with hstack, check which values are NaN with isnan, and get the position of the first non-NaN with argmin:
out = np.isnan(np.hstack([arr, np.ones((arr.shape[0], 1))])).argmin(axis=1)-1
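To see why the one-liner above works, here is the same computation split into steps. This is only an illustrative sketch: the intermediate names (padded, mask, first_non_nan) are not part of the original answer. It relies on argmin returning the first occurrence of the minimum, which for a boolean array is the first False.

```python
import numpy as np

arr = np.array([[1, 11, np.nan, 111, 1111],
                [np.nan, np.nan, np.nan, 2, 22],
                [np.nan, np.nan, 3, 33, np.nan],
                [4, np.nan, np.nan, 44, 444],
                [np.nan, 5, 55, np.nan, 555],
                [np.nan, np.nan, np.nan, np.nan, np.nan]])

# Append one guaranteed non-NaN column, so every row has at least one False in the mask.
padded = np.hstack([arr, np.ones((arr.shape[0], 1))])

# True where the value is NaN, False otherwise.
mask = np.isnan(padded)

# For booleans, argmin gives the index of the first False, i.e. the first non-NaN.
first_non_nan = mask.argmin(axis=1)   # [0, 3, 2, 0, 1, 5]

# The last leading NaN sits one position before the first non-NaN.
out = first_non_nan - 1               # [-1, 2, 1, -1, 0, 4]
print(out)
```

Without the extra column, an all-NaN row would have no False in its mask, argmin would return 0, and the result would wrongly be -1 instead of 4.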
Or, without concatenation, using where to fix the case in which all values are NaN:
tmp = np.isnan(arr)
out = np.where(tmp.all(axis=1), arr.shape[1], tmp.argmin(axis=1))-1
Output:
out = array([-1, 2, 1, -1, 0, 4])
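If you want some reassurance before running this on millions of rows, one option is to compare the vectorized version with the original loop on random data. The sketch below is a hedged example, not part of the original answer: the function names leading_nan_last_index and leading_nan_last_index_loop, the array size, and the 50% NaN rate are all arbitrary choices made for illustration.

```python
import numpy as np

def leading_nan_last_index(arr):
    # Vectorized version from the answer (hypothetical function name).
    tmp = np.isnan(arr)
    return np.where(tmp.all(axis=1), arr.shape[1], tmp.argmin(axis=1)) - 1

def leading_nan_last_index_loop(arr):
    # Row-by-row reference, following the logic of the loop in the question.
    result = []
    for row in arr:
        nan_row = np.isnan(row)
        if not nan_row[0]:
            result.append(-1)
        elif nan_row.all():
            result.append(arr.shape[1] - 1)
        else:
            result.append(np.where(~nan_row)[0][0] - 1)
    return np.array(result)

# Random test data with roughly half of the entries set to NaN (assumed rate).
rng = np.random.default_rng(0)
data = rng.random((1000, 6))
data[rng.random(data.shape) < 0.5] = np.nan

fast = leading_nan_last_index(data)
slow = leading_nan_last_index_loop(data)
assert np.array_equal(fast, slow)
```

Both functions assume a 2-D float array with at least one column; argmin raises an error on an empty axis.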
Answered By - mozway