Issue
I have two NumPy arrays (one containing DataFrame index labels, the other containing floats) and am using a for loop to find, for each pair, the index of the first row in df that:
has an index greater than or equal to t_index (df.loc[t_index:] includes t_index itself); and
has a value in the Low column less than or equal to low.
It works, but it needs to be optimised. Is it possible to vectorize it to speed it up?
for t_index, low in zip(t_indices, t_lows):
    sub = df.loc[t_index:]  # rows from label t_index onwards (inclusive)
    t_ix = sub[sub["Low"] <= low].index[0]  # first matching label; IndexError if none
I've not included sample data, as I don't think it's helpful here; however, let me know if this is not the case.
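For reference, the loop above can be wrapped as a self-contained baseline that collects one result per query. The function name first_match_loop is hypothetical, and returning NaN instead of raising IndexError when no row matches is an assumption made here so that all results fit in one float array.

```python
import numpy as np
import pandas as pd


def first_match_loop(df, t_indices, t_lows):
    """Baseline: one pandas lookup per (t_index, low) pair."""
    out = []
    for t_index, low in zip(t_indices, t_lows):
        sub = df.loc[t_index:]                 # rows with label >= t_index (inclusive)
        hits = sub[sub["Low"] <= low].index    # labels of rows meeting the Low condition
        out.append(hits[0] if len(hits) else np.nan)  # NaN when nothing matches (assumption)
    return np.array(out, dtype=float)


df = pd.DataFrame({"Low": [0.1, 0.12, 0.11, 0.17, 1]})
print(first_match_loop(df, np.array([0, 2, 1, 2, 0]), np.array([0.2, 0.13, 0.15, 0.17, 0.02])))
```

Each iteration builds a new sliced DataFrame and a boolean mask, which is what makes this slow for many queries.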
Solution
You can broadcast both comparisons into a 2D boolean matrix (rows of df × queries) and take the first True in each column:
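idx = df.index.to_numpy()
lows = df['Low'].to_numpy()
# tmp[i, j] is True when row i satisfies both conditions for query j
tmp = (idx[:, None] >= t_indices) & (lows[:, None] <= t_lows)
# argmax gives the position of the first True in each column; NaN where no row matches
out = np.where(tmp.any(axis=0), tmp.argmax(axis=0), np.nan)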
However, this uses O(n×m) memory, where n is the number of rows in df and m is the number of queries (O(n²) when both are of similar size).
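One way to cap that memory, not part of the original answer, is to apply the same broadcasting to the queries in blocks of `chunk` columns, so only an n × chunk matrix exists at any time (time is still O(n×m)). The function name first_match_chunked and the chunk parameter are hypothetical; like the answer, it returns row positions rather than index labels.

```python
import numpy as np
import pandas as pd


def first_match_chunked(df, t_indices, t_lows, chunk=1024):
    """Row position of the first match per query (NaN if none), `chunk` queries at a time."""
    idx = df.index.to_numpy()
    lows = df["Low"].to_numpy()
    t_indices = np.asarray(t_indices)
    t_lows = np.asarray(t_lows)
    out = np.full(len(t_indices), np.nan)
    for start in range(0, len(t_indices), chunk):
        stop = start + chunk
        # (len(df), <= chunk) boolean matrix: True where both conditions hold
        mask = (idx[:, None] >= t_indices[start:stop]) & (lows[:, None] <= t_lows[start:stop])
        # argmax finds the first True per column; any() masks columns with no match
        out[start:stop] = np.where(mask.any(axis=0), mask.argmax(axis=0), np.nan)
    return out


df = pd.DataFrame({"Low": [0.1, 0.12, 0.11, 0.17, 1]})
print(first_match_chunked(df, [0, 2, 1, 2, 0], [0.2, 0.13, 0.15, 0.17, 0.02], chunk=2))
```

A larger chunk means fewer Python-level iterations but more memory per block; the best value depends on len(df) and available RAM.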
Example:
t_indices = np.array([0,2,1,2,0])
t_lows = np.array([0.2,0.13,0.15,0.17,0.02])
df = pd.DataFrame({'Low': [0.1,0.12,0.11,0.17,1]})
Output:
array([ 0., 2., 1., 2., nan])
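The example uses the default RangeIndex, where row positions and index labels coincide. If df has other labels, argmax still returns positions, and one extra lookup maps them back to labels. A sketch assuming a hypothetical integer index [10, 20, 30, 40, 50] and queries expressed in those labels:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"Low": [0.1, 0.12, 0.11, 0.17, 1]}, index=[10, 20, 30, 40, 50])
t_indices = np.array([10, 30, 20, 30, 10])
t_lows = np.array([0.2, 0.13, 0.15, 0.17, 0.02])

labels_arr = df.index.to_numpy()
# Same broadcasting as the answer: rows x queries boolean matrix
tmp = (labels_arr[:, None] >= t_indices) & (df["Low"].to_numpy()[:, None] <= t_lows)
found = tmp.any(axis=0)
pos = tmp.argmax(axis=0)  # 0-based row positions; meaningless where found is False

# Map positions to index labels, with NaN where no row matched
labels = np.where(found, labels_arr[pos], np.nan)
print(labels)
```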
Answered By - mozway