Issue
Consider these two 2D arrays with the same shape:
import numpy as np

arr1 = np.array([[0, 0, 4, 7, 3, 0, 0, 0, 0, 0],
                 [0, 3, 5, 7, 6, 0, 3, 0, 0, 0]])
arr2 = np.array([[14, 14, 14, 13, 11, 9, 6, 4, 2, 0],
                 [14, 13, 13, 13, 12, 9, 7, 4, 2, 0]])
I'm trying to select, for each row of arr1, one value and its index along axis 1 that satisfies the following conditions:
- It maximizes the matching value in arr2 (the element at the same position along axis 1)
- It is non-zero
- Among the remaining candidates, it maximizes the value in arr1.
For the example above, that would give:
- Row 1: max in arr2 is 14. That gives 0, 0, 4 as candidate values in arr1. 4 is chosen as the max (and only non-zero) value among the candidates. Its index along axis 1 is 2, so the output is 4, 2.
- Row 2: max in arr2 is 14, but it only matches a 0 in arr1. The second-highest value in arr2 is 13, which matches the candidate values 3, 5, 7 in arr1. 7 is chosen as the max among the candidates. Its index is 3, so the output is 7, 3.
In case of several identical candidates, I'm comfortable with getting any of them.
In summary:
fancy_select(arr1, arr2) == np.array([[4, 2],
                                      [7, 3]])
A loop-based solution would be easy to write, but I would like to vectorize it, since it will run inside a loop with approximately 500k rows per iteration.
I've tried several approaches based on sorting, including tiling arr1 into a 3D array to apply different sortings along the axes, but I'm now stuck and need your help.
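For reference, the loop-based version mentioned above might look like the sketch below. It is a hypothetical helper (`fancy_select_loop` is not from the question) that keeps the leftmost column on ties and assumes a row of arr1 with no non-zero value should return the placeholder (0, -1). It is mainly useful as a baseline to check a vectorized version against.

```python
import numpy as np

def fancy_select_loop(arr1, arr2):
    """Hypothetical loop reference: per row, return (value, column index).

    Rows of arr1 with no non-zero value return (0, -1); this placeholder
    is an assumption, not part of the question.
    """
    out = np.empty((arr1.shape[0], 2), dtype=np.int64)
    for r in range(arr1.shape[0]):
        best = None  # (arr2 value, arr1 value, column) of the best candidate so far
        for c in range(arr1.shape[1]):
            v = arr1[r, c]
            if v == 0:
                continue  # condition: non-zero
            key = (arr2[r, c], v)  # maximize arr2 first, then arr1
            if best is None or key > best[:2]:  # strict '>' keeps the leftmost tie
                best = (arr2[r, c], v, c)
        out[r] = (0, -1) if best is None else (best[1], best[2])
    return out

arr1 = np.array([[0, 0, 4, 7, 3, 0, 0, 0, 0, 0],
                 [0, 3, 5, 7, 6, 0, 3, 0, 0, 0]])
arr2 = np.array([[14, 14, 14, 13, 11, 9, 6, 4, 2, 0],
                 [14, 13, 13, 13, 12, 9, 7, 4, 2, 0]])
print(fancy_select_loop(arr1, arr2))
# [[4 2]
#  [7 3]]
```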
Solution
The following assumes that arr2 is sorted in descending order along each row, as in your example. There may well be a much simpler way. The code needs further checking and may need some modifications for other inputs, but it gives the expected result on your example. The approach repeatedly builds boolean masks and finds the first True in each row; the comments show the intermediate results:
import numpy as np

mask = arr1 != 0  # non-zero positions of arr1
# [[False False True True True False False False False False]
# [False True True True True False True False False False]]
mask_val = np.cumsum(mask, axis=1).cumsum(axis=1) == 1  # first non-zero of each row; with arr2 sorted descending, its arr2 value (14 and 13) is the max over non-zero positions
# [[False False True False False False False False False False]
# [False True False False False False False False False False]]
mask_max_nonzero = arr2 == arr2[mask_val][:, None]
# [[ True True True False False False False False False False]
# [False True True True False False False False False False]]
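sort_ = arr1.argsort(axis=1, kind="stable")[:, ::-1]  # column order sorting each row of arr1 in descending order
mask_final = np.take_along_axis(mask_max_nonzero, sort_, axis=1)  # candidates in that order; the first True is the largest candidate (4 and 7)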
# [[False True False False False False False False True True]
# [ True False True False True False False False False False]]
mask_val = np.cumsum(mask_final, axis=1).cumsum(axis=1) == 1
# [[False True False False False False False False False False]
# [ True False False False False False False False False False]]
sorted_arr1 = np.take_along_axis(arr1, sort_, axis=1)
# [[7 4 3 0 0 0 0 0 0 0]
# [7 6 5 3 3 0 0 0 0 0]]
vals = sorted_arr1[mask_val]  # ==> [4 7], exactly one value per row
ind_mask = arr1 == vals[:, None]
# [[False False True False False False False False False False]
# [False False False True False False False False False False]]
indices = np.argmax(ind_mask & mask_max_nonzero, axis=1)  # first matching column per row
# [2 3]
answer = np.column_stack((vals, indices))
# [[4 2]
#  [7 3]]
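The `np.cumsum(mask, axis=1).cumsum(axis=1) == 1` expression used twice above marks the first True in each row: the first cumulative sum is 0 before the first True and at least 1 from there on, so the second cumulative sum equals exactly 1 only at that position. The small sketch below (both helper names are hypothetical) checks this against an `argmax`-based equivalent, which yields the column index directly and skips the second cumulative sum.

```python
import numpy as np

def first_true(mask):
    # Trick from the answer: the 1st cumsum is 0 before the first True and >= 1
    # from it on, so the 2nd cumsum equals exactly 1 only at the first True.
    return np.cumsum(mask, axis=1).cumsum(axis=1) == 1

def first_true_argmax(mask):
    # Same result via argmax, which already returns the first True's column.
    idx = mask.argmax(axis=1)  # 0 for rows without any True
    out = np.zeros(mask.shape, dtype=bool)
    out[np.arange(mask.shape[0]), idx] = mask.any(axis=1)  # keep all-False rows empty
    return out

m = np.array([[False, False, True, True, True],
              [False, True, True, False, True],
              [False, False, False, False, False]])
print(first_true(m).astype(int))
# [[0 0 1 0 0]
#  [0 1 0 0 0]
#  [0 0 0 0 0]]
print((first_true(m) == first_true_argmax(m)).all())  # True
```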
Some cases still have to be decided by the OP, e.g. how to handle two identical max values in a row, or what should happen if there is no non-zero value at the max positions, … .
Answered By - Ali_Sh
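A different technique, not the answer's method, drops the assumption that arr2 is sorted: encode the lexicographic key (arr2 value, arr1 value) as a single integer and take `argmax` per row. This sketch assumes arr1 holds non-negative integers and that `arr2 * (arr1.max() + 1)` fits in int64; as a placeholder for the no-non-zero edge case raised in the answer, it returns (0, -1) for such rows. `fancy_select_score` is a hypothetical name.

```python
import numpy as np

def fancy_select_score(arr1, arr2):
    """Hypothetical vectorized sketch: per row, (value, column) of the non-zero
    arr1 entry with the largest arr2 value, ties broken by the largest arr1 value.

    Assumes arr1 holds non-negative integers and arr2 * (arr1.max() + 1)
    fits in int64. Rows with no non-zero arr1 value return (0, -1).
    """
    scale = int(arr1.max()) + 1                      # every arr1 value is < scale
    score = arr2.astype(np.int64) * scale + arr1     # orders by arr2 first, then arr1
    score = np.where(arr1 != 0, score, np.iinfo(np.int64).min)  # exclude zeros
    cols = score.argmax(axis=1)                      # first column with the max score
    vals = arr1[np.arange(arr1.shape[0]), cols]
    cols = np.where(vals != 0, cols, -1)             # all-zero row -> placeholder -1
    return np.column_stack((vals, cols))

arr1 = np.array([[0, 0, 4, 7, 3, 0, 0, 0, 0, 0],
                 [0, 3, 5, 7, 6, 0, 3, 0, 0, 0]])
arr2 = np.array([[14, 14, 14, 13, 11, 9, 6, 4, 2, 0],
                 [14, 13, 13, 13, 12, 9, 7, 4, 2, 0]])
print(fancy_select_score(arr1, arr2))
# [[4 2]
#  [7 3]]
```

Since `argmax` returns the first maximal column, ties resolve to the leftmost candidate, which the question allows. Everything stays in whole-array operations, so it should avoid per-row Python overhead on the ~500k-row inputs, though that has not been benchmarked here.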