Tuesday, June 28, 2022

[FIXED] Numpy - Vectorized binning and value sorting

June 28, 2022 numpy, python, vectorization

Issue

Consider these two 2D arrays with the same shape:

import numpy as np

arr1 = np.array([[0, 0, 4, 7, 3, 0, 0, 0, 0, 0],
                 [0, 3, 5, 7, 6, 0, 3, 0, 0, 0]])
arr2 = np.array([[14, 14, 14, 13, 11, 9, 6, 4, 2, 0],
                 [14, 13, 13, 13, 12, 9, 7, 4, 2, 0]])

I'm trying to select, for each row in arr1, one value and its index along axis 1, such that the value satisfies the following conditions, in order of priority:

Its matching value in arr2 (same position along axis 1) is as large as possible
It is nonzero
It is the largest such value in arr1.

For the example above, that would give:

Row 1: max in arr2 is 14. That gives 0, 0, 4 as candidate values in arr1. 4 is chosen as the max / only non zero value in candidates. Its index along axis 1 is 2, so the output is 4, 2.
Row 2: max in arr2 is 14 but it only matches one 0 in arr1. Second highest value in arr2 is 13, matching candidate values in arr1 3, 5, 7. 7 is chosen as the max in candidates. Its index is 3, therefore output is 7, 3.

In case of several identical candidates, I'm comfortable with getting any of them.

In summary:

fancy_select(arr1, arr2) == np.array([[4, 2], 
                                      [7, 3]])

A loopy solution would be easy to write, but I would like to vectorize it, as it will run in loops with approximately 500k rows in each iteration.
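For reference, the selection rule described above can be written as a plain loop. This is only a sketch of one reading of the rule, not code from the original post: the name fancy_select_loop is hypothetical, and returning (0, -1) for a row with no nonzero value is an assumption, since the question does not say what should happen in that case.

```python
import numpy as np

def fancy_select_loop(arr1, arr2):
    """Row-by-row reference version of the selection rule (hypothetical helper)."""
    out = np.empty((arr1.shape[0], 2), dtype=np.int64)
    for r in range(arr1.shape[0]):
        nz = np.flatnonzero(arr1[r])           # positions of nonzero values in arr1
        if nz.size == 0:
            out[r] = (0, -1)                   # assumption: flag rows without candidates
            continue
        best2 = arr2[r, nz].max()              # largest arr2 value that has a nonzero match
        cand = nz[arr2[r, nz] == best2]        # candidate positions sharing that arr2 value
        j = cand[arr1[r, cand].argmax()]       # largest arr1 value among the candidates
        out[r] = (arr1[r, j], j)
    return out

arr1 = np.array([[0, 0, 4, 7, 3, 0, 0, 0, 0, 0],
                 [0, 3, 5, 7, 6, 0, 3, 0, 0, 0]])
arr2 = np.array([[14, 14, 14, 13, 11, 9, 6, 4, 2, 0],
                 [14, 13, 13, 13, 12, 9, 7, 4, 2, 0]])

result = fancy_select_loop(arr1, arr2)
print(result)
# [[4 2]
#  [7 3]]
```

A loop like this is slow for 500k rows, but it can be useful as a correctness check for any vectorized version.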

I've tried several approaches based on sorting, including tiling arr1 into a 3D array to apply different sortings along axes, but I'm now standing at a point where I need your help.

Solution

The following assumes that arr2 is sorted in descending order along axis 1, as in your example. There may well be a much simpler way. This code needs further checking and may need some modification, but it gives the expected result on your example. The approach uses masks and repeatedly finds the first True in each row; I have added comments to the code to show how it works:

mask = arr1 != 0
# [[False False  True  True  True False False False False False]
#  [False  True  True  True  True False  True False False False]]

mask_val = np.cumsum(mask, axis=1).cumsum(axis=1) == 1   # for finding 14 and 13 (the first True in each row)
# [[False False  True False False False False False False False]
#  [False  True False False False False False False False False]]

mask_max_nonzero = arr2 == arr2[mask_val][:, None]
# [[ True  True  True False False False False False False False]
#  [False  True  True  True False False False False False False]]

sort_ = arr1.argsort()[:, ::-1]
mask_final = np.take_along_axis(mask_max_nonzero, sort_, axis=1)  # for finding 4 and 7 (the first True in each row)
# [[False  True False False False False False False  True  True]
#  [ True False  True False  True False False False False False]]

mask_val = np.cumsum(mask_final, axis=1).cumsum(axis=1) == 1
# [[False  True False False False False False False False False]
#  [ True False False False False False False False False False]]

sorted_arr1 = np.take_along_axis(arr1, sort_, axis=1)
# [[7 4 3 0 0 0 0 0 0 0]
#  [7 6 5 3 3 0 0 0 0 0]]

vals = sorted_arr1[mask_val]  # one True per row, so this picks one value per row ==> [4 7]

ind_mask = arr1 == vals[:, None]
# [[False False  True False False False False False False False]
#  [False False False  True False False False False False False]]

indices = np.where(ind_mask & mask_max_nonzero)
# (array([0, 1], dtype=int64), array([2, 3], dtype=int64))

answer = np.dstack((vals, indices[1])).squeeze()
# [[4 2]
#  [7 3]]
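The double-cumsum idiom used twice above for "the first True in each row" can be checked in isolation. The small mask below is made up for illustration. The sketch also shows mask.argmax(axis=1), a standard NumPy call that returns the index of the first True directly, paired with mask.any(axis=1) to detect rows that contain no True at all.

```python
import numpy as np

# Made-up mask for illustration; the last row has no True at all.
mask = np.array([[False, False, True,  True,  False],
                 [False, True,  False, True,  True],
                 [False, False, False, False, False]])

# After one cumsum, every position from the first True onwards is >= 1.
# After the second cumsum, only the first True position equals exactly 1.
first_true = mask.cumsum(axis=1).cumsum(axis=1) == 1

# argmax on a boolean array returns the index of the first True,
# or 0 when the row has none, hence the separate any() check.
first_idx = mask.argmax(axis=1)
has_true = mask.any(axis=1)

print(first_true.astype(int))
print(first_idx, has_true)
```

Note that a row without any True yields an all-False row in first_true, so boolean indexing such as arr2[mask_val] would return fewer elements than there are rows.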

The OP still needs to consider some edge cases, e.g. how to handle two identical max values in a row, or what to do if there is no nonzero value at the max positions, … .
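One possible way to sidestep these edge cases is a different technique from the masks above, and not part of the original answer: combine both arrays into a single sort key and take its argmax. The sketch below assumes both arrays hold non-negative integers small enough that the key does not overflow int64. The name fancy_select_key is hypothetical. It does not require arr2 to be sorted, and on ties it returns the first matching index.

```python
import numpy as np

def fancy_select_key(arr1, arr2):
    """Single-key argmax variant (hypothetical helper, non-negative integers assumed)."""
    a1 = np.asarray(arr1, dtype=np.int64)
    a2 = np.asarray(arr2, dtype=np.int64)
    scale = a1.max() + 1                          # strictly larger than any arr1 value
    # arr2 dominates the key, arr1 breaks ties; zero entries in arr1 get key -1
    key = np.where(a1 != 0, a2 * scale + a1, -1)
    idx = key.argmax(axis=1)                      # first position of the best key per row
    vals = a1[np.arange(a1.shape[0]), idx]
    return np.column_stack((vals, idx))

arr1 = np.array([[0, 0, 4, 7, 3, 0, 0, 0, 0, 0],
                 [0, 3, 5, 7, 6, 0, 3, 0, 0, 0]])
arr2 = np.array([[14, 14, 14, 13, 11, 9, 6, 4, 2, 0],
                 [14, 13, 13, 13, 12, 9, 7, 4, 2, 0]])

answer_key = fancy_select_key(arr1, arr2)
print(answer_key)
# [[4 2]
#  [7 3]]
```

For a row with no nonzero value, every key is -1, so this sketch returns value 0 at index 0; such rows can be detected separately with (arr1 != 0).any(axis=1).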

Answered By - Ali_Sh

This answer was collected from Stack Overflow and tested by PythonFixing community admins. It is licensed under CC BY-SA 2.5, CC BY-SA 3.0 and CC BY-SA 4.0.
