Issue
I'm wondering whether there's a fast recipe for the following:
selected = labels == l
smallest_within_l = huge_array[selected].argmin()
# Find the index of smallest_within_l in the original array?
where, for example, labels is of shape (1e5,), huge_array is of shape (1e5,), and selected is the boolean mask of shape (1e5,) that selects n << 1e5 elements from huge_array.
Considering we're already looping over the entire array internally for labels == l, it seems like we should be able to construct an inverse mapping at the same time. The only way I can think of is:
original_ind = (np.cumsum(selected) > smallest_within_l).argmax()
which requires looping over all 1e5 elements three times.
Is there a nice O(1) method for this inverse mapping?
Edit: Here's some code to give a concrete example:
import numpy as np
labels = np.array([0, 0, 1, 1, 1, 1, 2, 2, 3, 3, 3])
huge_array = np.array([0.89, -1.63, 0.04, 0.1, 0.44, 2.01, -0.9, 0.83, 0.14, -1.05, 0.56])
# Boolean mask to labels of 2, corresponds to [-0.9, 0.83] in huge_array
selected = labels == 2
# Is index 0, since it's from [-0.9, 0.83]
smallest_within_l = huge_array[selected].argmin()
# Is index 6, which corresponds to -0.9 in the original array
original_ind = (np.cumsum(selected) > smallest_within_l).argmax()
print(huge_array[original_ind])
Solution
This may be a case where using np.where is preferable to boolean masking. For example,
selected_where, = np.where(labels == 2) # [6, 7]
original_ind = selected_where[huge_array[selected_where].argmin()] # 6
huge_array[original_ind] # -0.9
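For reuse, the same idea could be wrapped in a small helper. This is only a sketch: the function name argmin_within_label and its empty-selection check are assumptions for illustration, not part of the original answer. It uses np.flatnonzero, which for a 1-D mask returns the same integer positions as np.where(mask)[0]. The last lines show a possible alternative that never builds the subset, replacing unselected entries with +inf so that argmin runs on the full-length array and returns an original index directly (this assumes a float array).

```python
import numpy as np

def argmin_within_label(values, labels, label):
    """Return the index into `values` of the smallest entry whose label equals `label`.

    Hypothetical helper for illustration; not from the original answer.
    """
    # Integer positions (in the original array) where the label matches
    candidates = np.flatnonzero(labels == label)
    if candidates.size == 0:
        raise ValueError(f"no entries with label {label!r}")
    # argmin is relative to the subset; indexing `candidates` maps it back
    return candidates[values[candidates].argmin()]

labels = np.array([0, 0, 1, 1, 1, 1, 2, 2, 3, 3, 3])
huge_array = np.array([0.89, -1.63, 0.04, 0.1, 0.44, 2.01, -0.9, 0.83, 0.14, -1.05, 0.56])

idx = argmin_within_label(huge_array, labels, 2)

# Alternative: mask out unselected entries with +inf, then argmin over the full array
idx_masked = np.where(labels == 2, huge_array, np.inf).argmin()

print(idx, idx_masked, huge_array[idx])
```

Neither variant is O(1): both still scan labels once to build the mask. The helper then only touches the n selected elements, whereas the +inf variant makes a second full-length pass but skips the intermediate index array.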
Answered By - hilberts_drinking_problem