Issue
I want to select the top_n rows of a 2D matrix that have the largest sum of entries. For example, given the matrix
array([[ 6,  7,  8,  9, 10, 11],
       [12, 13, 14, 15, 16, 17],
       [18, 19, 20, 21, 22, 23],
       [ 0,  1,  2,  3,  4,  5],
       [24, 25, 26, 27, 28, 29]])
and top_n=3, it should return
array([[24, 25, 26, 27, 28, 29],
       [18, 19, 20, 21, 22, 23],
       [12, 13, 14, 15, 16, 17]])
The function should return an np.ndarray of shape (top_n, arr.shape[-1]) for a 2D input matrix arr.
Here's what I tried:
def select_rows(arr, top_n):
    """
    This function selects the top_n rows that have the largest sum of entries
    """
    sel_rows = np.argsort(-arr, axis=1)[:top_n]
    return sel_rows
I also tried:
sel_rows = (-arr).argsort(axis=-1)[:, :top_n]
to no avail.
Solution
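You can use this simple one-liner:
a[np.argsort(a.sum(axis=1))[:-top_n-1:-1]]
a.sum(axis=1) sums along axis 1, giving one sum per row.
np.argsort(...) returns the indices that sort those sums in ascending order. The default is axis=-1, which for this 1-D array of sums is the same as axis=0, so the argument can be omitted.
...[:-top_n-1:-1] picks the last top_n indices in reverse order, i.e. the rows with the largest sums, largest first.
a[...] then grabs the rows.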
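Putting these steps together, a minimal sketch of the function from the question might look like the following. It keeps the question's name select_rows and its argument names; the intermediate variable names are only illustrative, and the sample matrix is the one from the question.

```python
import numpy as np

def select_rows(arr, top_n):
    """Return the top_n rows of arr with the largest row sums, largest first."""
    row_sums = arr.sum(axis=1)       # one sum per row
    order = np.argsort(row_sums)     # row indices, ascending by sum
    top = order[:-top_n - 1:-1]      # last top_n indices, in reverse order
    return arr[top]                  # fancy indexing picks those rows

# Sample matrix from the question
a = np.array([[ 6,  7,  8,  9, 10, 11],
              [12, 13, 14, 15, 16, 17],
              [18, 19, 20, 21, 22, 23],
              [ 0,  1,  2,  3,  4,  5],
              [24, 25, 26, 27, 28, 29]])

result = select_rows(a, 3)
print(result)
```

The attempts in the question sort the entries within each row (axis=1 on the full matrix) and return indices rather than rows, which is why they do not give the expected output.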
%%timeit comparison
# data sample
a = np.random.randint(0, 101, (100000, 1000))
%%timeit
a[np.argsort(a.sum(axis=1))[:-3-1:-1]]
[out]:
9.73 ms ± 122 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
%%timeit
a[np.argsort(-a.sum(axis=1))[:3]]
[out]:
9.9 ms ± 303 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
%%timeit
sorted(a, key=lambda x: sum(x))[:-3-1:-1]
[out]:
1.04 s ± 36.6 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
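As a quick sanity check, the two vectorized variants timed above should select the same rows whenever the row sums are distinct. The sketch below compares them on the question's matrix; the names v1, v2 and same are only illustrative. With tied sums the two variants may order the tied rows differently, since one reverses an ascending sort and the other sorts the negated sums.

```python
import numpy as np

# Rebuild the question's matrix: rows of 0..29 in the order 1, 2, 3, 0, 4
a = np.arange(30).reshape(5, 6)[[1, 2, 3, 0, 4]]
top_n = 3

# Variant 1: ascending argsort, then take the last top_n indices reversed
v1 = a[np.argsort(a.sum(axis=1))[:-top_n - 1:-1]]

# Variant 2: argsort of the negated sums, then take the first top_n indices
v2 = a[np.argsort(-a.sum(axis=1))[:top_n]]

same = np.array_equal(v1, v2)
print(same)
```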
Answered By - Julien