Issue
I need to find first indices of the N largest values of a numpy array without sorting in-place. Examples:
1) Find indices of 2 largest values in an array: [0, 0.5, 1, 0.5]
result: [1,2] (or [2,1], order of returned indices doesn't matter])
2) Find indices of 2 largest values in an array: [0.5, 0.5, 0, 0.5, 0.5]
result: [0,1]
3) Find indices of 3 largest values in an array:[0.3, 0.9, 0.6, 0.75, 0.55, 0.75, 0.75, 0.8, 0.5]
result: [1, 7, 3]
4) Find indices of 6 largest values in an array:[0.3, 0.9, 0.6, 0.75, 0.55, 0.75, 0.75, 0.8, 0.5]
result: [1, 7, 3, 5, 6, 2]
If the smallest of the numbers to be returned isn't unique, it's important the function returns the indices of ones that are the closest to the beginning of the array.
I tried to do it this way:
example = np.array([0.5, 0.5, 0., 0.5, 0.5])
ind = np.argpartition(example, -N)[-N:]
but for N=2 this returns ind = [1,4]. There is an order parameter in np.argpartition() but I don't understand how I should use it.
90% of the time the array size will be around 25, 9% of time the size will be no greater than 100, edge case is limited to 1000 elements.
Because the array size will always be qutie small, I built a naive solution that works but it's surely sub-optimal:
import numpy as np
def nlar_idx(array, n):
d = dict()
for value, key in enumerate(array):
if key not in d.keys():
d[key] = [value]
else:
d[key].append(value)
do = dict(sorted(d.items(), reverse=True))
out = [x for v in do.values() for x in v][:n]
return out
array = np.array([0.3, 0.9, 0.6, 0.75, 0.55, 0.75, 0.75, 0.8, 0.5])
n = 3
res = nlar_idx(array, n)
print(res)
# [1, 7, 3]
Any way to make it work with argpartition() or build a better version of my function? We want to optimize for execution time only, memory doesn't matter.
Solution
You should use np.argsort
, but on the opposite of the list, because if there are multiples, argsort returns last occurence, and not the first. Here you go :
import numpy as np
examples = [
[2, [0, 0.5, 1, 0.5]],
[2, [0.5, 0.5, 0, 0.5, 0.5]],
[3, [0.3, 0.9, 0.6, 0.75, 0.55, 0.75, 0.75, 0.8, 0.5]],
[6, [0.3, 0.9, 0.6, 0.75, 0.55, 0.75, 0.75, 0.8, 0.5]]
]
def compute(n_max, data):
data = -1*np.array(data)
return np.argsort(data)[:n_max]
for ex in examples:
print(compute(ex[0], ex[1]))
And the result is :
[2 1]
[0 1]
[1 7 3]
[1 7 3 5 6 2]
Answered By - Tricotou
0 comments:
Post a Comment
Note: Only a member of this blog may post a comment.