Saturday, November 25, 2023

[FIXED] Find first indices of the N largest values of a numpy array without sorting in-place

November 25, 2023 numpy, python No comments

Issue

I need to find first indices of the N largest values of a numpy array without sorting in-place. Examples:

1) Find indices of 2 largest values in an array: [0, 0.5, 1, 0.5]
   result: [1,2] (or [2,1], order of returned indices doesn't matter])

2) Find indices of 2 largest values in an array: [0.5, 0.5, 0, 0.5, 0.5]
   result: [0,1]

3) Find indices of 3 largest values in an array:[0.3, 0.9, 0.6, 0.75, 0.55, 0.75, 0.75, 0.8, 0.5]
   result: [1, 7, 3]

4) Find indices of 6 largest values in an array:[0.3, 0.9, 0.6, 0.75, 0.55, 0.75, 0.75, 0.8, 0.5]
    result: [1, 7, 3, 5, 6, 2]

If the smallest of the numbers to be returned isn't unique, it's important the function returns the indices of ones that are the closest to the beginning of the array.

I tried to do it this way:

example = np.array([0.5, 0.5, 0.,  0.5, 0.5])
ind = np.argpartition(example, -N)[-N:]

but for N=2 this returns ind = [1,4]. There is an order parameter in np.argpartition() but I don't understand how I should use it.

90% of the time the array size will be around 25, 9% of time the size will be no greater than 100, edge case is limited to 1000 elements.

Because the array size will always be qutie small, I built a naive solution that works but it's surely sub-optimal:

import numpy as np

def nlar_idx(array, n):
    d = dict()
    for value, key in enumerate(array):
        if key not in d.keys():
            d[key] = [value]
        else:
            d[key].append(value)   
    do = dict(sorted(d.items(), reverse=True))
    out = [x for v in do.values() for x in v][:n]
    return out

array = np.array([0.3, 0.9, 0.6, 0.75, 0.55, 0.75, 0.75, 0.8, 0.5])
n = 3
res = nlar_idx(array, n)
print(res)
# [1, 7, 3]

Any way to make it work with argpartition() or build a better version of my function? We want to optimize for execution time only, memory doesn't matter.

Solution

You should use np.argsort, but on the opposite of the list, because if there are multiples, argsort returns last occurence, and not the first. Here you go :

import numpy as np

examples = [
    [2, [0, 0.5, 1, 0.5]],
    [2, [0.5, 0.5, 0, 0.5, 0.5]],
    [3, [0.3, 0.9, 0.6, 0.75, 0.55, 0.75, 0.75, 0.8, 0.5]],
    [6, [0.3, 0.9, 0.6, 0.75, 0.55, 0.75, 0.75, 0.8, 0.5]]
]

def compute(n_max, data):
    data = -1*np.array(data)
    return np.argsort(data)[:n_max]

for ex in examples:
    print(compute(ex[0], ex[1]))

And the result is :

[2 1]
[0 1]
[1 7 3]
[1 7 3 5 6 2]

Answered By - Tricotou

This Answer collected from stackoverflow and tested by PythonFixing community admins, is licensed under cc by-sa 2.5 , cc by-sa 3.0 and cc by-sa 4.0

Saturday, November 25, 2023

[FIXED] Find first indices of the N largest values of a numpy array without sorting in-place

Issue

Solution

0 comments:

Post a Comment

Popular Posts

Labels