Issue
I have a numpy array with keys (e.g. [1, 2, 2, 3, 3, 2]) and an array with values (e.g. [0.2, 0.6, 0.8, 0.4, 0.9, 0.3]). I want to find the minimum value associated with each unique key without using a for loop. In this example, the answer is {1: 0.2, 2: 0.3, 3: 0.4}. I asked ChatGPT and New Bing, but they keep giving me wrong answers. So, is it really possible to do this without a for loop?
Edit 1: What I'm trying to achieve is the fastest speed. Also, in my case, most keys are unique. I considered using np.unique to acquire every key and then computing the min value for each key, but that clearly requires a for loop and quadratic time. I also considered sorting the arrays by key and applying np.min to the values of each key, but I doubt its efficiency when most keys are unique. Additionally, according to the comments, pandas.DataFrame has a groupby method which might be helpful, but I'm not sure if it's the fastest (perhaps I'm going to try it on my own).
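For reference, the pandas route would be roughly the following (a rough sketch; the column names k and v are just placeholders):
import numpy as np
import pandas as pd

keys = np.array([1, 2, 2, 3, 3, 2])
values = np.array([0.2, 0.6, 0.8, 0.4, 0.9, 0.3])

# Group the values by key and take the per-group minimum
mins = pd.DataFrame({"k": keys, "v": values}).groupby("k")["v"].min()
print(mins)  # index 1, 2, 3 with minimums 0.2, 0.3, 0.4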
Edit 2: I don't necessarily need a dict as the output; it can be an array of unique keys and an array of min values, and the order of keys doesn't matter.
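For instance, the sort-based idea from Edit 1 could produce exactly that output (an array of unique keys plus an array of per-key minimums) along these lines; this is only a rough sketch using np.minimum.reduceat, and I haven't measured how fast it really is:
import numpy as np

keys = np.array([1, 2, 2, 3, 3, 2])
values = np.array([0.2, 0.6, 0.8, 0.4, 0.9, 0.3])

# Sort both arrays by key
order = np.argsort(keys, kind="stable")
sorted_keys = keys[order]
sorted_values = values[order]

# Positions where a new key starts in the sorted array
starts = np.flatnonzero(np.r_[True, sorted_keys[1:] != sorted_keys[:-1]])

unique_keys = sorted_keys[starts]                   # [1 2 3]
mins = np.minimum.reduceat(sorted_values, starts)   # [0.2 0.3 0.4]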
Solution
The naive, pure-Python solution is something like:
result = {}
for key, value in zip(keys, values):
    current = result.get(key)
    if current is not None:
        # Keep the smaller of the stored minimum and the new value
        result[key] = min(current, value)
    else:
        # First time we see this key
        result[key] = value
It should be relatively fast.
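If you prefer, an equivalent and slightly more compact way to write the same loop is to treat a missing key as infinity:
import math

result = {}
for key, value in zip(keys, values):
    # A missing key acts like +inf, so min() keeps whichever value is smaller
    result[key] = min(result.get(key, math.inf), value)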
If you really need to squeeze performance out of this, you should use numba:
import numba

@numba.jit(nopython=True)
def group_min(keys, values):
    result = {}
    for key, value in zip(keys, values):
        current = result.get(key)
        if current is not None:
            # Keep the smaller of the stored minimum and the new value
            result[key] = min(current, value)
        else:
            # First occurrence of this key
            result[key] = value
    return result
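Calling it would look something like this (a rough sketch; it assumes a reasonably recent numba release where dict literals and dict.get work in nopython mode, and that keys and values are numpy arrays):
import numpy as np

keys = np.array([1, 2, 2, 3, 3, 2], dtype=np.int64)
values = np.array([0.2, 0.6, 0.8, 0.4, 0.9, 0.3], dtype=np.float64)

mins = group_min(keys, values)   # returns a numba typed dict
print(dict(mins))                # expected: {1: 0.2, 2: 0.3, 3: 0.4}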
Make sure to read over the numba docs to understand how to squeeze as much performance as you can out of it.
Answered By - juanpa.arrivillaga