Issue
My goal is two create a function that takes in two input arrays and outputs ranks withing groups. More specifically, the first input to the function is an array of scores (scores
, and the second input array is an array of group sizes (groups
, in such a way that the sum of the group sizes equals the total amount of scores, and the scores array can be partitioned into groups doing something like np.split(scores, np.cumsum(groups))
. The output ranking should be 1 for the point with the highest score within its group, two for the second highest, and so on. I have implemented this as follows (including two test cases):
import numpy as np
def assign_ranks(scores, groups):
# Create an array to store the ranks
ranks = np.zeros(len(scores), dtype=int)
# Iterate through groups
start = 0
for group_size in groups:
end = start + group_size
ranks[start:end] = (-scores[start:end]).argsort().argsort() + 1
start = end
return ranks
def test():
# Test case 1
scores1 = np.array([0.1, 0.5, 0.3, 0.4, 0.5])
groups1 = np.array([3, 2])
result1 = assign_ranks(scores1, groups1)
print("Test case 1 result:", result1) # Expected: [3 1 2 2 1]
assert np.all(result1 == np.array([3, 1, 2, 2, 1]))
# Test case 2
scores2 = np.array([0.7, 0.3, 0.6, 0.1, 0.2, 0.5, 0.8, 0.9, 0.4])
groups2 = np.array([4, 3, 2])
result2 = assign_ranks(scores2, groups2)
print("Test case 2 result:", result2) # Expected: [1 3 2 4 3 2 1 1 2]
assert np.all(result2 == np.array([1, 3, 2, 4, 3, 2, 1, 1, 2]))
if __name__ == "__main__":
test()
While this works correctly, I was wondering if it would be possible to implement this without using for-loops, potentially improving performance. I have made several attemps with no success. Any suggestions?
I have tried other approaches, leveraging np.lexsort
and np.split
, for example. Nothing worked.
Solution
If all your values are positive you can offset them by a base value corresponding to a multiple of the group index they are in. Then use argsort to obtain the global positions of items with the offsets. This will keep the members of a given group together relative to the group index. Then convert the global indexes back into group relative positions and invert the order to obtain ranks:
def assign_ranks(scores,groups):
offsets = np.repeat(np.arange(groups.size),groups) * np.max(scores)
order = np.argsort(scores + offsets)
groupBase = np.repeat(np.cumsum(groups),groups)
return groupBase - order
If you have negative values, you'll need to compute a larger offset (than * np.max(scores)
) to ensure that there is no overlap across groups in the resulting values.
You could do this using * 2 * np.max(np.abs(scores))
or * (np.max(scores) - np.min(scores))
Answered By - Alain T.
0 comments:
Post a Comment
Note: Only a member of this blog may post a comment.