Issue
Here's a sample of the numpy array I have:
y = np.array([
[ 0],
[ 0],
[ 2],
[ 1],
[ 0],
[ 1],
[ 3],
[-1],
])
I'm attempting to generate a new column containing the cumulative counts with respect to each value in the input array:
y = np.array([
[ 0, 1],
[ 0, 2],
[ 2, 1],
[ 1, 1],
[ 0, 3],
[ 1, 2],
[ 3, 1],
[-1, 1],
])
So far I've been using the following pandas implementation to solve this problem:
y_pd = pd.DataFrame(y, columns=['LABEL'])
y_pd = pd.concat([
    y_pd,
    y_pd.groupby('LABEL').cumcount().to_frame().rename(columns={0: 'cumcounts'}) + 1
], axis=1)
However, I'd like a pure numpy implementation instead. Here's my numpy attempt at the same problem:
y_np = np.hstack([y, y])
for label in np.unique(y_np):
    slice_length = (y_np[:, -2]==label).sum()
    y_np[y_np[:, -2]==label, -1] = range(1, slice_length+1)
However, I suspect this for-loop aggregation could be replaced by a faster vectorized implementation.
I've already checked the following links on SO to try solving this problem, with no success:
- Is there any numpy group by function?
- Numpy array: group by one column, sum another
- Vectorized groupby with NumPy
- numpy group by, returning original indexes sorted by the result
Could you provide any help in this regard?
Note: The numpy array I have is actually much bigger in terms of cardinality and number of fields, and the order of the records must not be altered during the process.
Solution
The question "How to use numpy to get the cumulative count by unique values in linear time?" looks like what you are after.
Below is what the fastest method from its most upvoted answer (the one with timing tests) yields in your case, after adapting it to the requested 2D horizontally stacked format. Note that this is not the accepted answer, which includes no timing study, has drawn criticism, and should perhaps be verified again.
cumcount(np.ravel(y))+1
returns the expected cumulative count per value: array([1, 2, 1, 1, 3, 2, 1, 1])
Reshaped:
np.hstack((y,
np.atleast_2d(cumcount(np.ravel(y))+1).T))
array([[ 0, 1],
[ 0, 2],
[ 2, 1],
[ 1, 1],
[ 0, 3],
[ 1, 2],
[ 3, 1],
[-1, 1]])
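The calls above rely on a cumcount function that is not defined in this post. As a minimal sketch of what such a function could look like (an assumption, not necessarily the code from the linked answer), the version below uses a stable sort, so it runs in O(n log n) rather than linear time, and it returns zero-based counts in the original record order:

```python
import numpy as np

def cumcount(a):
    """Zero-based running count of each value of a 1D array, in original order.

    Hypothetical sketch: sort-based, not the linked answer's implementation.
    """
    a = np.asarray(a).ravel()
    # inverse[i] is the index of a[i] among the sorted unique values
    _, inverse, counts = np.unique(a, return_inverse=True, return_counts=True)
    inverse = np.ravel(inverse)
    # a stable sort groups equal values while keeping their original order
    order = np.argsort(inverse, kind="stable")
    # position where each group starts in the sorted layout
    starts = np.cumsum(counts) - counts
    # rank inside the group = sorted position minus the group's start
    ranks_sorted = np.arange(a.size) - np.repeat(starts, counts)
    # scatter the ranks back to the original positions
    out = np.empty(a.size, dtype=np.intp)
    out[order] = ranks_sorted
    return out

y = np.array([[0], [0], [2], [1], [0], [1], [3], [-1]])
result = np.hstack((y, np.atleast_2d(cumcount(np.ravel(y)) + 1).T))
```

Because the ranks are scattered back through the sort order, the input rows are never reordered, which matches the requirement stated in the question.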
Answered By - OCa