Issue
I group a dataframe by the columns I want, and then I need the position of each row within its group. By position I mean that if a group has 10 rows, the values run from 0 to 9; I do not mean the dataframe index.
My code for doing this is below:
import numpy as np
import pandas as pd
df = pd.DataFrame({'A': np.random.randint(0, 11, 10 ** 3), 'B': np.random.randint(0, 11, 10 ** 3),
                   'C': np.random.randint(0, 11, 10 ** 3), 'D': np.random.randint(0, 2, 10 ** 3)})
grouped_by = df.groupby(["A", "B", "C"])
groups = dict(list(grouped_by))
index_dict = {k: v.index.tolist() for k,v in groups.items()}
df["POS"] = df.apply(lambda x: index_dict[(x["A"], x["B"], x["C"])].index(x.name), axis=1)
The dataframe here is just an example.
Is there a way to use the grouped_by object to achieve this?
Solution
Here's a solution using cumcount() on a dummy variable to generate an item index for each group. It should be significantly faster too, since it avoids the row-wise apply and the list lookups.
df['dummy'] = 0
df["POS"] = df.groupby(['A','B','C'])['dummy'].cumcount()
df = df.drop('dummy', axis=1)
As @unutbu noted, it is even cleaner to just use:
df["POS"] = df.groupby(['A','B','C']).cumcount()
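To check that cumcount() gives the same numbers as the approach in the question, here is a small sketch that runs both on a reduced example and compares them. The seeded generator, the smaller frame (50 rows, values 0 to 2) and the names slow and same are assumptions made for illustration; they are not part of the original post.

```python
import numpy as np
import pandas as pd

# Small, seeded example frame (illustrative sizes, not the ones from the question).
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "A": rng.integers(0, 3, 50),
    "B": rng.integers(0, 3, 50),
    "C": rng.integers(0, 3, 50),
})

# Fast: 0-based position of each row within its (A, B, C) group.
df["POS"] = df.groupby(["A", "B", "C"]).cumcount()

# Slow reference, as in the question: find each row's label
# in the list of index labels belonging to its group.
index_dict = {k: v.index.tolist() for k, v in df.groupby(["A", "B", "C"])}
slow = df.apply(
    lambda x: index_dict[(x["A"], x["B"], x["C"])].index(x.name), axis=1
)

# True when both approaches agree on every row.
same = bool((df["POS"] == slow).all())
print(same)
```

Both versions number rows in the order they appear in the dataframe, which is why they agree.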
Answered By - chrisb