Issue
I have a DataFrame that looks like the following:
import pandas as pd

t = {1: ['A','B'], 2: ['D','F'], 3: ['A','C'], 4: ['B','E'], 5: ['B','B'], 6: ['D','D'], 7: ['A','H']}
df = pd.DataFrame.from_dict(t, orient='index', columns=['X','Y'])
df
   X  Y
1  A  B
2  D  F
3  A  C
4  B  E
5  B  B
6  D  D
7  A  H
I then have a dictionary:
d = {'A': 2, 'B': 1, 'D': 4}
What I would like to do is drop the rows of my DataFrame that hold the nth occurrence of a value in the X column, where n is greater than the integer specified in my dictionary for that value, while preserving the order of the remaining rows. With the above dictionary, the result should be the DataFrame
   X  Y
1  A  B
2  D  F
3  A  C
4  B  E
6  D  D
whereas with the dictionary
d = {'A': 1, 'B': 2, 'D': 1}
it should look like
   X  Y
1  A  B
2  D  F
4  B  E
5  B  B
Solution
You can use groupby.cumcount to number the rows within each X group (starting from 0), then use map to look up each row's threshold in the dictionary and compare the two:
mask = df.groupby('X').cumcount() < df['X'].map(d)
df[mask]
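To check the mask against both dictionaries from the question, here is a minimal self-contained sketch. The helper name keep_first_n and its parameters are made up for illustration and are not part of pandas or of the original answer; only groupby, cumcount and map are library calls.

```python
import pandas as pd

t = {1: ['A', 'B'], 2: ['D', 'F'], 3: ['A', 'C'], 4: ['B', 'E'],
     5: ['B', 'B'], 6: ['D', 'D'], 7: ['A', 'H']}
df = pd.DataFrame.from_dict(t, orient='index', columns=['X', 'Y'])


def keep_first_n(frame, limits, col='X'):
    """Hypothetical helper: keep at most limits[value] rows per value of col."""
    # cumcount() numbers the rows inside each group as 0, 1, 2, ...
    # and returns a Series aligned with the original row order.
    occurrence = frame.groupby(col).cumcount()
    # map() looks up the limit for each row's value. A value that is
    # missing from limits becomes NaN, and a comparison with NaN is
    # False, so such rows are dropped.
    allowed = frame[col].map(limits)
    return frame[occurrence < allowed]


first = keep_first_n(df, {'A': 2, 'B': 1, 'D': 4})
second = keep_first_n(df, {'A': 1, 'B': 2, 'D': 1})
print(first)
print(second)
```

Because boolean indexing only filters rows, the original order is preserved. One thing to be aware of: any value of X that has no entry in the dictionary is removed entirely; if such rows should be kept instead, something like allowed.fillna(float('inf')) before the comparison would be one way to do it.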
Answered By - Quang Hoang