Sunday, September 4, 2022

[FIXED] Pandas drop row if column value has appeared more than some number of times depending on the value


Issue

I have a DataFrame that looks like the following:

import pandas as pd

t = {1: ['A','B'], 2: ['D','F'], 3: ['A','C'], 4: ['B','E'], 5: ['B','B'], 6: ['D','D'], 7: ['A','H']}
df = pd.DataFrame.from_dict(t,orient='index',columns=['X','Y'])
df

   X  Y
1  A  B
2  D  F
3  A  C
4  B  E
5  B  B
6  D  D
7  A  H

I then have a dictionary:

d = {'A': 2, 'B': 1, 'D': 4}

What I would like to do is drop every row whose value in the X column has already appeared as many times as the integer specified in my dictionary for that value. In other words, for each value I want to keep only its first n occurrences, where n is the dictionary entry, while preserving the order of the rows of my DataFrame. With the above dictionary, the result should be the following DataFrame:

   X  Y
1  A  B
2  D  F
3  A  C
4  B  E
6  D  D

whereas with the dictionary

d = {'A': 1, 'B': 2, 'D': 1}

it should look like

   X  Y
1  A  B
2  D  F
4  B  E
5  B  B

Solution

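You can use groupby.cumcount to number the occurrences of each X value (counting from 0), then compare that number to the per-value threshold obtained by mapping the X column through the dictionary: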

mask = df.groupby('X').cumcount() < df['X'].map(d)

df[mask]
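As a check, the mask can be run end to end against both dictionaries from the question. The following is a minimal sketch: the helper name `keep_first_n` and its parameters are hypothetical and not part of the original answer, and it simply wraps the same `cumcount` / `map` comparison.

```python
import pandas as pd

# Sample data from the question.
t = {1: ['A', 'B'], 2: ['D', 'F'], 3: ['A', 'C'], 4: ['B', 'E'],
     5: ['B', 'B'], 6: ['D', 'D'], 7: ['A', 'H']}
df = pd.DataFrame.from_dict(t, orient='index', columns=['X', 'Y'])


def keep_first_n(frame, limits, column='X'):
    """Hypothetical helper: keep the first limits[value] rows per value."""
    # 0-based running count of each value within its group, in row order.
    occurrence = frame.groupby(column).cumcount()
    # Per-row threshold looked up from the dictionary (NaN if the key is absent).
    allowed = frame[column].map(limits)
    # Boolean indexing preserves the original row order.
    return frame[occurrence < allowed]


first = keep_first_n(df, {'A': 2, 'B': 1, 'D': 4})
second = keep_first_n(df, {'A': 1, 'B': 2, 'D': 1})

print(first)
print(second)
```

One behaviour worth knowing: a value of X that has no key in the dictionary maps to NaN, and a comparison against NaN is False, so all rows with that value are dropped. If such rows should be kept instead, the mapped thresholds would need a fill value before the comparison.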

Answered By - Quang Hoang

This answer was collected from Stack Overflow and tested by PythonFixing community admins. It is licensed under CC BY-SA 2.5, CC BY-SA 3.0 and CC BY-SA 4.0.
