Monday, March 14, 2022

[FIXED] Drop duplicates in pandas Dataframe

dataframe, pandas, pandas-groupby, python, python-3.x

Issue

I have a DataFrame:

  Type   Numer   master      width
  xyz    465_0     123        305
  xyz    465_0     123        305
  xyz    465_0     123        305
  xyz    465_0     123        315
  xyz    465_1     123        305
  xyz    465_1     123        305
  xyz    465_1     123        305
  xyz    465_1     123        315
  xyz    465_2     123        305
  xyz    465_2     123        305
  xyz    465_2     123        305
  xyz    465_2     123        315
  xyz    465_3     123        305
  xyz    465_3     123        305
  xyz    465_3     123        305
  xyz    465_3     123        315

From this, I need the following DataFrame:

  Type   Numer   master      width
  xyz    465_0     123        305
  xyz    465_1     123        305
  xyz    465_2     123        305
  xyz    465_3     123        315

My try is:

df[['Numer1', 'dig']] = df['Numer'].str.split("_", expand=True)
df = df.drop('Numer', axis = 1)
df.drop_duplicates()

But it does not give me the expected result. I would like to write it in a generic way, because I have the same pattern for multiple types.

Data:

{'Type': ['xyz', 'xyz', 'xyz', 'xyz', 'xyz', 'xyz', 'xyz', 'xyz', 
          'xyz', 'xyz', 'xyz', 'xyz', 'xyz', 'xyz', 'xyz', 'xyz'], 
 'Numer': ['465_0', '465_0', '465_0', '465_0', '465_1', '465_1', '465_1', '465_1', 
           '465_2', '465_2', '465_2', '465_2', '465_3', '465_3', '465_3', '465_3'], 
 'master': [123, 123, 123, 123, 123, 123, 123, 123, 123, 123, 123, 123, 123, 123, 123, 123], 
 'width': [305, 305, 305, 315, 305, 305, 305, 315, 305, 305, 305, 315, 305, 305, 305, 315]}

Solution

We could use groupby + cumcount to give each row a 0-based rank within its "Numer" group, then keep only the rows where the numeric suffix in "Numer" equals that rank:

out = df[df['Numer'].str.split('_').str[1].astype(int) == df.groupby('Numer').cumcount()]

Output:

   Type  Numer  master  width
0   xyz  465_0     123    305
5   xyz  465_1     123    305
10  xyz  465_2     123    305
15  xyz  465_3     123    315
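
For readers who want to reproduce this, here is a minimal self-contained sketch that rebuilds the sample data and splits the one-liner into named steps. The helper name pick_row_by_suffix and its keys parameter are hypothetical and not part of the original answer. Grouping by both "Type" and "Numer" is an assumption aimed at the multiple-types case in the question, where the same "Numer" value might appear under different types.

```python
import pandas as pd

# Sample data from the question, written compactly
data = {
    'Type': ['xyz'] * 16,
    'Numer': [f'465_{i}' for i in range(4) for _ in range(4)],
    'master': [123] * 16,
    'width': [305, 305, 305, 315] * 4,
}
df = pd.DataFrame(data)


def pick_row_by_suffix(frame, keys=('Type', 'Numer')):
    """Hypothetical helper: in each group, keep the row whose 0-based
    position equals the numeric suffix of 'Numer'."""
    # Numeric suffix after the underscore, e.g. '465_2' -> 2
    suffix = frame['Numer'].str.split('_').str[1].astype(int)
    # 0-based position of each row within its (Type, Numer) group
    position = frame.groupby(list(keys)).cumcount()
    # Boolean mask: True where suffix and position coincide
    return frame[suffix == position]


out = pick_row_by_suffix(df)
print(out)
```

On the sample data this selects the same rows as the one-liner above. If a group has fewer rows than its suffix requires, no row from that group is kept, so it may be worth checking the group sizes first.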

Answered By - enke

This answer was collected from Stack Overflow and tested by PythonFixing community admins. It is licensed under CC BY-SA 2.5, CC BY-SA 3.0 and CC BY-SA 4.0.
