Monday, March 28, 2022

[FIXED] Generating column mode in data panel

March 28, 2022 mode, numpy, pandas, statistics No comments

Issue

I have an unbalanced data panel where each period a student can have a certain level/type of scholarship:

head(df)

ID     student_period         scholarship
   
4567        1              scholarship_level_1
4567        2              scholarship_level_2
4567        3              scholarship_level_2
4567        4              scholarship_level_3
5478        4              scholarship_level_3
5478        5              scholarship_level_3
6758        7              scholarship_level_1
6758        8              scholarship_level_2
6758        9              scholarship_level_2

Basically, I want to create a new variable that plots the statistical mode of the scholarship level for each student ID in this panel. Something like this:

head(df1)

ID     student_period         scholarship            scholarship_mode
   
4567        1              scholarship_level_1      scholarship_level_2
4567        2              scholarship_level_2      scholarship_level_2 
4567        3              scholarship_level_2      scholarship_level_2
4567        4              scholarship_level_3      scholarship_level_2
5478        4              scholarship_level_3      scholarship_level_3
5478        5              scholarship_level_3      scholarship_level_3
6758        7              scholarship_level_1      scholarship_level_2
6758        8              scholarship_level_2      scholarship_level_2
6758        9              scholarship_level_2      scholarship_level_2

Any ideas?

Solution

You can use groupby+transform and value_counts:

df['scholarship_mode'] = (df.groupby('ID')['scholarship']
                          .transform(lambda x: x.value_counts().index[0]))

or mode:

df['scholarship_mode'] = (df.groupby('ID')['scholarship']
                          .transform(lambda x: x.mode().iloc[0]))

output:

     ID  student_period          scholarship     scholarship_mode
0  4567               1  scholarship_level_1  scholarship_level_2
1  4567               2  scholarship_level_2  scholarship_level_2
2  4567               3  scholarship_level_2  scholarship_level_2
3  4567               4  scholarship_level_3  scholarship_level_2
4  5478               4  scholarship_level_3  scholarship_level_3
5  5478               5  scholarship_level_3  scholarship_level_3
6  6758               7  scholarship_level_1  scholarship_level_2
7  6758               8  scholarship_level_2  scholarship_level_2
8  6758               9  scholarship_level_2  scholarship_level_2

NB. be aware that mode/value_counts can have ties, in this case only one value will be used.

Answered By - mozway

This Answer collected from stackoverflow and tested by PythonFixing community admins, is licensed under cc by-sa 2.5 , cc by-sa 3.0 and cc by-sa 4.0

Monday, March 28, 2022

[FIXED] Generating column mode in data panel

Issue

Solution

0 comments:

Post a Comment

Popular Posts

Labels