Monday, August 29, 2022

[FIXED] From a dataframe, how to groupby a column and append the result to create a new dataframe?

Issue

Could you take a look at my code, please?
This is part of my DataFrame:

display(count_df)

    title
0   Programmer
1   Oracle Fusion/ EBS Developer
2   Software Engineer RH-22-07
3   Web Developer E-Commerce
4   Software Engineer
5   Junior Front-End Developer/Designer
6   Programmer Analyst (Giampaolo Group - Hybrid W...
7   Full Stack Developer
8   SAS/SQL Programmer
9   AWS Cloud Architect
10  Full Stack Software Engineer
11  Full Stack .Net Developer (Independent Contrac...

This is the code I run and the result I am getting:

# Add a 'counts' column holding the number of times each row's title appears in the 'title' column
count_df['counts'] = count_df.groupby('title')['title'].transform('count')
print(count_df.value_counts())

title                                                         counts
AWS Cloud Architect                                           4         4
Programmer                                                    4         4
Web Developer E-Commerce                                      4         4
Software Engineer                                             4         4
SAS/SQL Programmer                                            4         4
Full Stack .Net Developer (Independent Contractor)            4         4
Web developer                                                 4         4
Junior Front-End Developer/Designer                           4         4
Intermediate Software Developer                               4         4
Full Stack Software Engineer                                  4         4
Full Stack Developer                                          4         4
Junior Java Developer (Recent Graduate)                       3         3
Software Developer                                            3         3
Software Developer, Co-Op (Sept-Dec)                          3         3
Software Engineer with Test                                   3         3
Oracle Fusion/ EBS Developer                                  1         1
React Developer (3 Month Contract)                            1         1
Software Engineer RH-22-07                                    1         1
Programmer Analyst (Giampaolo Group - Hybrid Work From Home)  1         1

The result I need is the following, but as a new DataFrame:

title                                                         counts
AWS Cloud Architect                                           4         
Programmer                                                    4         
Web Developer E-Commerce                                      4         
Software Engineer                                             4         
SAS/SQL Programmer                                            4         
Full Stack .Net Developer (Independent Contractor)            4         
Web developer                                                 4         
Junior Front-End Developer/Designer                           4
Intermediate Software Developer                               4
Full Stack Software Engineer                                  4
Full Stack Developer                                          4
Junior Java Developer (Recent Graduate)                       3         
Software Developer                                            3         
Software Developer, Co-Op (Sept-Dec)                          3         
Software Engineer with Test                                   3         
Oracle Fusion/ EBS Developer                                  1         
React Developer (3 Month Contract)                            1         
Software Engineer RH-22-07                                    1         
Programmer Analyst (Giampaolo Group - Hybrid Work From Home)  1

I think the call print(count_df.value_counts()) is what duplicates the 'counts' column, but without it I don't get the expected result.

Thank you.

Solution

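I think the duplication comes from assigning the result to a new column of the existing DataFrame instead of building a new DataFrame. transform('count') writes one count per original row, and count_df.value_counts() then counts each unique (title, counts) row, so the same number shows up once as an index level and once as the value.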
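As a rough illustration of that duplication, here is a minimal sketch on a three-row frame. The sample titles are made up for the example and are not the asker's data; only the two pandas calls come from the question.

```python
import pandas as pd

# Hypothetical sample data standing in for the question's count_df.
count_df = pd.DataFrame({"title": ["Programmer", "Programmer", "Web developer"]})

# transform('count') returns one value per original row, so all 3 rows remain.
count_df["counts"] = count_df.groupby("title")["title"].transform("count")

# DataFrame.value_counts() counts unique (title, counts) rows. The result is a
# Series indexed by both columns, and its values repeat the per-title count.
row_counts = count_df.value_counts()
print(row_counts)
```

The printed Series has 'title' and 'counts' as index levels plus an unnamed value column, which is why the number appears twice in the question's output.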

Try this:

count_df2 = count_df.groupby('title')['title'].size().sort_values(ascending=False).reset_index(name='counts')

Or this:

count_df2 = count_df.groupby('title')['title'].agg('count').sort_values(ascending=False).reset_index(name='counts')
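To see the shape of the result, the first one-liner can be run end to end on a small frame. This is only a sketch: the sample titles are invented, and the value_counts variant at the end is an alternative that is not part of the original answer.

```python
import pandas as pd

# Hypothetical sample data; the real count_df has many more rows.
count_df = pd.DataFrame(
    {
        "title": [
            "Programmer",
            "Web developer",
            "Programmer",
            "Programmer",
            "Web developer",
            "AWS Cloud Architect",
        ]
    }
)

# The answer's approach: one row per title, with the count in a 'counts' column.
count_df2 = (
    count_df.groupby("title")["title"]
    .size()
    .sort_values(ascending=False)
    .reset_index(name="counts")
)

# Alternative (an assumption, not from the answer): Series.value_counts()
# already sorts by count in descending order.
count_df3 = (
    count_df["title"].value_counts().rename_axis("title").reset_index(name="counts")
)

print(count_df2)
```

Both count_df2 and count_df3 are new DataFrames with a 'title' column and a single 'counts' column, and count_df itself is left unchanged.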

Answered By - diml

This answer was collected from Stack Overflow and tested by PythonFixing community admins. It is licensed under CC BY-SA 2.5, CC BY-SA 3.0 and CC BY-SA 4.0.
