Issue
Could you take a look at my code, please?
This is part of my DataFrame:
display(count_df)
title
0 Programmer
1 Oracle Fusion/ EBS Developer
2 Software Engineer RH-22-07
3 Web Developer E-Commerce
4 Software Engineer
5 Junior Front-End Developer/Designer
6 Programmer Analyst (Giampaolo Group - Hybrid W...
7 Full Stack Developer
8 SAS/SQL Programmer
9 AWS Cloud Architect
10 Full Stack Software Engineer
11 Full Stack .Net Developer (Independent Contrac...
This is the result I am getting
# Create a new column called 'counts' that holds, for every row, the number of times that row's title appears in the column
count_df['counts'] = count_df.groupby('title')['title'].transform('count')
print(count_df.value_counts())
title counts
AWS Cloud Architect 4 4
Programmer 4 4
Web Developer E-Commerce 4 4
Software Engineer 4 4
SAS/SQL Programmer 4 4
Full Stack .Net Developer (Independent Contractor) 4 4
Web developer 4 4
Junior Front-End Developer/Designer 4 4
Intermediate Software Developer 4 4
Full Stack Software Engineer 4 4
Full Stack Developer 4 4
Junior Java Developer (Recent Graduate) 3 3
Software Developer 3 3
Software Developer, Co-Op (Sept-Dec) 3 3
Software Engineer with Test 3 3
Oracle Fusion/ EBS Developer 1 1
React Developer (3 Month Contract) 1 1
Software Engineer RH-22-07 1 1
Programmer Analyst (Giampaolo Group - Hybrid Work From Home) 1 1
Finally, the result I need is this, but as a new DF.
title counts
AWS Cloud Architect 4
Programmer 4
Web Developer E-Commerce 4
Software Engineer 4
SAS/SQL Programmer 4
Full Stack .Net Developer (Independent Contractor) 4
Web developer 4
Junior Front-End Developer/Designer 4
Intermediate Software Developer 4
Full Stack Software Engineer 4
Full Stack Developer 4
Junior Java Developer (Recent Graduate) 3
Software Developer 3
Software Developer, Co-Op (Sept-Dec) 3
Software Engineer with Test 3
Oracle Fusion/ EBS Developer 1
React Developer (3 Month Contract) 1
Software Engineer RH-22-07 1
Programmer Analyst (Giampaolo Group - Hybrid Work From Home) 1
I think that what is duplicating the 'counts' column is this part of the code "print(count_df.value_counts())", but without it, I don't get the expected result.
Thank you.
Solution
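I think the duplication comes from counting twice. transform('count') already writes each title's count into the new 'counts' column on every row of the original DataFrame, and count_df.value_counts() then counts the unique (title, counts) rows, so the same number is printed a second time. Instead of assigning the result to a new column of the existing DataFrame, build a new DataFrame from the grouped counts.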
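To see where the second number comes from, here is a minimal sketch that reproduces the behaviour. The three-row DataFrame is made-up sample data standing in for the real count_df, and the name row_counts is only for illustration.

```python
import pandas as pd

# Hypothetical sample data, not the asker's real DataFrame
count_df = pd.DataFrame({"title": ["Programmer", "Programmer", "Web Developer"]})

# transform('count') returns one value per original row,
# so every "Programmer" row gets counts == 2
count_df["counts"] = count_df.groupby("title")["title"].transform("count")

# DataFrame.value_counts() counts unique rows, i.e. unique (title, counts) pairs.
# The result is a Series indexed by (title, counts) whose values repeat the count.
row_counts = count_df.value_counts()
print(row_counts)
```

The printed Series has both title and counts in its index, followed by the row count as its value, which is why each number shows up twice.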
Try this:
count_df2 = count_df.groupby('title')['title'].size().sort_values(ascending=False).reset_index(name='counts')
Or this:
count_df2 = count_df.groupby('title')['title'].agg('count').sort_values(ascending=False).reset_index(name='counts')
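As a quick check, the following sketch runs the groupby approach on a small made-up DataFrame (the sample titles are assumptions, not the asker's data). It also shows one possible alternative using Series.value_counts(); the rename_axis('title') call is there so the column is named 'title' regardless of pandas version.

```python
import pandas as pd

# Hypothetical sample data standing in for the real count_df
count_df = pd.DataFrame(
    {
        "title": [
            "Programmer",
            "Web Developer",
            "Programmer",
            "Software Engineer",
            "Programmer",
            "Web Developer",
        ]
    }
)

# One row per unique title, with its number of occurrences, as a new DataFrame
count_df2 = (
    count_df.groupby("title")
    .size()
    .sort_values(ascending=False)
    .reset_index(name="counts")
)
print(count_df2)

# Alternative sketch: value_counts() on the column already sorts descending
alt = (
    count_df["title"]
    .value_counts()
    .rename_axis("title")
    .reset_index(name="counts")
)
print(alt)
```

Both results are ordinary DataFrames with a 'title' column and a 'counts' column, and the original count_df is left unchanged.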
Answered By - diml