Issue
I have a dataframe with some recurring values in one column. I want to group by that column and sum the other columns. The dataframe looks like this:
Edit: here is the code to create the dataframe. Notice the column called 'Able' which is the index.
df=pd.DataFrame({'Able': ['Blue', 'Green', 'Red', 'Orange'], 'Baker':[ 'New York', 'New Jersey', 'New York', 'New Jersey'], 'Charlie':[3,4,'',7], 'Delta':['',5,6,''],'Echo':[100,200,300,400]}).set_index('Able')
The result should group on 'Baker' and sum the other three columns. I've tried various flavors of groupby and pivot_table. They return the correct two rows (New York and New Jersey) but they only return 'Baker' and the sum for the rightmost column, 'Echo.' The far left column 'Able' which is the index for the source dataframe should be ignored. My output should look like this (edited thanks to @corralien for spotting an error):
Baker Charlie Delta Echo
New Jersey 11 5 600
New York 3 6 400
How do I return all the columns, ideally without listing them by name in the code?
Solution
Replace the space with 0 and agg sum. This will depend on what dype, the last three columns are. I repoduced df for you, feel free to edit if I got the dtypes wrong and edit the question. The forum will guide you.
Dataframe
df=pd.DataFrame({'Baker':[ 'New York', 'New Jersey', 'New York', 'New Jersey'], 'Charlie':[3,4,'',7], 'Delta':['',5,6,''],'Echo':[100,200,300,400]})
Code
df.replace('',0).groupby('Baker').agg('sum')
Output
Charlie Delta Echo
Baker
New Jersey 11 5 600
New York 3 6 400
Answered By - wwnde
0 comments:
Post a Comment
Note: Only a member of this blog may post a comment.