Issue
I currently have a pandas dataframe that looks like this:
| location | count | qty | approved_count |
|---|---|---|---|
| Phoenix | 24 | 300 | 15 |
| Dallas | 18 | 403 | 14 |
I would like to iterate over the columns, sum each one, and append the result as a new row at the bottom, with the value "Grand Total" in the 'location' column. The resulting dataset should look like this:
| location | count | qty | approved_count |
|---|---|---|---|
| Phoenix | 24 | 300 | 15 |
| Dallas | 18 | 403 | 14 |
| Grand Total | 42 | 703 | 29 |
I am currently able to get this result this way:
df = df.append(
    {
        'location': 'Grand Total',
        'count': df['count'].sum(),
        'qty': df['qty'].sum(),
        'approved_count': df['approved_count'].sum(),
    },
    ignore_index=True,
)
However, I would like to iterate over the columns dynamically and sum them, excluding the 'location' column from the sum. Is this possible with pandas or PySpark?
Solution
Try:
df = df.set_index("location")
df.loc["Grand Total"] = df.sum()
df = df.reset_index()
>>> df
location count qty approved_count
0 Phoenix 24 300 15
1 Dallas 18 403 14
2 Grand Total 42 703 29
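If the frame ever carries extra non-numeric columns besides location, the same pattern still works by passing numeric_only=True to sum() — a minimal sketch using the sample data from the question:

```python
import pandas as pd

df = pd.DataFrame({
    "location": ["Phoenix", "Dallas"],
    "count": [24, 18],
    "qty": [300, 403],
    "approved_count": [15, 14],
})

df = df.set_index("location")
# numeric_only=True skips any non-numeric columns during the sum,
# so nothing besides the index has to be excluded by hand
df.loc["Grand Total"] = df.sum(numeric_only=True)
df = df.reset_index()
print(df)
```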
Or in one line using concat:
>>> pd.concat([df.set_index("location"), df.drop("location",axis=1).sum().rename("Grand Total").to_frame().T]).reset_index()
index count qty approved_count
0 Phoenix 24 300 15
1 Dallas 18 403 14
2 Grand Total 42 703 29
(The index name is lost in the concat, so reset_index() brings it back as a column named "index"; rename it afterwards with df.rename(columns={"index": "location"}) if needed.)
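DataFrame.append, used in the question, was deprecated in pandas 1.4 and removed in 2.0. The asker's dynamic version can be rewritten with pd.concat and a dict comprehension that sums every column except location — a sketch, assuming all other columns are numeric:

```python
import pandas as pd

df = pd.DataFrame({
    "location": ["Phoenix", "Dallas"],
    "count": [24, 18],
    "qty": [300, 403],
    "approved_count": [15, 14],
})

# Build the totals row dynamically, keeping the original column order:
# sum every column except 'location', which gets the label instead
total_row = {
    c: "Grand Total" if c == "location" else df[c].sum()
    for c in df.columns
}

df = pd.concat([df, pd.DataFrame([total_row])], ignore_index=True)
print(df)
```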
Answered By - not_speshal