Issue
I created a simple dataset df
with three columns, Area
, Year_2010
, Year_2020
. The related code:
# Import pandas library
import pandas as pd
# initialize list of lists
data = [['Netherlands', 100, 200], ['Belgium', 15, 80], ['Germany', 125, 300]]
# Create the pandas DataFrame
df = pd.DataFrame(data, columns=['Area', 'Year_2010', 'Year_2020'])
# print dataframe.
df
#prints
Area Year_2010 Year_2020
0 Netherlands 100 200
1 Belgium 15 80
2 Germany 125 300
For two of the three columns (being Year_2010
and Year_2020
), I would like to obtain the sum of all related rows. The output of the sum should appear in a new row for both columns.
Ideally, the output reads:
Area Year_2010 Year_2020
0 Netherlands 100 200
1 Belgium 15 80
2 Germany 125 300
3 240 580
However, if I apply
total = df.sum(axis=0)
df = df.append(total, ignore_index=True)
The Area
column will also be aggregated, leading to concatenation of country names in the Area
column:
Area Year_2010 Year_2020
0 Netherlands 100 200
1 Belgium 15 80
2 Germany 125 300
3 NetherlandsBelgiumGermany 240 580
How can I get the row sums for specific columns, while leaving out other columns?
As an alternative to leaving the value NetherlandsBelgiumGermany
blank, the value can be replaced by Total
.
Solution
One solution could be to filter the columns by type and use indexing:
cols = df.select_dtypes('number').columns
df.loc[len(df), cols] = df[cols].sum()
updated DataFrame:
Area Year_2010 Year_2020
0 Netherlands 100.0 200.0
1 Belgium 15.0 80.0
2 Germany 125.0 300.0
3 NaN 240.0 580.0
Alternative if you want the empty string:
cols = df.select_dtypes('number').columns
df.loc[len(df)] = df[cols].sum().reindex(df.columns, fill_value='')
output:
Area Year_2010 Year_2020
0 Netherlands 100 200
1 Belgium 15 80
2 Germany 125 300
3 240 580
NB. this assumes a range index, if this is not the case, please update the example for alternative ways (using pandas.concat
for example).
Answered By - mozway
0 comments:
Post a Comment
Note: Only a member of this blog may post a comment.