Issue
I have a dataframe df
containing net exports between any two countries in column From
and To
respectively.
df.to_dict()
returns
{'From': {0: 'A', 1: 'A', 2: 'B', 3: 'C', 4: 'D'},
'To': {0: 'B', 1: 'C', 2: 'C', 3: 'D', 4: 'A'},
'Net exports': {0: 3, 1: 6, 2: 2, 3: 2, 4: 5}}
It looks as follows: [1]: https://i.stack.imgur.com/y6BRe.png
I want to get a second dataframe df2
which shows net trade per country.
It means, if a country is in From
column in df
, its value needs to be added.
If the country is in To
column in df
, its value needs to be subtracted.
It should look something as shown: [2]: https://i.stack.imgur.com/I1xrE.png
For example,
in df
, A to B is 3, A to C is 6 and D to A is 5. Hence, the value of A in df2
is 3+6-5 = 4.
Note, sometimes some countries may not appear in both From and To columns in df
.
To get list of all countries in both columns, I could use
all_countries = list(set(df["From"]).union(set(df["To"])))
all_countries
However, what would be the process to proceed to the next step to get df2
from df
?
Solution
Compute a groupby.sum
on both From and To, subtract
and reset_index
:
out = (df
.groupby('From')['Net exports'].sum()
.sub(df.groupby('To')['Net exports'].sum())
.rename_axis('Countries').reset_index()
)
Output:
Countries Net exports
0 A 4
1 B -1
2 C -6
3 D 3
Answered By - mozway
0 comments:
Post a Comment
Note: Only a member of this blog may post a comment.