Issue
I have a dataframe which simplified looks like this
id surface longitude latitude othercolumns
1 20 5.878 46.943 ...
1 15 5.88 46.7 ...
1 25 5.5 47 ...
2 30 9.6 40.2 ...
3 10 4.7 36 ...
3 25 3.8 41 ...
I read several posts advising groupby but in this case, to the best of my knowledge, it doesn't see to be adapted as it seems that only the variables grouped would be kept, and I need the other columns as well.
On advise I looked as well here, but the formula :
df.groupby('Player').agg({k: 'mean' if v in ('int64', 'float64') else 'first'
for k,v in df.dtypes[1:].items()})
Seems (if I understand it correctly) to apply the calculation to all numeric fields, which would not make sense to sum geographic coordinates.
As a consequence I thought maybe by aiming at having:
(A)
id surface longitude latitude othercolumns
1 60 5.878 46.943 ...
1 60 5.88 46.7 ...
1 60 5.5 47 ...
2 30 9.6 40.2 ...
3 35 4.7 36 ...
3 35 3.8 41 ...
(ex: 60 for the sum of 20 + 15 + 25 in the case of id 1)
Then applying df.drop_duplicates() so I can retain something like (B)
id surface longitude latitude othercolumns
1 60 5.878 46.943 ...
2 30 9.6 40.2 ...
3 35 4.7 36 ...
However I don't know how to manage to get to (A), I hope I'm explaining this issue with clarity as I don't have the impression that I saw this brought out before.
Solution
You get to your (A) datframe example by using 'transform' which transforms the grouped by operation back to the original dataframe.
Something like
df['surface']=df.groupby('id')['surface'].transform('sum')
Answered By - Daniel Weigel
0 comments:
Post a Comment
Note: Only a member of this blog may post a comment.