Issue
I am facing problem while trying to understand the below code snippet of group by.I am trying to understand how is calculation is happening for df.groupby(L).sum().
This is a code snippet that i got from the urlenter link description here. Thanks for any help.
Solution
Rows are grouping by values of list, because length of list is same like number of rows in DataFrame
, it means:
rng = np.random.RandomState(0)
df = pd.DataFrame({'key': ['A', 'B', 'C', 'A', 'B', 'C'],
'data1': range(6),
'data2': rng.randint(0, 10, 6)},
columns = ['key', 'data1', 'data2'])
L = [0, 1, 0, 1, 2, 0]
print (df)
key data1 data2
0 A 0 5 <-0
1 B 1 0 <-1
2 C 2 3 <-0
3 A 3 3 <-1
4 B 4 7 <-2
5 C 5 9 <-0
So:
data1 for 0 is 0 + 2 + 5 = 7
data2 for 0 is 5 + 3 + 9 = 17
data1 for 1 is 1 + 3 = 4
data2 for 1 is 0 + 3 = 3
data1 for 2 is 4
data2 for 2 is 7
Output:
print(df.groupby(L).sum())
data1 data2
0 7 17
1 4 3
2 4 7
Key column is omitted, because Automatic exclusion of 'nuisance' columns.
Answered By - jezrael
0 comments:
Post a Comment
Note: Only a member of this blog may post a comment.