Issue
I currently have the following dataframe. I want to group by the Probeset column and, for each phchp subject column, replace every value with the sum over the rows that share the same Probeset value.
       Probeset  phchp230v2  phchp273v3  phchp367v3
0    1554784_at    0.000000    0.000000    0.000000
1    1554784_at    0.000000    0.000000    0.000000
2     212983_at    0.244668    0.032524    0.113343
3     212983_at    0.022178    0.013750    0.011871
4  1566643_a_at    0.048200    0.089618    0.046528
What I'm looking for is this:
       Probeset  phchp230v2  phchp273v3  phchp367v3
0    1554784_at           0           0           0
1    1554784_at           0           0           0
2     212983_at    0.266846    0.046274    0.125214
3     212983_at    0.266846    0.046274    0.125214
4  1566643_a_at    0.048200    0.089618    0.046528
I've tried the following without success; it does not group correctly:
for x in df_out:
    if 'phchp' in x:
        df_out[x] = df_out.groupby(['Probeset'])[x].sum()
Solution
You can use groupby + transform, then assign the result back to the DataFrame.
df1 = df.groupby('Probeset').transform('sum')
df[df1.columns] = df1
print(df)
       Probeset  phchp230v2  phchp273v3  phchp367v3
0    1554784_at    0.000000    0.000000    0.000000
1    1554784_at    0.000000    0.000000    0.000000
2     212983_at    0.266846    0.046274    0.125214
3     212983_at    0.266846    0.046274    0.125214
4  1566643_a_at    0.048200    0.089618    0.046528
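For anyone who wants to reproduce this end to end, here is a minimal self-contained sketch. It rebuilds the sample data from the question and selects the value columns explicitly rather than relying on every non-key column being numeric. The name value_cols and the assumption that all value columns start with "phchp" are mine, not part of the original answer.

```python
import pandas as pd

# Rebuild the sample data from the question.
df = pd.DataFrame({
    "Probeset": ["1554784_at", "1554784_at", "212983_at", "212983_at", "1566643_a_at"],
    "phchp230v2": [0.000000, 0.000000, 0.244668, 0.022178, 0.048200],
    "phchp273v3": [0.000000, 0.000000, 0.032524, 0.013750, 0.089618],
    "phchp367v3": [0.000000, 0.000000, 0.113343, 0.011871, 0.046528],
})

# Assumption: every column to be summed starts with "phchp".
value_cols = [c for c in df.columns if c.startswith("phchp")]

# transform("sum") returns one row per original row, so the result
# has the same index as df and can be assigned straight back.
df[value_cols] = df.groupby("Probeset")[value_cols].transform("sum")

print(df)
```

Selecting the columns up front also keeps any other non-numeric columns in a real dataset out of the sum.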
Your loop wasn't far off; you just needed to use transform. With transform, the result of the groupby aggregation is broadcast to every row belonging to that group, so it aligns with the DataFrame index. Without transform, the groupby result is indexed by the group keys, so simple assignment back to the DataFrame won't align, given that you have a RangeIndex. The small change needed is:
for x in df:
    if 'phchp' in x:
        df[x] = df.groupby('Probeset')[x].transform('sum')
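To see the alignment problem concretely, the sketch below assigns a plain .sum() result back to a column and compares it with the transform version. The data here is a shortened, hypothetical stand-in (probe names "a" and "b"), not the data from the question.

```python
import pandas as pd

# Hypothetical, shortened data: two rows for probe "a", one for probe "b".
df = pd.DataFrame({
    "Probeset": ["a", "a", "b"],
    "phchp367v3": [1.0, 2.0, 5.0],
})

# Aggregated result: one value per group, indexed by the keys "a" and "b".
summed = df.groupby("Probeset")["phchp367v3"].sum()

# Column assignment aligns on index labels. The labels 0, 1, 2 never
# match "a" or "b", so every row ends up as NaN.
broken = df.copy()
broken["phchp367v3"] = summed

# transform keeps the original index, so each row receives its group total.
fixed = df.copy()
fixed["phchp367v3"] = df.groupby("Probeset")["phchp367v3"].transform("sum")

print(broken)
print(fixed)
```

The all-NaN column in broken is what the loop from the question produces when the frame has a default RangeIndex.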
And for clarity, here is the difference between the groupby results with and without transform, computed on the original DataFrame (before the sums were assigned back).
# Index is the unique values of 'Probeset'
df.groupby('Probeset')['phchp367v3'].sum()
#Probeset
#1554784_at      0.000000
#1566643_a_at    0.046528
#212983_at       0.125214
#Name: phchp367v3, dtype: float64
# Index is the same as the original DataFrame
df.groupby('Probeset')['phchp367v3'].transform('sum')
#0    0.000000
#1    0.000000
#2    0.125214
#3    0.125214
#4    0.046528
#Name: phchp367v3, dtype: float64
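If what you actually want is one row per probeset rather than the sum repeated on every row, the plain aggregation is enough. This is a sketch of that alternative reading of the question, using a subset of the sample data; the name collapsed is just illustrative.

```python
import pandas as pd

# Subset of the question's data: two rows for 212983_at, one for 1566643_a_at.
df = pd.DataFrame({
    "Probeset": ["212983_at", "212983_at", "1566643_a_at"],
    "phchp230v2": [0.244668, 0.022178, 0.048200],
})

# as_index=False keeps Probeset as a regular column;
# sort=False keeps groups in the order they first appear.
collapsed = df.groupby("Probeset", as_index=False, sort=False).sum()

print(collapsed)
```

Here the result has a fresh RangeIndex with one row per group, so it is meant to be used as a new frame, not assigned back column by column.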
Answered By - ALollz