Issue
I have the following df
:
df = pd.DataFrame({'block': [1, 1, 1, 2, 2, 2, 3, 3, 3],
'B': ['a', 'b', 'c', 'a', 'b', 'c', 'a', 'b', 'c'],
'C': [1000, 2000, 3000, 4000, 5000, 6000, 7000, 8000, 9000],
})
Which outputs:
block B C
0 1 a 1000
1 1 b 2000
2 1 c 3000
3 2 a 4000
4 2 b 5000
5 2 c 6000
6 3 a 7000
7 3 b 8000
8 3 c 9000
I want to output a diff_column
(could also be an entirely new_df
) who's values are the difference between the values in the C
column grouped by the B
column for every different value in the block
column.
That is, what's the difference between a
's value in block
2 and a
's value in block
1?
Example for a
:
4000 - 1000 = 3000
Note, the rows for block
== 1 would be empty since there is no other previous block
.
I have tried:
df['diff_column'] = df.groupby(['block', 'B'])['C'] - df.shift[-1].groupby(['block', 'B'])['C']
with no success.
Solution
Assuming the blocks are sorted, you can use groupby.diff
with column B as grouper:
df['diff'] = df.groupby('B')['C'].diff()
With groupby.shift
as you initially tried to do:
df['diff'] = df['C']-df.groupby('B')['C'].shift()
output:
block B C diff
0 1 a 1000 NaN
1 1 b 2000 NaN
2 1 c 3000 NaN
3 2 a 4000 3000.0 # block 2/a - block 1/a
4 2 b 5000 3000.0
5 2 c 6000 3000.0
6 3 a 7000 3000.0
7 3 b 8000 3000.0
8 3 c 9000 3000.0
Answered By - mozway
0 comments:
Post a Comment
Note: Only a member of this blog may post a comment.