Issue
I have a dataframe, df, where I would like to perform an addition based on values in a different column, As well as subtraction from a different column The addition calculation and the subtraction is based on the id value.
Data
id date value start status
aa Q1.22 hello 80
aa Q2.22 hi -200 2000
aa Q3.22 hey -400
aa Q4.22 sure -500
bb Q3.23 ok 30
bb Q4.23 sure 10 100
Desired
id date consum start status update update2
aa Q1.22 20 80
aa Q2.22 25 -200 2000 1800
aa Q3.22 100 -400 1700
aa Q4.22 200 -500 1500
bb Q3.23 10 30
bb Q4.23 600 10 100 110
Logic
1.The addition will begin at the first display of a single value. For example we see that the number 2000 is a single non consecutive value. The start point of the addition calculation will begin here. Similar to the 100 value (it is an ending value)
2.The second part of the problem is where we begin to subtract the value in the 'consum' column, consecutively for each group. For example:
we begin with 1800 for the id 'aa' , we now want to subtract the next consective value in the 'consum' column.
from update column) 1800 - 100 (from cumsum column) = 1700
from update2 column) 1700 - 200 (from cumsum column) = 1500
(The first update column will sum the start from status values. (This is now the new starting value) The second update, update2, then subtracts values in the consum column based on its id)
Doing
A SO member helped with a solution, but I have since updated the question and added another column within the problem
df['status'] = pd.to_numeric(df['status'], errors='coerce')
status = df.groupby('id')['status'].ffill()
df['update'] = (
status + df['start'].mask(status.isna()).groupby(df['id']).cumsum()
)
I am still researching this, I think I need to redirect to the 'consum' column. Any suggestion is appreciated.
Solution
See working below.
s=df.groupby('id')['status'].transform(lambda x: (~x.eq("")).cumsum(), )#Create a grouper
df['update1']=df['status'].replace("", 'NaN').astype(float)+df['start']
df['update2']=df.groupby(['id',s],as_index=False).apply(lambda x: x['update1'].fillna(-x['consum']).cumsum().shift(-1)).reset_index(drop=True)
df['update2']=np.where(df['update2'].le(0),"", df['update2'])
print(df)
id date consum start status update1 update2
0 aa Q1.22 20 80 NaN nan
1 aa Q2.22 25 -200 2000 1800.0 1700.0
2 aa Q3.22 100 -400 NaN 1500.0
3 aa Q4.22 200 -500 NaN nan
4 bb Q3.23 10 30 NaN nan
5 bb Q4.23 600 10 100 110.0 nan
Answered By - wwnde
0 comments:
Post a Comment
Note: Only a member of this blog may post a comment.