Monday, January 17, 2022

[FIXED] Perform multiple calculations based on multiple columns, grouping by a value within a separate column

January 17, 2022 numpy, pandas, python No comments

Issue

I have a dataframe, df, where I would like to perform an addition based on values in a different column, As well as subtraction from a different column The addition calculation and the subtraction is based on the id value.

Data

id  date    value   start   status
aa  Q1.22   hello   80  
aa  Q2.22   hi     -200     2000
aa  Q3.22   hey    -400 
aa  Q4.22   sure   -500 
bb  Q3.23   ok      30  
bb  Q4.23   sure    10      100

Desired

id  date    consum  start   status  update update2
aa  Q1.22   20      80      
aa  Q2.22   25     -200     2000    1800
aa  Q3.22   100    -400                    1700
aa  Q4.22   200    -500                    1500
bb  Q3.23   10      30  
bb  Q4.23   600     10      100     110

Logic

1.The addition will begin at the first display of a single value. For example we see that the number 2000 is a single non consecutive value. The start point of the addition calculation will begin here. Similar to the 100 value (it is an ending value)

2.The second part of the problem is where we begin to subtract the value in the 'consum' column, consecutively for each group. For example:

we begin with 1800 for the id 'aa' , we now want to subtract the next consective value in the 'consum' column.

  from update column)   1800 - 100 (from cumsum column) = 1700

  from update2 column)  1700 - 200 (from cumsum column) = 1500

(The first update column will sum the start from status values. (This is now the new starting value) The second update, update2, then subtracts values in the consum column based on its id)

Doing

A SO member helped with a solution, but I have since updated the question and added another column within the problem

df['status'] = pd.to_numeric(df['status'], errors='coerce')
status = df.groupby('id')['status'].ffill()
df['update'] = (
        status + df['start'].mask(status.isna()).groupby(df['id']).cumsum()
)

I am still researching this, I think I need to redirect to the 'consum' column. Any suggestion is appreciated.

Solution

See working below.

s=df.groupby('id')['status'].transform(lambda x: (~x.eq("")).cumsum(), )#Create a grouper
df['update1']=df['status'].replace("", 'NaN').astype(float)+df['start']
df['update2']=df.groupby(['id',s],as_index=False).apply(lambda x: x['update1'].fillna(-x['consum']).cumsum().shift(-1)).reset_index(drop=True)
df['update2']=np.where(df['update2'].le(0),"", df['update2'])
print(df)







   id   date  consum  start status  update1 update2
0  aa  Q1.22      20     80             NaN     nan
1  aa  Q2.22      25   -200   2000   1800.0  1700.0
2  aa  Q3.22     100   -400             NaN  1500.0
3  aa  Q4.22     200   -500             NaN     nan
4  bb  Q3.23      10     30             NaN     nan
5  bb  Q4.23     600     10    100    110.0     nan

Answered By - wwnde

This Answer collected from stackoverflow and tested by PythonFixing community admins, is licensed under cc by-sa 2.5 , cc by-sa 3.0 and cc by-sa 4.0

Monday, January 17, 2022

[FIXED] Perform multiple calculations based on multiple columns, grouping by a value within a separate column

Issue

Solution

0 comments:

Post a Comment

Popular Posts

Labels