Issue
I was hoping to get some help with a calculation I'm struggling a bit with. I'm working with some data (copied below) and I need create a calculation that takes the first value > 0 from another column and computes a new series based on that value, and then aggregates the numbers giving a cumulative sum. My raw data looks like this:
d = {'Final Account': ['A', 'A', 'A' ,'A' , 'A', 'A', 'A','A' ,'A' ,'A', 'A', 'A', 'A'],
'Date': ['Jun-21','Jul-21','Aug-21','Sep-21','Oct-21','Nov-21','Dec-21','Jan-22','Feb-22','Mar-22','Apr-22','May-22','Jun-22'],
'Units':[0, 0, 0, 0, 10, 0, 20, 0, 0, 7, 12, 35, 0]}
df = pd.DataFrame(data=d)
Account Date Units
A Jun-21 0
A Jul-21 0
A Aug-21 0
A Sep-21 0
A Oct-21 10
A Nov-21 0
A Dec-21 20
A Jan-22 0
A Feb-22 0
A Mar-22 7
A Apr-22 12
A May-22 35
A Jun-22 0
To the table I do an initial conversion for my data which is:
df['Conv'] = df['Units'].apply(x/5)
This adds a new column to my table like this:
Account Date Units Conv
A Jun-21 0 0
A Jul-21 0 0
A Aug-21 0 0
A Sep-21 0 0
A Oct-21 10 2
A Nov-21 0 0
A Dec-21 20 4
A Jan-22 0 0
A Feb-22 0 0
A Mar-22 7 1
A Apr-22 12 2
A May-22 35 7
A Jun-22 0 0
The steps after this I begin to run into issues. I need to calculate new field which takes the first value of the conv field > 0, at the same index position and begins a new calculation based on the previous rows cumsum and then adds it back into the cumsum following the calculation. Outside of python this is done by creating two columns. One to calculate new units by:
(Units - (previous row cumsum of existing units * 2))/5
Then existing units which is just the cumsum of the values that have been figured out to be new units. The desired output should look something like this:
Account Date Units Conv New Units Existing Units (cumsum of new units)
A Jun-21 0 0 0 0
A Jul-21 0 0 0 0
A Aug-21 0 0 0 0
A Sep-21 0 0 0 0
A Oct-21 10 2 2 2
A Nov-21 0 0 0 2
A Dec-21 20 4 3 5
A Jan-22 0 0 0 5
A Feb-22 0 0 0 5
A Mar-22 7 1 0 5
A Apr-22 12 2 0 5
A May-22 35 7 5 10
A Jun-22 0 0 0 10
The main issue I'm struggling with is grabbing the first value >0 from the "Conv" column and being able to create a new cumsum based on that initial value that can be applied to the "New Units" calculation. Any guidance is much appreciated, and despite reading a lot around I've hit a bit of a brick wall! If you need me to explain better please do ask! :)
Much appreciated in advance!
Solution
I'm not sure that I completely understand what you are trying to achieve. Nevertheless, here's an attempt to reproduce your expected results. For your example frame this
groups = (df['Units'].eq(0) & df['Units'].shift().ne(0)).cumsum()
df['New Units'] = 0
last = 0
for _, group in df['Units'].groupby(groups):
i, unit = group.index[-1], group.iloc[-1]
if unit != 0:
new_unit = (unit - last * 2) // 5
last = df.at[i, 'New Units'] = new_unit
does result in
Final Account Date Units New Units
0 A Jun-21 0 0
1 A Jul-21 0 0
2 A Aug-21 0 0
3 A Sep-21 0 0
4 A Oct-21 10 2
5 A Nov-21 0 0
6 A Dec-21 20 3
7 A Jan-22 0 0
8 A Feb-22 0 0
9 A Mar-22 7 0
10 A Apr-22 12 0
11 A May-22 35 5
12 A Jun-22 0 0
The first step identifies the blocks in column Units
whose last item is relevant for building the new units: Successive zeros, followed by non-zeros, until the first zero. This
groups = (df['Units'].eq(0) & df['Units'].shift().ne(0)).cumsum()
results in
0 1
1 1
2 1
3 1
4 1
5 2
6 2
7 3
8 3
9 3
10 3
11 3
12 4
Then group column Units
along these blocks, grab the last item if each block if it is non-zero (zero can only happen in the last block), build the new unit (according to the given formula) and store it in the new column New Units
.
(If you actually need the column Existing Units
then just use .cumsum()
on the column New Units
.)
If there are multiple accounts (indicated in the comments), then one way to apply the procedure to each account separately would be to pack it into a function (here new_units
), .groupby()
over the Final Account
column, and .apply()
the function to the groups:
def new_units(sdf):
groups = (sdf['Units'].eq(0) & sdf['Units'].shift().ne(0)).cumsum()
last = 0
for _, group in sdf['Units'].groupby(groups):
i, unit = group.index[-1], group.iloc[-1]
if unit != 0:
new_unit = (unit - last * 2) // 5
last = sdf.at[i, 'New Units'] = new_unit
return sdf
df['New Units'] = 0
df = df.groupby('Final Account').apply(new_units)
Answered By - Timus
0 comments:
Post a Comment
Note: Only a member of this blog may post a comment.