Issue
I have a dataframe with an id, a purchase date, a purchase price and a duration in days:
df
id purchased_date price duration
1 2020-01-01 16.50 2
2 2020-01-01 24.00 4
Wherever the duration is greater than 1 day, I want to split the row into one row per day: the price should be divided by the number of days, and the date should increase by 1 day for each successive row. Effectively, that would give me this:
df_new
id purchased_date price duration
1 2020-01-01 8.25 1
1 2020-01-02 8.25 1
2 2020-01-01 6.00 1
2 2020-01-02 6.00 1
2 2020-01-03 6.00 1
2 2020-01-04 6.00 1
So far I've managed to duplicate the rows based on the duration using:
df['price'] = df['price']/df['duration']
df = df.loc[df.index.repeat(df.duration)]
and then I've tried using:
df.groupby(['id', 'purchased_date']).purchased_date.apply(lambda n: n + pd.to_timedelta(1, unit='d'))
However, this just gets stuck in an endless loop, and I'm not sure how to proceed.
My plan is to put this all in a function but for now I just want to get the process working.
Thank you for any help.
Solution
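Use GroupBy.cumcount to build a counter over the repeated copies of each original row, convert it to day timedeltas with to_timedelta, and add the result to the purchased_date column (this assumes purchased_date already has a datetime dtype):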
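import pandas as pd

# price per day
df['price'] = df['price']/df['duration']
# repeat each row `duration` times; the copies keep the original index label
df = df.loc[df.index.repeat(df.duration)].assign(duration=1)
# number the copies of each original row (0, 1, 2, ...) and add that many days
df['purchased_date'] += pd.to_timedelta(df.groupby(level=0).cumcount(), unit='d')
df = df.reset_index(drop=True)
print(df)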
id purchased_date price duration
0 1 2020-01-01 8.25 1
1 1 2020-01-02 8.25 1
2 2 2020-01-01 6.00 1
3 2 2020-01-02 6.00 1
4 2 2020-01-03 6.00 1
5 2 2020-01-04 6.00 1
Answered By - jezrael
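Since the question mentions eventually putting this in a function, here is a minimal sketch of how the same steps might be wrapped up. It is not part of the original answer: the function name split_by_day is hypothetical, and the sketch assumes the column names from the question and that purchased_date can be parsed by pd.to_datetime. It works on a copy, so the input dataframe is left unchanged.

```python
import pandas as pd


def split_by_day(df):
    """Expand each row into one row per day of its duration (hypothetical helper)."""
    out = df.copy()
    # Assumption: purchased_date may arrive as strings, so convert it first.
    out["purchased_date"] = pd.to_datetime(out["purchased_date"])
    # Spread the price evenly over the days.
    out["price"] = out["price"] / out["duration"]
    # Repeat each row `duration` times; the copies share the original index label.
    out = out.loc[out.index.repeat(out["duration"])].assign(duration=1)
    # Count the copies of each original row (0, 1, 2, ...) and add that many days.
    day_offset = out.groupby(level=0).cumcount()
    out["purchased_date"] += pd.to_timedelta(day_offset, unit="D")
    return out.reset_index(drop=True)


# Sample data taken from the question.
df = pd.DataFrame({
    "id": [1, 2],
    "purchased_date": ["2020-01-01", "2020-01-01"],
    "price": [16.50, 24.00],
    "duration": [2, 4],
})
df_new = split_by_day(df)
print(df_new)
```

Grouping with level=0 relies on the repeated rows still carrying their original index label, which is why reset_index is only called at the very end.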