Issue
Have partly solved the data-task using for loops and lists in python but reviewer was not satisfied. Could you help with taking advantage of pandas methods. Can not find much in the documentation and forums. Basically, if for each store period_id[q]-period_id[q-1]!=1 then the number of periods increases by 1. My take on the task:
for i in range(len(period_id)-1):
if (period_id[i+1] - period_id[i])!=1:
promo_count=0
total_period_count+=1
Number_of_promo_periods.append(total_period_count)
Duration.append(promo_count)
else:
promo_count+=1
Number_of_promo_periods.append('')
Duration.append(promo_count)
data['Number_of_promo_periods']=Number_of_promo_periods
data['Duration']=Duration
store_id period_id sales_volume Number_of_promo_periods Duration period_total_sale
724224 4168621 191 2.25 1 0
216520 4168621 192 6.35 1
175105 4168621 193 7.05 2
699084 4168621 194 2.50 3
780957 4168621 195 1.70 4
403579 4168621 196 4.75 5
385893 4168621 197 4.90 6
415417 4168621 198 4.70 7
106273 4168621 199 10.00 8
493167 4168621 200 4.10 9
683860 4168621 201 2.60 10
73777 4168621 202 14.80 11
599425 4168621 203 3.30 12
137835 4168621 204 8.15 13
296985 4168621 205 5.50 14
548164 4168621 206 3.70 15 86.35
296984 4168621 208 5.50 2 0
453535 4168621 209 4.40 1 9.9
336839 4168621 211 5.25 3 0
630290 4168621 212 3.05 1
385922 4168621 213 4.90 2
679706 4168621 214 2.65 3 15.85
Given
store_id period_id sales_volume
724224 4168621 191 2.25
216520 4168621 192 6.35
175105 4168621 193 7.05
699084 4168621 194 2.50
780957 4168621 195 1.70
403579 4168621 196 4.75
385893 4168621 197 4.90
415417 4168621 198 4.70
106273 4168621 199 10.00
493167 4168621 200 4.10
683860 4168621 201 2.60
73777 4168621 202 14.80
599425 4168621 203 3.30
137835 4168621 204 8.15
296985 4168621 205 5.50
548164 4168621 206 3.70
296984 4168621 208 5.50
453535 4168621 209 4.40
336839 4168621 211 5.25
630290 4168621 212 3.05
385922 4168621 213 4.90
679706 4168621 214 2.65
Solution
With pandas you can find the periods with diff
and use groupby
to aggregate by period
df['nr_promo'] = df.period_id.diff().ne(1).cumsum()
gr = df.groupby('nr_promo')
df['duration'] = gr.cumcount()
df['sale_total'] = gr['sales_volume'].transform('sum')
df
Out:
store_id period_id sales_volume nr_promo duration sale_total
724224 4168621 191 2.25 1 0 86.35
216520 4168621 192 6.35 1 1 86.35
175105 4168621 193 7.05 1 2 86.35
699084 4168621 194 2.50 1 3 86.35
780957 4168621 195 1.70 1 4 86.35
403579 4168621 196 4.75 1 5 86.35
385893 4168621 197 4.90 1 6 86.35
415417 4168621 198 4.70 1 7 86.35
106273 4168621 199 10.00 1 8 86.35
493167 4168621 200 4.10 1 9 86.35
683860 4168621 201 2.60 1 10 86.35
73777 4168621 202 14.80 1 11 86.35
599425 4168621 203 3.30 1 12 86.35
137835 4168621 204 8.15 1 13 86.35
296985 4168621 205 5.50 1 14 86.35
548164 4168621 206 3.70 1 15 86.35
296984 4168621 208 5.50 2 0 9.90
453535 4168621 209 4.40 2 1 9.90
336839 4168621 211 5.25 3 0 15.85
630290 4168621 212 3.05 3 1 15.85
385922 4168621 213 4.90 3 2 15.85
679706 4168621 214 2.65 3 3 15.85
Answered By - Michael Szczesny
0 comments:
Post a Comment
Note: Only a member of this blog may post a comment.