Issue
I have this example df:
df3 = pd.DataFrame({'Customer':['Sara','John','Didi','Sara','Didi' ,'Didi'],
'Date': ['15-12-2021', '1-1-2022' , '1-3-2022','15-3-2022', '1-1-2022' , '1-4-2022'],
'Month': ['December-2021', 'January-2022', 'March-2022','March-2022', 'January-2022', 'April-2022'],
'Product': ['grocery','electronics','personal-care','grocery','electronics','personal-care'],
'status': ['purchased', 'refunded', 'refunded','refunded', 'purchased', 'refunded']
})
df3
gives:
Customer Date Month Product status
0 Sara 15-12-2021 December-2021 grocery purchased
1 John 1-1-2022 January-2022 electronics refunded
2 Didi 1-3-2022 March-2022 personal-care refunded
3 Sara 15-3-2022 March-2022 grocery refunded
4 Didi 1-1-2022 January-2022 electronics purchased
5 Didi 1-4-2022 April-2022 personal-care refunded
I am trying to group by customer, product & month and get the first status and then I want the groupby sorted by Month column:
df3.sort_values('Month').groupby(['Customer','Product','Month','Date']).agg({'status':'first'}).reset_index()
I got:
Customer Product Month Date status
0 Didi electronics January-2022 1-1-2022 purchased
1 Didi personal-care April-2022 1-4-2022 refunded
2 Didi personal-care March-2022 1-3-2022 refunded
3 John electronics January-2022 1-1-2022 refunded
4 Sara grocery December-2021 15-12-2021 purchased
5 Sara grocery March-2022 15-3-2022 refunded
I expected for index 1 & 2
to be reversed in order, March before April so what I tried to do is:
months = {'December-2021':0,'January-2022':1,'February-2022':2,'March-2022':3,'April-2022':4,'May-2022':5,'June-2022':6,'July-2022':7,'August-2022':8,'September-2022':9,'October-2022':10,'November-2022':11}
then map this through sort values:
df3.sort_values(by=['Month'], key=lambda x: x.map(months)).groupby(['Customer','Product','Month','Date']).agg({'status':'first'}).reset_index()
But I got the same exact results without the correct order
Solution
The problem is that it is sorting a string and April
is before March
. You have to convert the string into a date first and then sort your entries. For example like this:
# Convert column Month to datetime
df3['Month'] = pd.to_datetime(df3['Month'], format='%B-%Y')
# Do your groupby
df_group = df3.sort_values('Month').groupby(['Customer','Product','Month','Date'], sort=False).first().reset_index()
# Convert column Month back to string
df_group['Month'] = df_group['Month'].dt.strftime('%B-%Y')
df_group
Output:
Customer Product Month Date status
0 Sara grocery December-2021 15-12-2021 purchased
1 Didi electronics January-2022 1-1-2022 purchased
2 John electronics January-2022 1-1-2022 refunded
3 Didi personal-care March-2022 1-3-2022 refunded
4 Sara grocery March-2022 15-3-2022 refunded
5 Didi personal-care April-2022 1-4-2022 refunded
Answered By - JANO
0 comments:
Post a Comment
Note: Only a member of this blog may post a comment.