Issue
I've the following sample dataset:
I need to find out the most active time of the day when sales are happening (by the hour or hour segments, any would do fine) from the column of 'Transaction Date'. I also need to find out the most active day (as in Monday, Tuesday, etc.), from the same column.
Now, my initial approach was to first rejig the Order_Details dataframe I'd created for the dataset to be more accomodating as:
Order_Details['Time'] = pd.to_datetime(Order_Details['Transaction Date'])
Order_Details['Hour'] = (Order_Details['Time']).dt.hour
Order_Details['Hour'].unique()
And then I decided to obtain a Series as:
timemost = Order_Details['Hour'].value_counts()
But it is here that I hit an obstacle. Whilst this timemost series does return the most frequent hour of the day by the count, it does so in a sorted order, and I need to have two separate lists -- one for the frequency of occurence and the other for the hour itself, where frequency is sorted by hour and not the other way around. And finally, I intend to plot it as line/bar chart, whatever may be applicable.
What should I do to improve this and get the desired result?
Solution
First, it is more encouraging to help if the data is easier to be reproduced.
Order_Details dataframe:
Name Email Product Transaction Date
PERSON_1 [email protected] PRODUCT_75 2021-03-01 00:47:00
PERSON_2 [email protected] PRODUCT_75 2021-03-01 02:04:00
PERSON_3 [email protected] PRODUCT_63 2021-03-01 09:10:00
PERSON_4 [email protected] PRODUCT_63 2021-03-01 09:49:00
PERSON_5 [email protected] PRODUCT_34,PRODUCT_86,PRODUCT_57,PRODUCT_89 2021-03-01 10:56:00
PERSON_6 [email protected] PRODUCT_34,PRODUCT_66,PRODUCT_58,PRODUCT_83 2021-03-01 11:06:00
PERSON_7 [email protected] PRODUCT_75 2021-03-01 12:02:00
PERSON_8 [email protected] PRODUCT_63,PRODUCT_90,PRODUCT_27,PRODUCT_5S 2021-03-01 12:31:00
PERSON_9 [email protected] PRODUCT_26 2021-03-01 12:51:00
PERSON_10 [email protected] PRODUCT_63 2021-03-01 12:52:00
PERSON_11 [email protected] PRODUCT_63 2021-03-01 13:15:00
PERSON_12 [email protected] PRODUCT_5,PRODUCT_34 2021-03-01 13:37:00
PERSON_13 [email protected] PRODUCT_84,PRODUCT_27 2021-03-01 14:17:00
PERSON_14 [email protected] PRODUCT_27 2021-03-01 14:27:00
PERSON_15 [email protected] PRODUCT_25 2021-03-01 14:37:00
PERSON_16 [email protected] PRODUCT_75 2021-03-01 14:56:00
PERSON_17 [email protected] PRODUCT_84 2021-03-01 15:03:00
PERSON_18 [email protected] PRODUCT_34 2021-03-01 15:08:00
PERSON 19 PERSON [email protected] PRODUCT 85 2021-03-01 15:48:00
You can create the 'Hour' column straightforward:
Order_Details['Time'] = pd.to_datetime(Order_Details['Transaction Date']).dt.hour
I don't think that calling the 'unique' method will change anything
sorting by index:
timemost = Order_Details['Time'].value_counts().sort_index()
0 1
2 1
9 2
10 1
11 1
12 4
13 2
14 4
15 3
the line plot:
line = timemost.plot.line()
and finally, to show you have to import the 'maplotlib.pyplot' module:
import matplotlib.pyplot as plt
plt.show()
Answered By - Ze'ev Ben-Tsvi
0 comments:
Post a Comment
Note: Only a member of this blog may post a comment.