Issue
I want to create a dual-axis chart showing monthly user trips, with the secondary y-axis showing the average duration per month for each user type.
ds = {"duration":['1.02', '1.03', '1.08', '1.07', '2.02', '1.01'],
"start_time": ['2019-01-01 00:07:10.576', '2019-01-31 23:48:50.0920', '2019-01-01 00:11:03.441', '2019-01-31 20:58:33.8860', '2019-01-01 00:11:03.441', '2019-01-01 00:14:48.398'],
"user": [0, 1, 1, 0, 1, 0]
}
df = pd.DataFrame(ds)
Above is a sample of my dataframe. start_time is in datetime format, duration is a float, and user is a dummy variable where 0 represents "customer" and 1 represents "subscriber".
Solution
It makes more sense to have more than one month in your dataset, so I've changed the sample data to cover January and February:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
ds = {"duration": [1.02, 1.03, 1.08, 1.07, 2.02, 1.01],
"start_time": ['2019-01-01 00:07:10.576', '2019-01-31 23:48:50.0920', '2019-01-01 00:11:03.441',
'2019-02-27 20:58:33.8860', '2019-02-01 00:11:03.441', '2019-02-01 00:14:48.398'],
"user": [0, 1, 1, 0, 1, 0]
}
df = pd.DataFrame(ds)
Get the month of each start_time and replace 1 and 0 with subscriber and customer (just for display):
df["month"] = pd.to_datetime(df.start_time).dt.month
df.user = np.where(df.user, "subscriber", "customer")
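One caveat with the step above: `dt.month` returns only the month number, so if the real data spans more than one year, January 2019 and January 2020 would end up in the same group. A minimal sketch of a year-aware alternative, assuming that is what you want, uses `dt.to_period("M")` (the timestamps below are made up for illustration):

```python
import pandas as pd

# Two trips in January of different years (made-up timestamps for illustration)
times = pd.Series(pd.to_datetime(["2019-01-15 08:00:00", "2020-01-15 08:00:00"]))

month_number = times.dt.month          # month number only: the year is lost
year_month = times.dt.to_period("M")   # monthly period: the year is kept

print(month_number.tolist())           # [1, 1]
print(year_month.astype(str).tolist()) # ['2019-01', '2020-01']
```

For the single-year sample used here, the plain month number is enough.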
Group by user and month, then calculate the number of trips and the mean duration for each group:
data = df.groupby(["user", "month"]).agg(trips=("user", "count"), mean_duration=("duration", "mean"))
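To check the aggregation before plotting, it can help to look at the result directly. The sketch below repeats the steps on the same sample data and reads one value back; the result is indexed by a (user, month) MultiIndex, so a single cell can be looked up with a tuple:

```python
import numpy as np
import pandas as pd

# Same sample data as in the answer
df = pd.DataFrame({
    "duration": [1.02, 1.03, 1.08, 1.07, 2.02, 1.01],
    "start_time": ['2019-01-01 00:07:10.576', '2019-01-31 23:48:50.0920',
                   '2019-01-01 00:11:03.441', '2019-02-27 20:58:33.8860',
                   '2019-02-01 00:11:03.441', '2019-02-01 00:14:48.398'],
    "user": [0, 1, 1, 0, 1, 0],
})
df["month"] = pd.to_datetime(df["start_time"]).dt.month
df["user"] = np.where(df["user"], "subscriber", "customer")

# Named aggregation: each keyword becomes an output column
data = df.groupby(["user", "month"]).agg(
    trips=("user", "count"),
    mean_duration=("duration", "mean"),
)

print(data)  # one row per (user, month) pair
print(data.loc[("customer", 2), "trips"])  # 2 customer trips in February
```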
Build the plot, with the trip counts on the left axis and the mean duration on a twin axis on the right:
fig, ax1 = plt.subplots()
ax2 = ax1.twinx()
data.trips.plot(kind='bar', color='red', ax=ax1, width=0.4, position=1)
data.mean_duration.plot(kind='bar', color='blue', ax=ax2, width=0.4, position=0)
plt.setp(ax1.get_xticklabels(), ha="right", rotation=45) # rotate labels (looks nicer)
plt.tight_layout()
plt.show() # show plot (might be unnecessary for you)
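With two y-axes it is easy to lose track of which bars belong to which scale. As a possible refinement (not part of the original plot), the sketch below labels each axis in the color of its bars. The aggregated values are typed in by hand to match the sample data, and the `Agg` backend is only an assumption so the sketch runs without a display:

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend; remove this line to use plt.show()
import matplotlib.pyplot as plt
import pandas as pd

# Aggregated values matching the sample data (index: user, month)
index = pd.MultiIndex.from_tuples(
    [("customer", 1), ("customer", 2), ("subscriber", 1), ("subscriber", 2)],
    names=["user", "month"],
)
data = pd.DataFrame(
    {"trips": [1, 2, 2, 1], "mean_duration": [1.02, 1.04, 1.055, 2.02]},
    index=index,
)

fig, ax1 = plt.subplots()
ax2 = ax1.twinx()  # second y-axis sharing the same x-axis
data["trips"].plot(kind="bar", color="red", ax=ax1, width=0.4, position=1)
data["mean_duration"].plot(kind="bar", color="blue", ax=ax2, width=0.4, position=0)

# Label each y-axis in the color of the bars it measures
ax1.set_ylabel("trips", color="red")
ax2.set_ylabel("mean duration", color="blue")
ax1.set_xlabel("user, month")

plt.setp(ax1.get_xticklabels(), ha="right", rotation=45)
fig.tight_layout()
# call plt.show() or fig.savefig(...) here to view the result
```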
Let me know if you have questions or if I misunderstood anything.
Answered By - bitflip