Issue
I want to plot a histogram of year vs. number of female participants in the Olympics, but I don't know how to pass two variables and plot one against the other. I tried this:
import pandas as pd
import matplotlib.pyplot as plt
import matplotlib as mpl
import plotly
import plotly.express as px
mpl.rcParams['agg.path.chunksize'] = 10000
df = pd.read_csv("athlete_events.csv")
fig = plt.figure()
data = df[(df['Sex'] == 'M')].groupby('Year')['Sex'].count().reset_index()
data2 = df[(df['Sex'] == 'F')].groupby('Year')['Sex'].count().reset_index()
plt.hist(data['Year'], bins = 10)
plt.ylabel("Athlete per year",fontsize=14)
plt.xlabel("Year", fontsize=14)
plt.show()
and then I tried:
plt.hist(data2['Year'],data2['Sex'], bins = 10)
But it didn't work.
Solution
Since you've already computed counts, you should use bar() instead of hist(), which takes raw values and does the counting itself. A standard way to plot grouped counts is with groupby()-unstack():
df.groupby('Year')['Sex'].value_counts().unstack().plot.bar(ylabel='Athlete per year')
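To see what that one-liner builds before it is plotted, here is a minimal sketch on a made-up frame. The `toy` data and the `fill_value=0` argument are assumptions for illustration; the real athlete_events.csv has many more rows and columns.

```python
import pandas as pd

# Hypothetical stand-in for athlete_events.csv: one row per athlete entry.
toy = pd.DataFrame({
    "Year": [1996, 1996, 1996, 2000, 2000, 2004, 2004, 2004],
    "Sex":  ["M",  "F",  "M",  "F",  "F",  "M",  "F",  "M"],
})

# Count each Sex within each Year, then pivot Sex into columns.
# fill_value=0 puts 0 (instead of NaN) where a year has no rows for a sex.
counts = toy.groupby("Year")["Sex"].value_counts().unstack(fill_value=0)

# One row per Year, one column per Sex.
print(counts)
```

Calling `.plot.bar()` on `counts` draws one group of bars per year, as in the line above. If only the female counts are wanted, as in the question, selecting the column first with `counts["F"].plot.bar()` should give a single series of bars.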
If you want to bin the years, cut() the years and then groupby() the bins:
df['Bin'] = pd.cut(df.Year, bins=10)
df.groupby('Bin')['Sex'].value_counts().unstack().plot.bar(xlabel='Year', ylabel='Athlete per year')
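The binned version can be checked the same way on a small made-up frame. This is only a sketch: the `toy` data, the choice of 3 bins, and the explicit `observed=False` and `fill_value=0` arguments are assumptions for illustration, not part of the answer above.

```python
import pandas as pd

# Hypothetical stand-in for athlete_events.csv.
toy = pd.DataFrame({
    "Year": [1896, 1900, 1920, 1948, 1960, 1980, 2000, 2016],
    "Sex":  ["M",  "F",  "M",  "F",  "M",  "F",  "F",  "M"],
})

# Split the range of years into 3 equal-width intervals.
toy["Bin"] = pd.cut(toy["Year"], bins=3)

# Count each Sex within each interval, then pivot Sex into columns.
# observed=False keeps the categorical grouping behaviour explicit.
binned = (
    toy.groupby("Bin", observed=False)["Sex"]
       .value_counts()
       .unstack(fill_value=0)
)

# One row per year interval, one column per Sex.
print(binned)
```

Each row of `binned` is one interval of years, so `binned.plot.bar(xlabel='Year', ylabel='Athlete per year')` draws one group of bars per interval instead of one per Olympic year.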
Answered By - tdy