Issue
Specifically, I'm working with the Kaggle Titanic dataset. I've plotted a stacked histogram showing the ages of the passengers who survived and who died aboard the Titanic. Code below.
figure = plt.figure(figsize=(15,8))
plt.hist([data[data['Survived']==1]['Age'], data[data['Survived']==0]['Age']], stacked=True, bins=30, label=['Survived','Dead'])
plt.xlabel('Age')
plt.ylabel('Number of passengers')
plt.legend()
I would like to alter the chart to show a single bar per bin whose height is the percentage of that age group that survived. E.g. if a bin covered ages 10-20 and 60% of the people aboard the Titanic in that age group survived, then the bar's height would line up with 60% on the y-axis.
Edit: I may have given a poor explanation of what I'm looking for. Rather than alter the y-axis values, I'm looking to change the actual shape of the bars based on the percentage that survived.
The first bin on the graph shows roughly 65% survived in that age group. I would like this bin to line up against the y-axis at 65%. The following bins look to be 90%, 50%, 10% respectively, and so on.
The graph would end up actually looking something like this:
Solution
Perhaps the following will help ...
Split the dataframe based on 'Survived'
df_survived=df[df['Survived']==1]
df_not_survive=df[df['Survived']==0]
Create Bins
age_bins=np.linspace(0,80,21)
Use np.histogram to generate histogram data
survived_hist=np.histogram(df_survived['Age'],bins=age_bins,range=(0,80))
not_survive_hist=np.histogram(df_not_survive['Age'],bins=age_bins,range=(0,80))
Calculate survival rate in each bin
surv_rates=survived_hist[0]/(survived_hist[0]+not_survive_hist[0])
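One caveat with the division above: if an age bin contains no passengers at all, the denominator is zero, so NumPy emits a warning and returns nan for that bin. Below is a minimal sketch of one way to make that explicit. The helper name safe_rates is hypothetical, and the counts are made up for illustration (they are not the real Titanic numbers).

```python
import numpy as np

def safe_rates(survived_counts, died_counts):
    """Per-bin survival rate; bins with no passengers become NaN without a warning."""
    survived_counts = np.asarray(survived_counts, dtype=float)
    total = survived_counts + np.asarray(died_counts, dtype=float)
    rates = np.full(total.shape, np.nan)
    # Divide only where the bin is non-empty; other entries keep their NaN.
    np.divide(survived_counts, total, out=rates, where=total > 0)
    return rates

# Made-up counts for four bins; the last bin is empty.
rates = safe_rates([6, 9, 5, 0], [4, 1, 5, 0])
print(rates)
```

Keeping NaN for empty bins, rather than substituting 0, leaves an empty age group distinguishable from one where nobody survived.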
Plot
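# align='edge' puts each bar's left side on its bin's left edge (the default would center the bar there)
plt.bar(age_bins[:-1],surv_rates,width=age_bins[1]-age_bins[0],align='edge')
plt.xlabel('Age')
plt.ylabel('Survival Rate')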
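The steps above can also be folded into a single helper. The sketch below is a variant under stated assumptions, not the answer's exact code: it drops missing ages first (the Kaggle Age column contains missing values), counts survivors through the weights argument of np.histogram instead of splitting the dataframe, and returns percentages, as the question asks. The name survival_pct_by_age and the tiny sample arrays are made up for illustration.

```python
import numpy as np

def survival_pct_by_age(ages, survived, bin_edges):
    """Percentage of passengers in each age bin who survived (NaN for empty bins)."""
    ages = np.asarray(ages, dtype=float)
    survived = np.asarray(survived, dtype=float)
    # Ignore passengers whose age is unknown.
    known = ~np.isnan(ages)
    ages, survived = ages[known], survived[known]
    # Passengers per bin.
    total, _ = np.histogram(ages, bins=bin_edges)
    # Weighting by the 0/1 Survived flag counts only survivors per bin.
    n_survived, _ = np.histogram(ages, bins=bin_edges, weights=survived)
    pct = np.full(total.shape, np.nan)
    np.divide(100.0 * n_survived, total, out=pct, where=total > 0)
    return pct

# Illustrative data only: four bins of width 20, one passenger with unknown age.
edges = np.linspace(0, 80, 5)
ages = [5, 15, 25, 35, 38, np.nan, 70]
survived = [1, 0, 1, 1, 0, 1, 0]
pct = survival_pct_by_age(ages, survived, edges)
print(pct)
```

The result could then be drawn with plt.bar(edges[:-1], pct, width=np.diff(edges), align='edge'), so that each bar spans exactly its own age bin.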
Answered By - bahaugen