Issue
Hello, I have a new dataset produced by a groupby. This is the result:
job            y
admin.         0    5227
               1    1045
blue-collar    0    5208
               1     517
entrepreneur   0     755
               1      96
housemaid      0     586
               1      82
management     0    1507
               1     255
retired        0     761
               1     331
self-employed  0     759
               1     111
services       0    2165
               1     260
student        0     364
               1     216
technician     0    3434
               1     589
unemployed     0     479
               1     109
unknown        0     166
               1      26
In this case, I want to draw a bar plot sorted by the total for each job, so I can see which jobs are the most common. Here is the code I used, but it has a mistake:
import matplotlib.pyplot as plt
plt.figure(figsize=(6,6))
pekerjaan = df_new.groupby(['job','y'])['y'].size().unstack()
pekerjaan.sort_values(by='y',ascending=True).plot(kind='barh',stacked=True)
plt.title('Job')
plt.ylabel('Kind of job')
plt.xlabel('Total')
plt.show()
thank you in advance
Solution
Sample Data and Imports:
import numpy as np
import pandas as pd
from matplotlib import pyplot as plt

np.random.seed(25)
n = 100
df_new = pd.DataFrame({
    'job': np.random.choice(['admin', 'blue-collar', 'entrepreneur'],
                            p=[.4, .4, .2], size=n),
    'y': np.random.choice([0, 1], size=n)
})
Then sum across each row to get the row total, then sort by the row totals:
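plot_df = df_new.groupby(['job', 'y'])['y'].size().unstack()
# Row total per job, used only for sorting
plot_df['All'] = plot_df.sum(axis=1)
plot_df = plot_df.sort_values('All')
# figsize is passed to plot(); DataFrame.plot opens its own figure,
# so a separate plt.figure() call would only leave an empty one behind
ax = plot_df.plot(kind='barh', y=[0, 1], stacked=True,
                  title='Job', rot=0, figsize=(6, 6))
# Label the axes explicitly so the result does not depend on how a
# given pandas version maps xlabel/ylabel for horizontal bars
ax.set_xlabel('Total')
ax.set_ylabel('Kind of Job')
plt.tight_layout()
plt.show()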
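A likely reason the original attempt fails: after unstack() the columns are the values of y (0 and 1), and 'y' is only the name of the column index, so sort_values(by='y') finds no column with that label. The sketch below shows this on a small made-up frame; demo, counts and sort_failed are illustrative names, not part of the original post.

```python
import pandas as pd

# Tiny hypothetical frame, only to show the shape of the unstacked result
demo = pd.DataFrame({
    'job': ['admin', 'admin', 'admin', 'student', 'student'],
    'y':   [0, 0, 1, 0, 1],
})

counts = demo.groupby(['job', 'y'])['y'].size().unstack()

# The column labels are the values of y; 'y' is just the columns' name
print(list(counts.columns))
print(counts.columns.name)

# Sorting by 'y' therefore raises KeyError: there is no column called 'y'
try:
    counts.sort_values(by='y')
    sort_failed = False
except KeyError:
    sort_failed = True

print(sort_failed)
```

Sorting by a real column label (0, 1, or a computed total such as 'All') avoids the error.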
Summary Counts:
plot_df = df_new.groupby(['job', 'y'])['y'].size().unstack()
y              0   1
job
admin         19  17
blue-collar   24  25
entrepreneur  10   5
plot_df with the All column:
plot_df['All'] = plot_df.sum(axis=1)
y              0   1  All
job
admin         19  17   36
blue-collar   24  25   49
entrepreneur  10   5   15
After sort_values:
plot_df = plot_df.sort_values('All')
y              0   1  All
job
entrepreneur  10   5   15
admin         19  17   36
blue-collar   24  25   49
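If the extra All column is not wanted in the frame, the same ordering can be obtained by sorting the row totals separately and reindexing. This is a minimal sketch of that variation, not the approach used above; counts, order and sorted_counts are illustrative names, and the sample data is rebuilt so the block runs on its own.

```python
import numpy as np
import pandas as pd

# Same sample data as above, rebuilt so this block is self-contained
np.random.seed(25)
n = 100
df_new = pd.DataFrame({
    'job': np.random.choice(['admin', 'blue-collar', 'entrepreneur'],
                            p=[.4, .4, .2], size=n),
    'y': np.random.choice([0, 1], size=n)
})

counts = df_new.groupby(['job', 'y'])['y'].size().unstack()

# Sort the per-job totals, then reorder the rows to match that order
order = counts.sum(axis=1).sort_values().index
sorted_counts = counts.loc[order]

print(sorted_counts)
```

Because sorted_counts holds only the 0 and 1 columns, it can be plotted without the y=[0, 1] selection.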
An alternative with crosstab + margins:
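plot_df = (
    pd.crosstab(df_new['job'], df_new['y'], margins=True)
    .iloc[:-1]  # drop the 'All' margin row, keep the 'All' column
    .sort_values('All')
)
# figsize is passed to plot(); DataFrame.plot opens its own figure,
# so a separate plt.figure() call would only leave an empty one behind
ax = plot_df.plot(kind='barh', y=[0, 1], stacked=True,
                  title='Job', rot=0, figsize=(6, 6))
# Label the axes explicitly so the result does not depend on how a
# given pandas version maps xlabel/ylabel for horizontal bars
ax.set_xlabel('Total')
ax.set_ylabel('Kind of Job')
plt.tight_layout()
plt.show()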
Both produce: [figure: stacked horizontal bar chart titled 'Job', bars sorted by total count per job]
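To check that the two approaches agree without relying on the figure, the underlying tables can be compared directly. This is a rough sketch under the same sample data; via_groupby and via_crosstab are illustrative names.

```python
import numpy as np
import pandas as pd

# Same sample data as above, rebuilt so this block is self-contained
np.random.seed(25)
n = 100
df_new = pd.DataFrame({
    'job': np.random.choice(['admin', 'blue-collar', 'entrepreneur'],
                            p=[.4, .4, .2], size=n),
    'y': np.random.choice([0, 1], size=n)
})

# Approach 1: groupby + unstack, then add and sort by the row total
via_groupby = df_new.groupby(['job', 'y'])['y'].size().unstack()
via_groupby['All'] = via_groupby.sum(axis=1)
via_groupby = via_groupby.sort_values('All')

# Approach 2: crosstab with margins, dropping the margin row
via_crosstab = (
    pd.crosstab(df_new['job'], df_new['y'], margins=True)
    .iloc[:-1]
    .sort_values('All')
)

# Same row order and same counts in both tables
same_order = list(via_groupby.index) == list(via_crosstab.index)
same_values = bool((via_groupby.to_numpy() == via_crosstab.to_numpy()).all())
print(same_order, same_values)
```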
Answered By - Henry Ecker