Issue
I am working on an employee attrition problem. I have columns for department and if the employee quit or not.
Since some departments have more employees working in them, I would assume that they account for the majority of attrition.
So I would like to graph the attrition rate per department. For example, out of the 1000 people that work in IT 200 quit (20% attrition rate).
What I have been doing is creating pivot tables and cross table to get the relevant information then calculating manually attrition rates. But I feel like this can be done more efficiently.
What I have currently is:
- Sample DataFrame
df= pd.DataFrame({'department': ['sales','sales','HR','sales','sales','HR','sales','R&D','HR','sales'],
'quit': ['yes','no','no','yes','no','yes','yes','no','yes','no'],
'employee_count': [1,1,1,1,1,1,1,1,1,1]})
- Pivot Table to find out how many employees are in each department
pivot = pd.pivot_table(df, values='employee_count',
columns=['department'], aggfunc=np.sum)
which gives output of
. | HR | R&D | Sales |
---|---|---|---|
employee_count | 3 | 1 | 6 |
- Crosstab to find the number of employees that quit per department
pivot2=pd.crosstab(values=df['employee_count'],
index=df['department'],
columns=df['quit'],
aggfunc=np.sum)
Quit | no | yes |
---|---|---|
HR | 1 | 2 |
R&D | 1 | nan |
Sales | 3 | 3 |
- Calculate Manually and Plot
import matplotlib.pyplot as plt
plt.rcParams['axes.facecolor'] = 'white'
fig = plt.figure()
fig.patch.set_facecolor('white')
names = ['HR','sales']
values = [66.66666667,50]
plt.figure()
plt.bar(names, values)
any help would be appreciated, thanks for your time
Solution
Pandas offers to group by certain columns. In your case we can aggregate the results based on the department and whether someone quit. Then use a pivot to change the format of the table and finally calculate the total number of employees and quit rate:
# obtain the relevant results by grouping
agg_df = df.groupby(by=['department', 'quit']).agg({'employee_count': 'sum'})
agg_df.reset_index(drop=False, inplace=True)
# pivot the table for further usage
pivot_df = agg_df.pivot(columns=['quit'], index=['department'], values='employee_count')
pivot_df.fillna(0, inplace=True)
# calculate final statistics
pivot_df['total'] = pivot_df['yes'] + pivot_df['no']
pivot_df['quit_rate'] = pivot_df['yes'] / pivot_df['total']
Answered By - user7375116
0 comments:
Post a Comment
Note: Only a member of this blog may post a comment.