Issue
I have a dataset with employee information. In this dataset the relevant fields are the EmployeeName column, ID column, and SupervisorName column. I want to create a new column called SupervisorID that gives us the ID of each employee's supervisor.
Input dataframe:
EmployeeName ID SupervisorName
Jim 123 Brittany
Brittany 345 Todd
Todd 456 Grace
expected output:
EmployeeName ID SupervisorName SupID
Jim 123 Brittany 345
Brittany 345 Todd 456
I am stunted on how to write this efficiently, I have tried using np.where() syntax and .iloc[] but I am new to python and would appreciate any help on how to solve this.
Solution
you can use merge, and merge the DataFrame
with itself
import pandas as pd
data =[{'EmployeeName': 'Jim', 'ID': 123,'SupervisorName': 'Brittany'},
{'EmployeeName': 'Brittany', 'ID': 345,'SupervisorName': 'Todd'},
{'EmployeeName': 'Todd', 'ID': 456,'SupervisorName': 'Grace'}]
df = pd.DataFrame(data)
df_sup = df[['EmployeeName','ID']]
df_sup= df_sup.rename(columns={'EmployeeName': 'SupervisorName', 'ID': 'SupID'})
df = df.merge(right=df_sup, on='SupervisorName', how='inner')
df
output:
EmployeeName ID SupervisorName SupID
0 Jim 123 Brittany 345
1 Brittany 345 Todd 456
Answered By - Dean Taler
0 comments:
Post a Comment
Note: Only a member of this blog may post a comment.