Issue
I have a dataset that looks like this:
hiring_mgr_id candidate_id candidate_name emp_id emp_name
1000 1234 Joe 4321 Mike
1000 1234 Joe 9876 Sam
1000 1234 Joe 5674 Rob
What I want is to group by hiring_mgr_id and, if the candidate_id is not in the emp_id column, create a new row whose emp_id and emp_name are populated with the candidate's id and name.
What I want:
hiring_mgr_id candidate_id candidate_name emp_id emp_name
1000 1234 Joe 4321 Mike
1000 1234 Joe 9876 Sam
1000 1234 Joe 5674 Rob
1000 1234 Joe 1234 Joe
So far this is what I have:
new_row = []
for index, row in df.iterrows():
    candidate_id = row['candidate_id']
    emp_id = row['emp_id']
    if candidate_id not in df['emp_id'].values:
        new_row.append({'hiring_mgr_id': row['hiring_mgr_id'],
                        'candidate_name': row['candidate_name'],
                        'emp_id': row['emp_id'],
                        'emp_name': row['emp_name']})
df = df.append(new_row, ignore_index=True)
When I do this I get the error: 'DataFrame' object has no attribute 'append'
I thought you could use append with DataFrames. Any suggestions on how to fix this? Thank you in advance.
Solution
DataFrame.append was removed in pandas 2.0, which is why the error appears; pd.concat is the replacement. A possible solution is to build a second dataframe from the first 3 columns and copy candidate_id and candidate_name into emp_id and emp_name. The two dataframes are then concatenated and the duplicates dropped:
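import pandas as pd

# Copy the candidate columns into emp_id / emp_name, stack those rows
# under the original ones, then drop the repeated candidate rows.
result = (
    pd.concat([
        df,
        df.iloc[:, :3].assign(emp_id=df['candidate_id'],
                              emp_name=df['candidate_name'])])
    .drop_duplicates()
)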
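To try the approach end to end, here is a minimal sketch that rebuilds the sample data from the question and applies the same concat-and-deduplicate idea. The helper name add_candidate_rows is hypothetical, and the sketch assumes the column names shown in the question; it selects the candidate columns by name instead of by position, and passes ignore_index=True so the result gets a clean 0..n-1 index.

```python
import pandas as pd


def add_candidate_rows(df: pd.DataFrame) -> pd.DataFrame:
    """Append one row per candidate, with emp_id/emp_name copied from the candidate.

    Hypothetical helper; assumes the column names used in the question.
    """
    candidate_cols = ['hiring_mgr_id', 'candidate_id', 'candidate_name']
    # One extra row per original row, with the employee columns
    # overwritten by the candidate's own id and name.
    extra = df[candidate_cols].assign(
        emp_id=df['candidate_id'],
        emp_name=df['candidate_name'],
    )
    # Stack the extra rows under the originals; identical rows
    # (including a candidate who is already listed as an employee)
    # collapse to a single row.
    return pd.concat([df, extra]).drop_duplicates(ignore_index=True)


# Sample data from the question.
df = pd.DataFrame({
    'hiring_mgr_id': [1000, 1000, 1000],
    'candidate_id': [1234, 1234, 1234],
    'candidate_name': ['Joe', 'Joe', 'Joe'],
    'emp_id': [4321, 9876, 5674],
    'emp_name': ['Mike', 'Sam', 'Rob'],
})

result = add_candidate_rows(df)
print(result)
```

Because the duplicates are dropped after concatenation, a candidate who already appears in emp_id for the same manager does not get a second row, which matches the "only if not already present" condition in the question.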
Answered By - PaulS