Issue
Issue:
I have two tables with sample inputs below:
Baseline_Cars:
Name | Fname | FW_Base | Seq | FP_Base | FW_Prop | FP_Prop | isChanged | changeType |
---|---|---|---|---|---|---|---|---|
Audi | A1 | 2 | 0 | 2 | 0 | 0 | 0 | "" |
Audi | A2 | 3 | 0 | 3 | 0 | 0 | 0 | "" |
Audi | A3 | 4 | 0 | 4 | 0 | 0 | 0 | "" |
BMW | X1 | 5 | 0 | 5 | 0 | 0 | 0 | "" |
BMW | X2 | 6 | 0 | 6 | 0 | 0 | 0 | "" |
Merc | M4 | 7 | 0 | 7 | 0 | 0 | 0 | "" |
Merc | M5 | 8 | 0 | 8 | 0 | 0 | 0 | "" |
Proposed_Cars:
1 | 2 | 3 | 4(FW_Base) | 5(FW_Prop) | 6(FP_Base) | 7(FP_Prop) | 8(isChanged) | 9(changeType) |
---|---|---|---|---|---|---|---|---|
144 | Audi | A1 | 2 | 1 | 1 | 1 | 1 | W |
144 | Audi | A2 | 3 | 3 | 3 | 1 | 1 | P |
144 | BMW | X1 | 5 | 3 | 3 | 3 | 1 | W |
144 | BMW | X2 | 6 | 4 | 4 | 4 | 1 | W |
Expected Solution:
1 | 2 | 3 | 4(FW_Base) | 5(FW_Prop) | 6(FP_Base) | 7(FP_Prop) | 8(isChanged) | 9(changeType) |
---|---|---|---|---|---|---|---|---|
144 | Audi | A1 | 2 | 1 | 1 | 1 | 1 | W |
144 | Audi | A2 | 3 | 3 | 3 | 1 | 1 | P |
144 | Audi | A3 | 4 | 0 | 4 | 0 | 0 | NULL |
144 | BMW | X1 | 5 | 3 | 3 | 3 | 1 | W |
144 | BMW | X2 | 6 | 4 | 4 | 4 | 1 | W |
Explanation of the expected solution:
Baseline cars contain names of car brands with Fnames, baseline weights(FW_Base),baseline Parameters(FP_Base).
Proposed cars contain names of cars with Proposed new weights(FW_PROP) and new Parameters(FP_PROP).
By default the FW_PROP, FP_PROP, isChanged are 0 in Baseline cars and changeType column is empty string.
isChanged=1 in Proposed cars tells that new weights or parameters have been suggested(if changeType = W, new weight change or changeType=P, new Parameter change)
I wish to merge the tables on the condition that if the Fname in Proposed cars exists in Baseline cars, we substitute the respective columns in Baseline_Cars, along with imputing NULL for changeType incase no change were made. (e.g Audi A3 row)
Snippet code to generate the above sample input data:
baseline_cars = pd.DataFrame(columns=['Name','Fname','FW_Base','Seq','FP_Base','FW_Prop','FP_Prop','isChanged','ChangeType'])
proposed_cars = pd.DataFrame(columns=['1','2','3','4(FW_Base)','5(FW_Prop)','6(FP_Base)','7(FP_Prop)','8(isChanged)','9(ChangeType)'])
baseline_cars = baseline_cars.append({'Name':'Audi','Fname':'A1','FW_Base':2,'Seq':0,'FP_Base':2,'FW_Prop':0,'FP_Prop':0,'isChanged':0,'ChangeType':""}, ignore_index=True)
baseline_cars = baseline_cars.append({'Name':'Audi','Fname':'A2','FW_Base':3,'Seq':0,'FP_Base':3,'FW_Prop':0,'FP_Prop':0,'isChanged':0,'ChangeType':""}, ignore_index=True)
baseline_cars = baseline_cars.append({'Name':'Audi','Fname':'A3','FW_Base':4,'Seq':0,'FP_Base':4,'FW_Prop':0,'FP_Prop':0,'isChanged':0,'ChangeType':""}, ignore_index=True)
baseline_cars = baseline_cars.append({'Name':'BMW','Fname':'X1','FW_Base':5,'Seq':0,'FP_Base':5,'FW_Prop':0,'FP_Prop':0,'isChanged':0,'ChangeType':""}, ignore_index=True)
baseline_cars = baseline_cars.append({'Name':'BMW','Fname':'X2','FW_Base':6,'Seq':0,'FP_Base':6,'FW_Prop':0,'FP_Prop':0,'isChanged':0,'ChangeType':""}, ignore_index=True)
baseline_cars = baseline_cars.append({'Name':'Merc','Fname':'M4','FW_Base':7,'Seq':0,'FP_Base':7,'FW_Prop':0,'FP_Prop':0,'isChanged':0,'ChangeType':""}, ignore_index=True)
baseline_cars = baseline_cars.append({'Name':'Merc','Fname':'M5','FW_Base':8,'Seq':0,'FP_Base':8,'FW_Prop':0,'FP_Prop':0,'isChanged':0,'ChangeType':""}, ignore_index=True)
proposed_cars = proposed_cars.append({'1':144,'2':'Audi','3':'A1','4(FW_Base)':2,'5(FW_Prop)':1,'6(FP_Base)':1,'7(FP_Prop)':1,'8(isChanged)':1,'9(ChangeType)':"W"}, ignore_index=True)
proposed_cars = proposed_cars.append({'1':144,'2':'Audi','3':'A2','4(FW_Base)':3,'5(FW_Prop)':3,'6(FP_Base)':3,'7(FP_Prop)':1,'8(isChanged)':1,'9(ChangeType)':"P"}, ignore_index=True)
proposed_cars = proposed_cars.append({'1':144,'2':'BMW','3':'X1','4(FW_Base)':5,'5(FW_Prop)':3,'6(FP_Base)':3,'7(FP_Prop)':3,'8(isChanged)':1,'9(ChangeType)':"W"}, ignore_index=True)
proposed_cars = proposed_cars.append({'1':144,'2':'BMW','3':'X2','4(FW_Base)':6,'5(FW_Prop)':4,'6(FP_Base)':4,'7(FP_Prop)':4,'8(isChanged)':1,'9(ChangeType)':"W"}, ignore_index=True)
Solution
Check Below code, sharing raw data also that is being used for clarity (created it before it was shared in Question)
import pandas as pd
import numpy as np
Baseline_Cars = pd.DataFrame({'Name':['Audi','Audi','Audi','BMW','BMW','Merc','Merc',],
'Fname':['A1','A2','A3','X1','X2','M4','M5',],
'FW_Base':['2','3','4','5','6','7','8',],
'Seq':['0','0','0','0','0','0','0',],
'FP_Base':['2','3','4','5','6','7','8',],
'FW_Prop':['0','0','0','0','0','0','0',],
'FP_Prop':['0','0','0','0','0','0','0',],
'isChanged':['0','0','0','0','0','0','0',],
'changeType':["","","","","","","",]
})
Proposed_Cars = pd.DataFrame({'1':['144','144','144','144',],
'2':['Audi','Audi','BMW','BMW',],
'3':['A1','A2','X1','X2',],
'4(FW_Base)':['2','3','5','6',],
'5(FW_Prop)':['1','3','3','4',],'6(FP_Base)':['1','3','3','4',],
'7(FP_Prop)':['1','1','3','4',],'8(isChanged)':['1','1','1','1',],
'9(changeType)':['W','P','W','W',]})
merged_df = pd.merge(Baseline_Cars,Proposed_Cars, left_on=['Name','Fname'], right_on = ['2','3'], how='left' )
merged_df = merged_df[merged_df['Name'].isin(Proposed_Cars['2'].tolist())]
merged_df = merged_df.drop(['2','3'], axis=1).rename(columns={'Name':'2','Fname':'3'})
common_cols = [(j,i) for j in Proposed_Cars for i in Baseline_Cars if j.find(i)> -1]
merged_df[[i[0] for i in common_cols]] = merged_df.apply(lambda x: ','.join([x[i[1]] if (str(x[i[0]]) == 'nan') else x[i[0]] for i in common_cols]), axis=1).astype('str').str.split(',', expand=True)
merged_df[['1','2','3']+[i[0] for i in common_cols]].ffill().replace('',np.NaN)
Output:
Update - as per OP' comments below, it works with data shared in question
merged_df = pd.merge(baseline_cars,proposed_cars, left_on=['Name','Fname'], right_on = ['2','3'], how='left' )
merged_df = merged_df[merged_df['Name'].isin(proposed_cars['2'].tolist())]
merged_df = merged_df.drop(['2','3'], axis=1).rename(columns={'Name':'2','Fname':'3'})
common_cols = [(j,i) for j in proposed_cars for i in baseline_cars if j.find(i)> -1]
merged_df[[i[0] for i in common_cols]] = merged_df.apply(lambda x: ','.join([ str(x[i[1]]) if (str(x[i[0]]) == 'nan') else str(x[i[0]]) for i in common_cols]), axis=1).astype('str').str.split(',', expand=True)
merged_df[[i[0] for i in common_cols][:-1]] = merged_df[[i[0] for i in common_cols][:-1]].astype('int')
merged_df[['1','2','3']+[i[0] for i in common_cols]].ffill().replace('',np.NaN)
Answered By - Abhishek
0 comments:
Post a Comment
Note: Only a member of this blog may post a comment.