Issue
Using np.where I am able to get 4 columns: Match1, Match2, Match3 and Match4. The final column Matched has to be updated based on the values of Match1, Match2, Match3 and Match4. If all 4 are 'Yes', I have to set Matched to 'Yes'; if any 3 out of 4 are 'Yes', Matched should also be 'Yes'. Otherwise it should be 'No'.
final_data = final_data.copy()
final_data['Match1'] = np.where(final_data['PROCESSOR_sub_column'] == final_data['PROCESSOR_Title'] , 'Yes', 'No')
final_data['Match2'] = np.where( final_data['RAM'] == final_data['RAM_Title'] , 'Yes', 'No')
final_data['Match3'] = np.where( final_data['Storage'] == final_data['STORAGE_Title'] , 'Yes', 'No')
final_data['Match4'] = np.where( final_data['Storage Type'] == final_data['STORAGE_TYPE_Title'] , 'Yes', 'No')
if (final_data['Match1']&final_data['Match2']&final_data['Match3']&final_data['Match4'] == 'Yes'):
final_data[Matched] = 'Yes'
The data generated so far was attached as a screenshot.
I also tried to get the column Matched directly using np.where(), but I did not manage to check four conditions at the same time. I also don't know what I should pass as x and y in np.where(condition, x, y) when using 4 conditions.
Apologies if this question is a repeat; I have tried my best to read all previous posts on related topics.
Solution
You can compare all 4 columns with 'Yes', then count the True values per row with sum and check whether the count is greater than or equal to 3 with Series.ge:
mask = final_data[['Match1','Match2','Match3','Match4']].eq('Yes').sum(axis=1).ge(3)
final_data['Matched'] = np.where(mask, 'Yes', 'No')
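To see the counting step in isolation, here is a minimal, self-contained sketch. The four-row sample frame is made up for illustration and is not the asker's data; it only assumes the helper columns already hold 'Yes'/'No' strings.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data: the four helper columns already hold 'Yes'/'No'.
final_data = pd.DataFrame({
    'Match1': ['Yes', 'Yes', 'No', 'No'],
    'Match2': ['Yes', 'Yes', 'Yes', 'No'],
    'Match3': ['Yes', 'No', 'Yes', 'No'],
    'Match4': ['Yes', 'Yes', 'No', 'Yes'],
})

match_cols = ['Match1', 'Match2', 'Match3', 'Match4']

# eq('Yes') gives a boolean DataFrame; sum(axis=1) counts the True values per row.
yes_count = final_data[match_cols].eq('Yes').sum(axis=1)

# Rows with at least 3 'Yes' values get 'Yes', all others 'No'.
mask = yes_count.ge(3)
final_data['Matched'] = np.where(mask, 'Yes', 'No')

print(final_data)
```

Keeping the intermediate yes_count Series around can be handy for debugging, since it shows how many of the four comparisons succeeded in each row.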
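Another solution, without helper columns, is to compare the filtered DataFrames. The second one is converted with to_numpy() so that values are compared by position instead of being aligned on the (different) column names: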
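import numpy as np
import pandas as pd

# Random 0/1 sample data, seeded for reproducibility
np.random.seed(2022)
c = ['PROCESSOR_sub_column', 'RAM', 'Storage', 'Storage Type', 'PROCESSOR_Title',
     'RAM_Title', 'STORAGE_Title', 'STORAGE_TYPE_Title']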
final_data = pd.DataFrame(np.random.randint(2, size=(10,8)), columns=c)
df1 = final_data[['PROCESSOR_sub_column','RAM','Storage','Storage Type']]
df2 = final_data[['PROCESSOR_Title','RAM_Title','STORAGE_Title','STORAGE_TYPE_Title']]
mask = df1.eq(df2.to_numpy()).sum(axis=1).ge(3)
final_data['Matched'] = np.where(mask, 'Yes', 'No')
print (final_data)
   PROCESSOR_sub_column  RAM  Storage  Storage Type  PROCESSOR_Title  \
0                     1    0        1             0                1
1                     0    0        0             0                1
2                     1    1        0             0                0
3                     0    0        0             1                0
4                     0    1        0             0                1
5                     1    1        1             1                0
6                     0    1        0             1                1
7                     0    1        1             0                0
8                     1    1        1             1                1
9                     1    1        0             0                0

   RAM_Title  STORAGE_Title  STORAGE_TYPE_Title  Matched
0          1              0                   1       No
1          1              1                   1       No
2          1              0                   1       No
3          0              0                   1      Yes
4          1              1                   1       No
5          0              0                   0       No
6          0              0                   0       No
7          1              0                   1       No
8          1              0                   1      Yes
9          0              1                   0       No
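On the question of what to pass as x and y to np.where with four conditions: one possible sketch (an illustration, not taken from the answer above) is to add the four boolean comparisons into a single count and use that count as the one condition, with 'Yes' and 'No' as x and y. The sample values below are invented, and the pairs list is a hypothetical helper name.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data using the column names from the question.
final_data = pd.DataFrame({
    'PROCESSOR_sub_column': ['i5', 'i7', 'i3'],
    'PROCESSOR_Title':      ['i5', 'i7', 'i5'],
    'RAM':                  ['8GB', '16GB', '4GB'],
    'RAM_Title':            ['8GB', '16GB', '8GB'],
    'Storage':              ['512GB', '1TB', '256GB'],
    'STORAGE_Title':        ['512GB', '512GB', '256GB'],
    'Storage Type':         ['SSD', 'SSD', 'HDD'],
    'STORAGE_TYPE_Title':   ['SSD', 'SSD', 'SSD'],
})

# Pairs of columns to compare (illustrative helper, not from the original post).
pairs = [
    ('PROCESSOR_sub_column', 'PROCESSOR_Title'),
    ('RAM', 'RAM_Title'),
    ('Storage', 'STORAGE_Title'),
    ('Storage Type', 'STORAGE_TYPE_Title'),
]

# Each comparison is a boolean Series; cast to int and add them up per row.
n_matches = sum((final_data[a] == final_data[b]).astype(int) for a, b in pairs)

# A single condition (at least 3 matches), then x='Yes' and y='No'.
final_data['Matched'] = np.where(n_matches >= 3, 'Yes', 'No')

print(final_data[['Matched']])
```

This keeps np.where to a single condition, so there is no need to chain four conditions with & or to enumerate every 3-out-of-4 combination.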
Answered By - jezrael