Issue
I have two tables of football player data, and I want to merge them on the player name. However, one table has a 'long_name' column and the other has a 'short_name' column. For example, one player appears as "Kevin Oghenetega Tamaraebi Bakumo-Abraham" in one table and as "Tammy Abraham" in the other. As you can see, they sometimes use a kind of nickname.
I tried the fuzzywuzzy library, which scores names by the ratio of matching characters, but it is not working as expected. Any suggestions?
from fuzzywuzzy import fuzz, process
import pandas as pd

def find_best_match(name, choices):
    # Return the closest choice (plus its score) using whole-string character similarity
    return process.extractOne(name, choices, scorer=fuzz.ratio)

# Apply the function to find the best match for each name in forwards
forwards['best_match'] = forwards['Player'].apply(lambda x: find_best_match(x, fifa_fows['long_name']))
# Extract the best-matched names and merge on them
forwards['best_name'] = forwards['best_match'].apply(lambda x: x[0] if x else None)
final_forwards = pd.merge(forwards, fifa_fows[['long_name', 'overall']], left_on='best_name', right_on='long_name', how='left')
# Drop intermediate columns
final_forwards = final_forwards.drop(['best_match', 'best_name'], axis=1)
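As a rough illustration of why a whole-string character ratio struggles with nicknames, here is a minimal standard-library sketch. It uses difflib.SequenceMatcher as a stand-in for fuzz.ratio (fuzzywuzzy falls back to difflib when python-Levenshtein is not installed, and fuzz.ratio reports the same kind of score scaled to 0-100). The helper names char_ratio and shared_tokens are hypothetical, for illustration only.

```python
from difflib import SequenceMatcher


def char_ratio(a, b):
    # Whole-string character similarity in [0, 1]; comparable in spirit to fuzz.ratio / 100
    return SequenceMatcher(None, a, b).ratio()


def shared_tokens(a, b):
    # Hypothetical helper: lowercase, treat hyphens as spaces, and return the words both names share
    def tokens(s):
        return set(s.lower().replace("-", " ").split())
    return tokens(a) & tokens(b)


short_name = "Tammy Abraham"
long_name = "Kevin Oghenetega Tamaraebi Bakumo-Abraham"

print(round(char_ratio(short_name, long_name), 2))
print(shared_tokens(short_name, long_name))  # {'abraham'}
```

Because the long name contains many extra characters, the whole-string ratio stays well below 0.5 here, while the shared surname is an exact token hit. For the same reason, fuzzywuzzy's token-based scorers (for example fuzz.token_set_ratio) may be worth trying in place of fuzz.ratio.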
Solution
I suggest using the pandas str.contains function to find partial matches, for example:
str.contains("needle")
See also the related question "Join dataframes based on partial string-match between columns".
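A minimal sketch of the str.contains idea applied to the join, assuming the last word of each short name is the player's surname and that it appears verbatim in the corresponding long name. The DataFrames are tiny made-up examples that reuse the question's column names (Player, long_name, overall); the ratings are placeholders, not real FIFA data, and match_by_surname is a hypothetical helper.

```python
import pandas as pd

# Toy data reusing the question's column names; 'overall' values are placeholders
forwards = pd.DataFrame({"Player": ["Tammy Abraham", "Harry Kane"]})
fifa_fows = pd.DataFrame({
    "long_name": ["Kevin Oghenetega Tamaraebi Bakumo-Abraham", "Harry Edward Kane"],
    "overall": [1, 2],
})


def match_by_surname(player, candidates):
    # Assumption: the last token of the short name is the surname
    surname = player.split()[-1]
    # regex=False treats the surname literally; case=False ignores capitalization
    mask = candidates["long_name"].str.contains(surname, case=False, regex=False)
    hits = candidates.loc[mask, "long_name"]
    # Accept only an unambiguous match; otherwise leave the row unmatched
    return hits.iloc[0] if len(hits) == 1 else None


forwards["long_name"] = forwards["Player"].apply(lambda p: match_by_surname(p, fifa_fows))
final_forwards = forwards.merge(fifa_fows, on="long_name", how="left")
print(final_forwards)
```

Requiring exactly one hit keeps ambiguous surnames (two players called Silva, say) from being matched silently; those rows get no long_name and can be resolved by hand or with a fuzzy scorer. Note that a plain substring check also matches longer names that merely contain the surname, so comparing whole words is stricter.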
Answered By - Mohan