Issue
I have a dictionary where I would like to map it to a large pandas dataframe. The issue is the column I would like to use to map is wrapped in double quotes and sometimes there are one or more items in that column.
original
dict_id = {
'College1': ['1256511'],
'College2': ['1200582'],
'College3': ['1256618'],
'College10': ['1256621']
}
id1 id2 college_name
0 01 01 "College1, College2"
1 01 02 "College10, College12"
2 01 03 "College19"
desired
id1 id2 college_name id_college
01 01 "College1, College2" 1256511, 1200582
01 02 "College10, College12" 1256621
01 03 "College19"
Solution
Your data is better formatted imo after explode
, but I put it all back to how it was at the end~
df.college_name = df.college_name.str[1:-1].str.split(', ')
df = df.explode('college_name')
df2 = pd.DataFrame.from_dict(dict_id, 'index', columns=['id_college'], dtype=str)
df = df.merge(df2, left_on='college_name', right_index=True, how='left')
df = df.fillna('').groupby(['id1', 'id2'], as_index=False).agg(', '.join)
df.college_name = '"' + df.college_name + '"'
print(df)
Output:
id1 id2 college_name id_college
0 01 01 "College1, College2" 1256511, 1200582
1 01 02 "College10, College12" 1256621,
2 01 03 "College19"
Answered By - BeRT2me
0 comments:
Post a Comment
Note: Only a member of this blog may post a comment.