Issue
I have 3 dataframes. I merge df1
and df2
through a common column. However, I need to use df3
to find what values are allowed for pairs seen in groupby created. I could get this part done too using 2-column merge through inner join, but I also need to se the entries that did not have any common elements. So far what I could do is represented with a model problem here:
ch = {'country':['India','India','India','USA','USA','Italy','Italy'],'hotel':['Taj','Oberoi','Hilton','Taj','Hilton','Oberoi','Marriott']}
ch_df = pd.DataFrame.from_dict(ch)
hm = {'hotel':['Taj','Taj','Taj','Oberoi','Oberoi','Marriott','Marriott','Marriott','Hilton','Hilton'],'menu':['ildi','dosa','soup','soup','ildi','soup','pasta','pizza','pizza','burger']}
hm_df = pd.DataFrame.from_dict(hm)
cm = {'country':['India','India','India','USA','USA','USA','Italy','Italy'],'menu':['ildi','dosa','soup','dosa','burger','pizza','pizza','pasta']}
cm_df = pd.DataFrame.from_dict(cm)
chm_df = pd.merge(ch_df, hm_df, left_on='hotel', right_on='hotel')
pd.merge(left=chm_df, right=cm_df, on=['country','menu'], how='inner').groupby(['country','hotel'])['menu'].apply(list).reset_index(name='menu items')
country hotel menu items
0 India Oberoi [ildi, soup]
1 India Taj [ildi, dosa, soup]
2 Italy Marriott [pasta, pizza]
3 USA Hilton [pizza, burger]
4 USA Taj [dosa]
What I need are entries such as:
5 Italy Oberoi []
...
One inefficient way is to add to each pair in hm_df
an allowed menu item and remove it after groupby. But it looks ugly. What is a more elegant method?
Solution
If need all possible combinations is possible use DataFrame.unstack
withDataFrame.stack
, for replace non exist values to empty lists use fill_value=[]
parameter:
df = pd.merge(chm_df, cm_df, on=['country','menu']).groupby(['country','hotel'])['menu'].apply(list).unstack(fill_value=[]).stack().reset_index(name='menu items')
print (df)
country hotel menu items
0 India Hilton []
1 India Marriott []
2 India Oberoi [ildi, soup]
3 India Taj [ildi, dosa, soup]
4 Italy Hilton []
5 Italy Marriott [pasta, pizza]
6 Italy Oberoi []
7 Italy Taj []
8 USA Hilton [pizza, burger]
9 USA Marriott []
10 USA Oberoi []
11 USA Taj [dosa]
For completness if need only non exist values from chm_df
convert to empty lists:
df = pd.merge(chm_df, cm_df, on=['country','menu']).groupby(['country','hotel'])['menu'].apply(list).reindex(pd.MultiIndex.from_frame(ch_df), fill_value=[]).reset_index(name='menu items')
print (df)
country hotel menu items
0 India Taj [ildi, dosa, soup]
1 India Oberoi [ildi, soup]
2 India Hilton []
3 USA Taj [dosa]
4 USA Hilton [pizza, burger]
5 Italy Oberoi []
6 Italy Marriott [pasta, pizza]
Answered By - jezrael
0 comments:
Post a Comment
Note: Only a member of this blog may post a comment.