Issue
I have two dataframes (menu and orders) and I would like to label the items in the menu dataframe based on how popular it is (the number of times it appears in 'orders' dataframe).
import pandas as pd
menu = pd.DataFrame(
{'Item_Name':['Chicken Pizza','Mushroom Soup','Tiramisu'],
'Price':[8.99, 4.99, 5.99]})
orders = pd.DataFrame(
{'order':[1,2,3,4,5,6,7,8],
'Item Name':['Chicken Pizza','Mushroom Soup','Tiramisu','Chicken Pizza','Chicken Pizza','Mushroom Soup','Chicken Pizza','Tiramisu']})
menu
Item_Name Price
0 Chicken Pizza 8.99
1 Mushroom Soup 4.99
2 Tiramisu 5.99
orders
order Item Name
0 1 Chicken Pizza
1 2 Mushroom Soup
2 3 Tiramisu
3 4 Chicken Pizza
4 5 Chicken Pizza
5 6 Mushroom Soup
6 7 Chicken Pizza
7 8 Tiramisu
Desired Output: "1" if item ordered is more than the median of all items ordered and "0" if it is less or equal to the median number of times ordered.
Item_Name Price Popular
0 Chicken Pizza 8.99 1
1 Mushroom Soup 4.99 0
2 Tiramisu 5.99 0
I tried using this syntax but it does not work.
menu["Popular"] = lambda x: 1 if orders["Item Name"].count() > orders["Item Name"].median() else 0
Solution
We need to groupby count
to get the number of each item. Then we can compare this to the median
of the counts not the entire DataFrame, convert the True/False values to 1/0 (astype
), and join
back to menu
on the Item_Name
column:
counts = orders.groupby('Item Name')['Item Name'].count()
menu = menu.join(
(counts > counts.median()).astype(int).rename('Popular'),
on='Item_Name'
)
Or with Series.value_counts
instead of groupby count
:
counts = orders['Item Name'].value_counts()
menu = menu.join(
(counts > counts.median()).astype(int).rename('Popular'),
on='Item_Name'
)
Either way gets menu
:
Item_Name Price Popular
0 Chicken Pizza 8.99 1
1 Mushroom Soup 4.99 0
2 Tiramisu 5.99 0
Answered By - Henry Ecker
0 comments:
Post a Comment
Note: Only a member of this blog may post a comment.