Issue
My data frame looks something like as below -
Here I want to find only those resturants where all food items are present - I first tried to group the rows by resturant_id but its not working
Code used for grouping -
df_new = df_new.groupby('resturant_id') ##does not group,
result is same as before
Then I basically have a list of food_items, so I go in each row looking for the food items
eg data -
items_list = ['a','b']
resturant_id price food_item
1 1 'a'
1 1 'b'
2 1 'b'
3 1 'a'
3 2 'b'
So, basically my ask is to find only those resturants where all food items are found - in our case it will be resturant 1 and 3 - because they have 'a' and 'b' both
IsIn method in pandas looks for either 'a' or 'b' but but not both at the same time (meaning it does 'OR' not 'AND') - how to do this using pandas?
If I get the resturant having all items in list then comparison of resturants should happen to return the cheapest resturant for all products.
I tried isin as below but its not working as expected -
choice = ['a','b']
df_new = df[np.isin(df['item_label'], choice)].reset_index(drop=True)
resturant_id price item_label
0 5 4.0 a
1 5 8.0 b
2 6 5.0 a
3 7 2.5 a
4 7 3.0 b
It returns even those rows where any food item is present - I want only those restrants where all food items are present - then want to find the cheapest resturant if there are more than 1 such resturant - such as 1 and 3 as explained in above example 5
Solution
We can use GroupBy.filter
:
import numpy as np
new_df = \
df.groupby('resturant_id').filter(lambda x: np.isin(choice, x['food_item']).all())
resturant_id price food_item
0 1 1 'a'
1 1 1 'b'
3 3 1 'a'
4 3 2 'b'
Another option
new_df = \
df.loc[pd.get_dummies(df['food_item'])
.groupby(df['resturant_id'])
.transform('sum')
.gt(0)
.all(axis=1)]
Or if you want select items:
new_df = \
df.loc[pd.get_dummies(df['food_item'])
.groupby(df['resturant_id'])[['a', 'b']]
.transform('sum')
.gt(0)
.all(axis=1)]
Now we can get the cheaper for each product, note that GroupBy.rank
is required as there may be a price tie
s = new_df.groupby('food_item')['price'].rank()
print(s)
0 1.5
1 1.0
3 1.5
4 2.0
Name: price, dtype: float64
cheaper_df = new_df.loc[s.eq(s.groupby(new_df['food_item']).transform('min'))]
print(cheaper_df)
resturant_id price food_item
0 1 1 a
1 1 1 b
3 3 1 a
cheaper_df.groupby('food_item')['resturant_id'].agg(list)
food_item
a [1, 3]
b [1]
Name: resturant_id, dtype: object
Answered By - ansev
0 comments:
Post a Comment
Note: Only a member of this blog may post a comment.