Saturday, March 5, 2022

[FIXED] Returning all rows where all values in a list are present in the same column using pandas

March 05, 2022 pandas, python No comments

Issue

My data frame looks something like as below -

Here I want to find only those resturants where all food items are present - I first tried to group the rows by resturant_id but its not working

Code used for grouping -

df_new = df_new.groupby('resturant_id')           ##does not group,

result is same as before

Then I basically have a list of food_items, so I go in each row looking for the food items

eg data -

items_list = ['a','b']

resturant_id  price  food_item
1               1       'a'
1               1       'b'
2               1       'b'
3               1       'a'
3               2       'b'

So, basically my ask is to find only those resturants where all food items are found - in our case it will be resturant 1 and 3 - because they have 'a' and 'b' both

IsIn method in pandas looks for either 'a' or 'b' but but not both at the same time (meaning it does 'OR' not 'AND') - how to do this using pandas?

If I get the resturant having all items in list then comparison of resturants should happen to return the cheapest resturant for all products.

I tried isin as below but its not working as expected -

choice = ['a','b']
df_new = df[np.isin(df['item_label'], choice)].reset_index(drop=True)

    resturant_id    price   item_label
0   5                4.0    a
1   5                8.0    b
2   6                5.0    a
3   7                2.5    a
4   7                3.0    b

It returns even those rows where any food item is present - I want only those restrants where all food items are present - then want to find the cheapest resturant if there are more than 1 such resturant - such as 1 and 3 as explained in above example 5

Solution

We can use GroupBy.filter :

import numpy as np
new_df = \
df.groupby('resturant_id').filter(lambda x: np.isin(choice, x['food_item']).all())

   resturant_id  price food_item
0             1      1       'a'
1             1      1       'b'
3             3      1       'a'
4             3      2       'b'

Another option

new_df = \
df.loc[pd.get_dummies(df['food_item'])
         .groupby(df['resturant_id'])
         .transform('sum')
         .gt(0)
         .all(axis=1)]

Or if you want select items:

new_df = \
df.loc[pd.get_dummies(df['food_item'])
         .groupby(df['resturant_id'])[['a', 'b']]
         .transform('sum')
         .gt(0)
         .all(axis=1)]

Now we can get the cheaper for each product, note that GroupBy.rank is required as there may be a price tie

s = new_df.groupby('food_item')['price'].rank()
print(s)
0    1.5
1    1.0
3    1.5
4    2.0
Name: price, dtype: float64
        

cheaper_df = new_df.loc[s.eq(s.groupby(new_df['food_item']).transform('min'))]
print(cheaper_df)
   resturant_id  price food_item
0             1      1         a
1             1      1         b
3             3      1         a

cheaper_df.groupby('food_item')['resturant_id'].agg(list)
food_item
a    [1, 3]
b       [1]
Name: resturant_id, dtype: object

Answered By - ansev

This Answer collected from stackoverflow and tested by PythonFixing community admins, is licensed under cc by-sa 2.5 , cc by-sa 3.0 and cc by-sa 4.0

Saturday, March 5, 2022

[FIXED] Returning all rows where all values in a list are present in the same column using pandas

Issue

Solution

0 comments:

Post a Comment

Popular Posts

Labels