Issue
I have two dataframes, one is populated with barcodes and their product's prices across several supermarkets during the COVID pandemic, and the other one contains the actual product names and their barcodes along with a few category_id's etc, which are known (so I know which category_id to target in order to get a specific set of barcodes.)
the df_markets:
BARCODE AC BFRESH LIDL SUPERM
Date
2020-01-03 5201263086618 6.36 7.97 0 0 8.31
2020-01-03 5201263086625 7.58 9.53 0 0 9.91
2020-01-03 7322540574852 18.11 18.34 0 0 8.86
2020-01-03 7322540647136 18.8 18.95 0 0 18.9
2020-01-03 7322540587555 18.22 18.98 0 0 9.21
and the df_bar:
product_id category_id subcategory_name BARCODE name
0 1.0 37.0 Adult diapers 5.201263e+12 Sani Pants N2 Medium 14
1 2.0 37.0 Adult diapers 5.201263e+12 Sani Pants N3 Large 14
2 3.0 37.0 Adult diapers 7.322541e+12 Tena Pants Plus Large
3 4.0 37.0 Adult diapers 7.322541e+12 Tena Slip Super Large N4
4 5.0 37.0 Adult diapers 7.322541e+12 Tena Pants Plus Extra Large
What I've tried doing is first locate inside the df_bar all of the barcodes that have a category_id of 114.0 (which is of detergent) like that:
df_bar.loc[df_bar['category_id']==114.0]
and that produced a two-rows dataframe, so I know that this category_id exists, and then tried locating inside the df_markets dataframe all of the barcodes that much this search query:
df_markets.loc[df_markets['BARCODE'].isin(df_bar.loc[df_bar['category_id']==114.0])]
like that, but that produces an empty dataframe like that:
BARCODE AC BFRESH LIDL SUPERM
Date
so I am guessing the problem is that the df_markets has the barcodes in this format "5201263086618" while the df_bar has the barcodes in this format "5.201263e+12". Do you know what needs to be done to be able to search within the df_markets dataframe effectively? Or let me know if there's something wrong with my code.
A minor edit that helped me after Danila's suggestion was to turn my "query" into this :
df_bar['BARCODE'] = df_bar['BARCODE'].apply(lambda x: str(x)[:-2])
#above line changed the format
df_markets.loc[df_markets['BARCODE'].isin(df_bar.loc[df_bar['category_id']==114.0]['BARCODE'])]
so that it can take into consideration only the BARCODE column.
Solution
just convert BARCODE column to string. example:
df_bar['BARCODE'] = df_bar['BARCODE'].apply(lambda x: str(x)[:-2])
Answered By - Danila Ganchar
0 comments:
Post a Comment
Note: Only a member of this blog may post a comment.