Issue
I have some scraped data from an e-commerce website that has the package unit count in the product name (see the examples below). I want to extract the unit count from the name and store it as an int in a "Unit" column. I know I can use df.loc[df[product_column].str.contains('10 pk'), unitColumn] = 10, or even loop through a list that holds sample strings for each unit count, but that becomes cumbersome when you're looking at data from 150 stores.
What I'm looking for is a way for Python to figure it out automatically and set the value, perhaps with ML. I don't know if it's possible, but I'm hoping someone can point me in the right direction.
Product name 10 pack
flavor 10 Pack product name
10 pack name
product name flavor 10pk
flavor product name 14Pk
1gx14 Product name
name 2-Pack
2pk different name
store name 3 pack
product name 5 pack store name
name 5 pack
5 Pack product name
Name 5-pack product name
randome name 5pk product name
name 5pk flavor
6 Pack flavor
flavor 7 pack
prduct name 7 pack
7pk flavor
pack x2 name
I know I can do
df.loc[df[product_column].str.contains('10 pk'), unitColumn] = 10
But I'm looking for more of an automated solution.
Solution
I ended up using this solution:
Put all possibilities in a dictionary.
dfQty = {
    2: ('Pack x2', '2-pack', '2pk'),
    3: ('3 pack', '3pk'),
    #4: (),
    5: ('5 pack', '5-pack', '5pk'),
    6: ('6 pack',),
    7: ('7 pack', '7pk'),
    #8: (),
    #9: (),
    10: ('10pk', '10 Pack'),
    #11: (),
    #12: (),
    #13: (),
    14: ('14pk', '1gx14'),
}
Then I looped through the dictionary and used str.contains(item) to set the value.
for key, value in dfQty.items():
    for item in value:
        # Set Units to key wherever the product name contains this substring (case-insensitive).
        df2.loc[df2['Product'].str.contains(item, case=False, na=False), 'Units'] = key
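For anyone who wants to try the dictionary approach end to end, here is a minimal sketch on a few of the sample names from the question. The sample rows, the shortened dictionary, the name unit_patterns, the nullable Int64 column, and regex=False are my assumptions for illustration; only the 'Product' and 'Units' column names come from the answer.

```python
import pandas as pd

# Hypothetical sample rows taken from the question's examples.
df2 = pd.DataFrame({"Product": [
    "Product name 10 pack",
    "flavor product name 14Pk",
    "name 2-Pack",
    "pack x2 name",
    "name without a count",
]})

# Unit count -> substrings that signal it (a shortened version of the dictionary).
unit_patterns = {
    2: ("pack x2", "2-pack", "2pk"),
    10: ("10pk", "10 pack"),
    14: ("14pk", "1gx14"),
}

# Start with an all-missing nullable integer column (an assumption; any default works).
df2["Units"] = pd.Series(pd.NA, index=df2.index, dtype="Int64")

for units, substrings in unit_patterns.items():
    for substring in substrings:
        # regex=False treats the substring literally; case=False ignores case.
        mask = df2["Product"].str.contains(substring, case=False, na=False, regex=False)
        df2.loc[mask, "Units"] = units

print(df2)
```

One caveat with plain substring matching: '2pk' is also found inside '12pk'. Because later keys overwrite earlier ones, that row only ends up correct if the dictionary also has a 12 entry that matches it.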
Important things I realized while doing this are:
- When I used lists instead of tuples for the dictionary values, I got an unhashable-type error, so I wrapped them in tuples.
- I had to comment out the keys with no values.
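As a possible alternative to listing every variant by hand, a regular expression can pull the number out directly. This is not the method used in the answer above, just a sketch that assumes the count always appears in one of three shapes seen in the question: a number followed by "pack" or "pk", "pack x" followed by a number, or a weight like "1gx14". The names UNIT_RE and extract_units are hypothetical.

```python
import re
import pandas as pd

# One capture group per shape; exactly one of them is filled when a match is found.
UNIT_RE = re.compile(
    r"""
      \b(\d+)\s*-?\s*(?:pack|pk)\b     # "10 pack", "5-pack", "14Pk"
    | \b(?:pack|pk)\s*x\s*(\d+)\b      # "pack x2"
    | \b\d+\s*g\s*x\s*(\d+)\b          # "1gx14"
    """,
    re.IGNORECASE | re.VERBOSE,
)

def extract_units(name):
    """Return the unit count found in a product name, or None if there is none."""
    if not isinstance(name, str):
        return None
    match = UNIT_RE.search(name)
    if match is None:
        return None
    return int(next(group for group in match.groups() if group is not None))

# Hypothetical sample rows taken from the question's examples.
df = pd.DataFrame({"Product": [
    "Product name 10 pack",
    "1gx14 Product name",
    "pack x2 name",
    "Name 5-pack product name",
    "name without a count",
]})

# Nullable Int64 keeps the counts as integers while allowing missing values.
df["Units"] = pd.array([extract_units(name) for name in df["Product"]], dtype="Int64")
print(df)
```

Because the pattern requires a word boundary before the digits, "12pk" is read as 12 rather than 2. Names that use a shape not covered here (for example "packs" or "count") would come back as missing and need another alternative added to the pattern.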
Answered By - Luke