Issue
I have a dataset with a column named "dish_liked". Its values depend on a corresponding column named "rest_type", which holds values like "Cafe", "Quick Bites", "Delivery", etc.
Output of the "dish_liked" column when the corresponding "rest_type" value is "Quick Bites":
Waffles 43
Nutella Pancakes 17
Donut, Coffee 14
Apple Pie, Mascarpone Cheese, Nolen Gurer Ice Cream, Paan Ice Cream, Nolen Gur, Gur Ice Cream, Salted Caramel 13
Coffee, Berryblast, Nachos, Chocolate Waffles, Nutella Waffle, Chocolate Overload, Sandwiches 12
Now I have NaN values in the "dish_liked" column, and I want to fill them based on the corresponding "rest_type" value.
My logic is to fetch the top 5 values (strings) from "dish_liked" for each "rest_type" and fill the NaN values randomly from them.
e.g.: new_df.loc[new_df['rest_type'].isin(['Dessert Parlor']), 'dish_liked'].value_counts()[0:5]
Waffles 43
Nutella Pancakes 17
Donut, Coffee 14
Apple Pie, Mascarpone Cheese, Nolen Gurer Ice Cream, Paan Ice Cream, Nolen Gur, Gur Ice Cream, Salted Caramel 13
Coffee, Berryblast, Nachos, Chocolate Waffles, Nutella Waffle, Chocolate Overload, Sandwiches 12
Now, if "dish_liked" is NaN and the corresponding "rest_type" value is "Dessert Parlor" or "Cafe" etc., I want to fill those NaN values with the top 5 values (strings) shown above.
How can I do that? Sorry if it sounds confusing. Thanks in advance.
Solution
What you still did not understand from my point is:
1 - Your top 5 are tuples, not a flat list.
2 - If you want to use tuples, the approach has to be different, which I don't believe is the case. Below you have two versions to get the job done: the first one is pretty manual, and the second one makes it dynamic, so it is easy to get the result you expect. There are comments all the way through the code to explain the logic, since you said that you are a newbie.
You are welcome dude.
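Before the two versions, a quick illustration of point 1. This is only a sketch: it assumes that "dish_liked" stores several dishes in one comma-separated string (as the output in the question suggests), and the sample data and the helper name flat_top_dishes are made up for the example. It splits those strings so that single dishes are counted instead of whole combinations.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data that mimics the columns from the question
sample = pd.DataFrame({
    'rest_type': ['Dessert Parlor', 'Dessert Parlor', 'Dessert Parlor', 'Cafe'],
    'dish_liked': ['Waffles', 'Donut, Coffee', 'Coffee, Waffles', np.nan],
})

def flat_top_dishes(df, rest_type, n=5):
    """Return the n most frequent single dishes for one rest_type."""
    dishes = (
        df.loc[df['rest_type'] == rest_type, 'dish_liked']
        .dropna()            # ignore rows without any liked dish
        .str.split(',')      # 'Donut, Coffee' -> ['Donut', ' Coffee']
        .explode()           # one row per single dish
        .str.strip()         # drop the space left after each comma
    )
    # value_counts() sorts by frequency; its index holds the dish names
    return dishes.value_counts().head(n).index.tolist()

top_dessert = flat_top_dishes(sample, 'Dessert Parlor')
print(top_dessert)
```

If you would rather keep whole combinations such as "Donut, Coffee" as one value, skip the split and explode steps.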
import random
import numpy as np
import pandas as pd
#Adding the top 5 manually to simplify the logic
top5 = ['Friendly Staff', 'Burgers', 'Coffee', 'Waffles', 'Mocktails', 'Pasta', 'Brownie', 'Chocolate', 'Chicken Salami',
        'Burgers', 'Pasta', 'Chocolate Mousse', 'Potato Wedges', 'Cup Cake', 'Cheesy Fries', 'Peri Peri Chicken',
        'Burgers', 'Coffee', 'Cappuccino', 'Barbeque Burger', 'Sandwiches', 'Spinach Pasta', 'Sandwich',
        'Bannoffee Pie', 'Pasta', 'Sandwiches', 'Salsa', 'Sandwich', 'Salads', 'Pita Bread']
#eliminating duplicates with set
top5 = list(set(top5))
top5C = ['Noodles','Rice','Mapo Tofu','Chow Mein','Chinese Hot Pot']
top5D = ['Fast','Slow','Great','Not happy','Extremely happy']
#dummy df
dfTest = pd.DataFrame({'Rest_type':['Cafe','Others','Cafe','Chinese','Delivery','Cafe'],
                       'dish_liked':['Brownie','Waffles', np.nan, np.nan, np.nan, np.nan]})
#replacing NaN values using the lists created above; random.choice picks one random value from a list
#so every time you run the script it may fill in a different value (all NaN rows of one category get the same pick)
dfTest = dfTest.fillna(dfTest.loc[dfTest['Rest_type'] == 'Cafe'].fillna(random.choice(top5)))
dfTest = dfTest.fillna(dfTest.loc[dfTest['Rest_type'] == 'Chinese'].fillna(random.choice(top5C)))
dfTest = dfTest.fillna(dfTest.loc[dfTest['Rest_type'] == 'Delivery'].fillna(random.choice(top5D)))
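#now how to make it dynamic and ease the hassle
#function to collect the top 5 for one category
def build_di(df, k):
    #value_counts() returns a Series of counts indexed by dish, so keep the index (the dish names) as a list
    di = {k: df[df.Rest_type == k]['dish_liked'].value_counts().index[:5].tolist()}
    return di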
#create a list of unique values from the Rest_type column
i = dfTest.Rest_type.unique()
#Use the first value to build the dictionary with the function
di = build_di(dfTest, i[0])
#Add all the other categories with a for loop
for x in i[1:]:
    di.update(build_di(dfTest, x))
#with the dictionary below I tested the last loop and it worked like a charm.
#di = {'Cafe':top5,'Chinese':top5C,'Delivery':top5D}
#here the for loop iterates over each category ('Cafe', 'Chinese', 'Delivery' in your case),
#filters the df to that category and fills its NaN values with a random value from the matching list in the dictionary
for k in di.keys():
    #skip categories without any known dishes, random.choice fails on an empty sequence
    if len(di[k]) > 0:
        dfTest = dfTest.fillna(dfTest.loc[dfTest['Rest_type'] == k].fillna(random.choice(di[k])))
dfTest
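One thing to keep in mind with the loops above: random.choice is called once per category, so every NaN row of that category gets the same dish. If you would rather draw a separate random dish for each NaN row, a possible variation looks like the sketch below. It is not part of the answer's method; the sample frame, the seed and the names top_by_type and mask are only assumptions for the illustration, and it uses NumPy's Generator.choice instead of random.choice.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)  # seeded only so the sketch is repeatable

# Hypothetical sample data, same column names as dfTest above
df = pd.DataFrame({
    'Rest_type': ['Cafe', 'Cafe', 'Cafe', 'Cafe', 'Chinese', 'Chinese', 'Delivery'],
    'dish_liked': ['Brownie', 'Waffles', np.nan, np.nan, 'Rice', np.nan, np.nan],
})

# Collect the top 5 dish_liked values per Rest_type (NaN is ignored by value_counts)
top_by_type = {
    rest_type: dishes.value_counts().head(5).index.to_numpy()
    for rest_type, dishes in df.groupby('Rest_type')['dish_liked']
}

# Fill each NaN row with its own random draw from the top values of its category
for rest_type, top in top_by_type.items():
    mask = (df['Rest_type'] == rest_type) & df['dish_liked'].isna()
    if len(top) > 0 and mask.any():
        df.loc[mask, 'dish_liked'] = rng.choice(top, size=int(mask.sum()))

print(df)
```

Categories that have no known dish at all ('Delivery' in this sample) are left as NaN, because there is nothing to draw from.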
Answered By - ReinholdN