Monday, November 14, 2022

[FIXED] Loop and fill values randomly (string) in pandas

Labels: dataframe, pandas, python

Issue

I have a dataset with a column named "dish_liked". Its values depend on the corresponding column "rest_type", which holds values such as "Cafe", "Quick Bites", "Delivery", etc.

Output of the "dish_liked" column where the corresponding "rest_type" value is "Quick Bites":

Waffles                                                                                                                  43
Nutella Pancakes                                                                                                         17
Donut, Coffee                                                                                                            14
Apple Pie, Mascarpone Cheese, Nolen Gurer Ice Cream, Paan Ice Cream, Nolen Gur, Gur Ice Cream, Salted Caramel            13
Coffee, Berryblast, Nachos, Chocolate Waffles, Nutella Waffle, Chocolate Overload, Sandwiches                            12

Now I have NaN values in the "dish_liked" column, and I want to fill them based on the corresponding "rest_type" value.

My idea is to fetch the top 5 values (strings) of "dish_liked" for each "rest_type" and fill the NaN values with a random pick from them.

E.g.: new_df.loc[new_df['rest_type'].isin(['Dessert Parlor']), 'dish_liked'].value_counts()[0:5]

Waffles                                                                                                                  43
Nutella Pancakes                                                                                                         17
Donut, Coffee                                                                                                            14
Apple Pie, Mascarpone Cheese, Nolen Gurer Ice Cream, Paan Ice Cream, Nolen Gur, Gur Ice Cream, Salted Caramel            13
Coffee, Berryblast, Nachos, Chocolate Waffles, Nutella Waffle, Chocolate Overload, Sandwiches                            12
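The expression above returns a Series whose index holds the dish strings and whose values are the counts. To use the top 5 as fill candidates, only the index is needed. A minimal sketch, assuming a small made-up frame (`demo`, hypothetical) in place of the real dataset:

```python
import pandas as pd

# Hypothetical stand-in for the real dataset; the values are illustrative only
demo = pd.DataFrame({
    'rest_type': ['Dessert Parlor'] * 6 + ['Cafe'],
    'dish_liked': ['Waffles', 'Waffles', 'Waffles',
                   'Nutella Pancakes', 'Nutella Pancakes',
                   'Donut, Coffee', 'Coffee'],
})

# value_counts() sorts by frequency, most common first
counts = demo.loc[demo['rest_type'].isin(['Dessert Parlor']), 'dish_liked'].value_counts()

# Keep only the labels (the dish strings) of the first five entries
top_dishes = counts.index[:5].tolist()
print(top_dishes)
```

With fewer than five distinct values, as here, the list is simply shorter.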

So if "dish_liked" is NaN and the corresponding "rest_type" value is "Dessert Parlor", "Cafe", etc., I want to fill that NaN with one of the top 5 values (strings) shown above for that "rest_type".

How can I do that? Sorry if it sounds confusing. Thanks in advance.

Solution

What you still did not understand from my point is:
1 - your top 5 are tuples, not a flat list
2 - If you want to use a tuple, the approach has to be different, which I don't believe to be the case.
Below you have two versions to get the job done. The first one is pretty manual, and the second one is dynamic, so it adapts to whatever categories your data contains. There are comments all the way through the code to explain the logic, since you said that you are a newbie.
You are welcome, dude.
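One reading of point 1 is that each "dish_liked" entry can hold several comma-separated dishes, so the top 5 entries are not a flat list of individual dishes. If a flat list is what you want, a rough sketch (the `dishes` sample is made up for illustration) could split the strings first:

```python
import pandas as pd

# Hypothetical sample of comma-separated "dish_liked" strings
dishes = pd.Series(['Donut, Coffee', 'Coffee, Nachos, Sandwiches', 'Waffles', None])

flat = (
    dishes.dropna()        # ignore missing entries
          .str.split(',')  # 'Donut, Coffee' -> ['Donut', ' Coffee']
          .explode()       # one row per individual dish
          .str.strip()     # remove the leftover spaces
)

# Most common individual dishes, as a plain Python list
top_flat = flat.value_counts().index[:5].tolist()
print(top_flat)
```

Whether to fill with whole entries or with single dishes depends on how the column is used later; the code in this answer keeps whole entries.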

import random
import numpy as np
import pandas as pd

#Adding the top values manually to keep the logic simple
top5 = ['Friendly Staff', 'Burgers', 'Coffee', 'Waffles', 'Mocktails', 'Pasta', 'Brownie', 'Chocolate', 'Chicken Salami',
'Burgers', 'Pasta', 'Chocolate Mousse', 'Potato Wedges', 'Cup Cake', 'Cheesy Fries', 'Peri Peri Chicken',
'Burgers', 'Coffee', 'Cappuccino', 'Barbeque Burger', 'Sandwiches', 'Spinach Pasta', 'Sandwich',
'Bannoffee Pie', 'Pasta', 'Sandwiches', 'Salsa', 'Sandwich', 'Salads', 'Pita Bread']
#eliminating duplicates with set
top5 = list(set(top5))
top5C = ['Noodles', 'Rice', 'Mapo Tofu', 'Chow Mein', 'Chinese Hot Pot']
top5D = ['Fast', 'Slow', 'Great', 'Not happy', 'Extremely happy']

#dummy df
dfTest = pd.DataFrame({'Rest_type':['Cafe','Others','Cafe','Chinese','Delivery','Cafe'],
                       'dish_liked':['Brownie','Waffles', np.nan, np.nan, np.nan, np.nan]})

#replacing NaN values with values from the lists created above
#random.choices picks one random value per missing row, so every run of the script can give different values
for rest_type, choices in [('Cafe', top5), ('Chinese', top5C), ('Delivery', top5D)]:
    #rows of this category whose dish_liked is still NaN
    mask = (dfTest['Rest_type'] == rest_type) & dfTest['dish_liked'].isna()
    if mask.any():
        dfTest.loc[mask, 'dish_liked'] = random.choices(choices, k=int(mask.sum()))


#now how to make it dynamic and save yourself the hassle
#function to collect the top 5 values of one category as a plain list
def build_di(df, k):
    top = df.loc[df['Rest_type'] == k, 'dish_liked'].value_counts().index[:5].tolist()
    return {k: top}

#create an array of unique values from the Rest_type column
i = dfTest['Rest_type'].unique()

#build the dictionary with one entry per category
di = {}
for x in i:
    di.update(build_di(dfTest, x))

#I tested the last loop with the dictionary below and it worked like a charm.
#di = {'Cafe':top5,'Chinese':top5C,'Delivery':top5D}

#here the for loop iterates over each category ('Cafe', 'Chinese', 'Delivery' in your case),
#selects the rows of that category that are still NaN and fills each one with a random
#value from that category's list; categories without any known value are skipped
for k, choices in di.items():
    mask = (dfTest['Rest_type'] == k) & dfTest['dish_liked'].isna()
    if choices and mask.any():
        dfTest.loc[mask, 'dish_liked'] = random.choices(choices, k=int(mask.sum()))
dfTest
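For comparison, the same idea can be wrapped in one reusable function. This is only a sketch and not part of the original answer: `fill_from_top`, its parameters and the `demo` frame are hypothetical names, and it assumes the DataFrame index is unique. A `seed` argument makes the random picks repeatable.

```python
import random

import numpy as np
import pandas as pd


def fill_from_top(df, group_col='rest_type', target_col='dish_liked', n=5, seed=None):
    """Fill NaN in target_col with a random pick from the n most
    common values found in the same group_col category."""
    rng = random.Random(seed)  # private generator, so the seed stays local
    out = df.copy()            # leave the caller's frame untouched
    for _, idx in out.groupby(group_col).groups.items():
        col = out.loc[idx, target_col]
        top = col.value_counts().index[:n].tolist()
        missing = col.index[col.isna()]
        # Skip groups that have nothing to fill or nothing to fill with
        if top and len(missing):
            out.loc[missing, target_col] = rng.choices(top, k=len(missing))
    return out


# Illustrative data only
demo = pd.DataFrame({
    'rest_type': ['Cafe', 'Cafe', 'Cafe', 'Chinese', 'Delivery', 'Delivery'],
    'dish_liked': ['Brownie', 'Waffles', np.nan, np.nan, 'Pizza', np.nan],
})
filled = fill_from_top(demo, seed=0)
print(filled)
```

In this sample the 'Chinese' row stays NaN because that category has no known dishes to draw from, so it would need a separate fallback.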

Answered By - ReinholdN

This answer was collected from Stack Overflow and tested by PythonFixing community admins. It is licensed under CC BY-SA 2.5, CC BY-SA 3.0 and CC BY-SA 4.0.
