Monday, August 1, 2022

[FIXED] How do I add data to a column only if a certain value exists in previous column using Python and Faker?

August 01, 2022 faker, function, pandas, python No comments

Issue

I'm pretty new to Python and not sure what to even google for this. What I am trying to do is create a Pandas DataFrame that is filled with fake data by using Faker. The problem I am having is each column is generating fake data in a silo. I want to be able to have fake data created based on something that exists in a prior column.

So in my example below, I have pc_type ["PC", "Apple] From there I have the operating system and the options are Windows 10, Windows 11, and MacOS. Now I want only where pc_type = "Apple" to have the columns fill with the value of MacOS. Then for everything that is type PC, it's 50% Windows 10 and 50% Windows 11.

How would I write this code so that in the function body I can make that distinction clear and the results will reflect that?

from faker import Faker
from faker.providers import BaseProvider, DynamicProvider
import numpy as np
import pandas as pd
from datetime import datetime
import random

pc_type = ['PC', 'Apple']
fake = Faker()


def create_data(x):
    project_data = {}
    for i in range(0, x):
        project_data[i] = {}
        project_data[i]['Name'] = fake.name()
        project_data[i]['PC Type'] = fake.random_element(pc_type)
        project_data[i]['With Windows 10'] = fake.boolean(chance_of_getting_true=25)
        project_data[i]['With Windows 11 '] = fake.boolean(chance_of_getting_true=25)
        project_data[i]['With MacOS'] = fake.boolean(chance_of_getting_true=50)

    return project_data


df = pd.DataFrame(create_data(10)).transpose()
df

Solution

I'd slightly change the approach and generate a column OS. This column you can then transform into With MacOS etc. if needed.

With this approach its easier to get the 0.5 / 0.5 split within Windows right:

from faker import Faker
from faker.providers import BaseProvider, DynamicProvider
import numpy as np
import pandas as pd
from datetime import datetime
import random
from collections import OrderedDict

pc_type = ['PC', 'Apple']
wos_type = OrderedDict([('With Windows 10', 0.5), ('With Windows 11', 0.5)])
fake = Faker()

def create_data(x):
    project_data = {}
    for i in range(x):
        project_data[i] = {}
        project_data[i]['Name'] = fake.name()
        project_data[i]['PC Type'] = fake.random_element(pc_type)
        if project_data[i]['PC Type'] == 'PC':
            project_data[i]['OS'] = fake.random_element(elements = wos_type)
        else:
            project_data[i]['OS'] = 'MacOS'

    return project_data


df = pd.DataFrame(create_data(10)).transpose()
df

Output

                     Name PC Type               OS
0         Nicholas Walker   Apple            MacOS
1               Eric Hull      PC  With Windows 10
2       Veronica Gonzales      PC  With Windows 11
3  Mrs. Krista Richardson   Apple            MacOS
4              Anne Craig      PC  With Windows 10
5            Joseph Hayes      PC  With Windows 10
6             Mary Nelson   Apple            MacOS
7               Jill Hunt   Apple            MacOS
8             Mark Taylor      PC  With Windows 11
9           Kyle Thompson      PC  With Windows 10

Answered By - TimTeaFan

This Answer collected from stackoverflow and tested by PythonFixing community admins, is licensed under cc by-sa 2.5 , cc by-sa 3.0 and cc by-sa 4.0

Monday, August 1, 2022

[FIXED] How do I add data to a column only if a certain value exists in previous column using Python and Faker?

Issue

Solution

0 comments:

Post a Comment

Popular Posts

Labels