Issue
I'm pretty new to Python and not sure what to even google for this. What I am trying to do is create a Pandas DataFrame that is filled with fake data by using Faker. The problem I am having is each column is generating fake data in a silo. I want to be able to have fake data created based on something that exists in a prior column.
In my example below, pc_type is ["PC", "Apple"]. The operating system options are Windows 10, Windows 11, and MacOS. I want every row where pc_type = "Apple" to be filled with MacOS, and every row where the type is PC to get Windows 10 or Windows 11 with a 50% chance each.
How would I write this code so that in the function body I can make that distinction clear and the results will reflect that?
from faker import Faker
from faker.providers import BaseProvider, DynamicProvider
import numpy as np
import pandas as pd
from datetime import datetime
import random

pc_type = ['PC', 'Apple']
fake = Faker()

def create_data(x):
    project_data = {}
    for i in range(0, x):
        project_data[i] = {}
        project_data[i]['Name'] = fake.name()
        project_data[i]['PC Type'] = fake.random_element(pc_type)
        project_data[i]['With Windows 10'] = fake.boolean(chance_of_getting_true=25)
        project_data[i]['With Windows 11'] = fake.boolean(chance_of_getting_true=25)
        project_data[i]['With MacOS'] = fake.boolean(chance_of_getting_true=50)
    return project_data

df = pd.DataFrame(create_data(10)).transpose()
df
Solution
I'd change the approach slightly and generate a single OS column. You can then transform this column into With MacOS etc. if needed.
With this approach it's easier to get the 0.5 / 0.5 split within Windows right:
from faker import Faker
from faker.providers import BaseProvider, DynamicProvider
import numpy as np
import pandas as pd
from datetime import datetime
import random
from collections import OrderedDict

pc_type = ['PC', 'Apple']
# Weighted choices: each Windows version is picked with probability 0.5
wos_type = OrderedDict([('With Windows 10', 0.5), ('With Windows 11', 0.5)])
fake = Faker()

def create_data(x):
    project_data = {}
    for i in range(x):
        project_data[i] = {}
        project_data[i]['Name'] = fake.name()
        project_data[i]['PC Type'] = fake.random_element(pc_type)
        # The OS depends on the PC type drawn above
        if project_data[i]['PC Type'] == 'PC':
            project_data[i]['OS'] = fake.random_element(elements=wos_type)
        else:
            project_data[i]['OS'] = 'MacOS'
    return project_data

# The outer dict keys become columns, so transpose to get one row per record
df = pd.DataFrame(create_data(10)).transpose()
df
Output
   Name                    PC Type  OS
0  Nicholas Walker         Apple    MacOS
1  Eric Hull               PC       With Windows 10
2  Veronica Gonzales       PC       With Windows 11
3  Mrs. Krista Richardson  Apple    MacOS
4  Anne Craig              PC       With Windows 10
5  Joseph Hayes            PC       With Windows 10
6  Mary Nelson             Apple    MacOS
7  Jill Hunt               Apple    MacOS
8  Mark Taylor             PC       With Windows 11
9  Kyle Thompson           PC       With Windows 10
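If you do need the separate With ... columns from the question, one possible way to derive them from the OS column is sketched below. This is only an illustration, not part of the original answer: the small hand-written DataFrame stands in for the generated data, and the names os_values and col are made up for the example.

```python
import pandas as pd

# Hand-written stand-in for the generated frame (illustrative values only)
df = pd.DataFrame({
    'Name': ['A', 'B', 'C'],
    'PC Type': ['Apple', 'PC', 'PC'],
    'OS': ['MacOS', 'With Windows 10', 'With Windows 11'],
})

# Hypothetical list of OS values, in the column order wanted
os_values = ['With Windows 10', 'With Windows 11', 'MacOS']

for os_name in os_values:
    # 'MacOS' has no 'With ' prefix in the OS column, so add it for the column name
    col = os_name if os_name.startswith('With ') else f'With {os_name}'
    # Boolean column: True only on rows whose OS matches this value
    df[col] = df['OS'] == os_name
```

Because every row has exactly one OS value, exactly one of the three boolean columns is True per row, which the independent fake.boolean calls in the question could not guarantee.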
Answered By - TimTeaFan