Issue
I have some data in which the column "sex" is listed as Male or Female. When I load this data into Google Colab, every value in the "sex" column shows up as NaN.
Is there a way to make this column represent Male as 0 and Female as 1? I have tried using the replace function, but I keep getting the same error, as shown in the image.
Code/Error:
Data:
Solution
To reproduce your sample data, here is one way to read and parse it to get the desired outcome:
#!/home/Karn_python3/bin/python
from __future__ import (absolute_import, division, print_function)
import pandas as pd
import numpy as np
pd.set_option('display.max_rows', None)
pd.set_option('display.max_columns', None)
pd.set_option('display.width', None)
pd.set_option('max_colwidth', None)
pd.set_option('expand_frame_repr', False)
# Read CSV and create dataframe.
df = pd.read_csv('adult_test.csv')
# The values in your file appear to have spaces around them (e.g. ' Male'),
# which is why mapping 'Male' gives NaN. Strip whitespace from every string
# (object) column first to avoid any mapping/processing issues.
df = df.apply(lambda x: x.str.strip() if x.dtype == "object" else x)
# Create a dictionary and map it onto the desired column; this is simpler
# and faster than replace.
m = {'Male': 0, 'Female': 1}
# There may be NaN values (including any value not found in the dict), so
# fill them with an int of your choice via fillna (0 here, the same code as
# 'Male'), then convert the dtype to int; otherwise the column stays float.
df['Sex'] = df['Sex'].map(m).fillna(0).astype(int)
print(df.head(20))
Resulting output:
Age Workclass fnlwgt Education Education_Num Martial_Status Occupation Relationship Race Sex Capital_Gain Capital_Loss Hours_per_week Country Target
0 |1x3 Cross validator NaN NaN NaN NaN NaN NaN NaN NaN 0 NaN NaN NaN NaN NaN
1 25 Private 226802.0 11th 7.0 Never-married Machine-op-inspct Own-child Black 0 0.0 0.0 40.0 United-States <=50K.
2 38 Private 89814.0 HS-grad 9.0 Married-civ-spouse Farming-fishing Husband White 0 0.0 0.0 50.0 United-States <=50K.
3 28 Local-gov 336951.0 Assoc-acdm 12.0 Married-civ-spouse Protective-serv Husband White 0 0.0 0.0 40.0 United-States >50K.
4 44 Private 160323.0 Some-college 10.0 Married-civ-spouse Machine-op-inspct Husband Black 0 7688.0 0.0 40.0 United-States >50K.
5 18 NaN 103497.0 Some-college 10.0 Never-married NaN Own-child White 1 0.0 0.0 30.0 United-States <=50K.
6 34 Private 198693.0 10th 6.0 Never-married Other-service Not-in-family White 0 0.0 0.0 30.0 United-States <=50K.
7 29 NaN 227026.0 HS-grad 9.0 Never-married NaN Unmarried Black 0 0.0 0.0 40.0 United-States <=50K.
8 63 Self-emp-not-inc 104626.0 Prof-school 15.0 Married-civ-spouse Prof-specialty Husband White 0 3103.0 0.0 32.0 United-States >50K.
9 24 Private 369667.0 Some-college 10.0 Never-married Other-service Unmarried White 1 0.0 0.0 40.0 United-States <=50K.
10 55 Private 104996.0 7th-8th 4.0 Married-civ-spouse Craft-repair Husband White 0 0.0 0.0 10.0 United-States <=50K.
11 65 Private 184454.0 HS-grad 9.0 Married-civ-spouse Machine-op-inspct Husband White 0 6418.0 0.0 40.0 United-States >50K.
12 36 Federal-gov 212465.0 Bachelors 13.0 Married-civ-spouse Adm-clerical Husband White 0 0.0 0.0 40.0 United-States <=50K.
13 26 Private 82091.0 HS-grad 9.0 Never-married Adm-clerical Not-in-family White 1 0.0 0.0 39.0 United-States <=50K.
14 58 NaN 299831.0 HS-grad 9.0 Married-civ-spouse NaN Husband White 0 0.0 0.0 35.0 United-States <=50K.
15 48 Private 279724.0 HS-grad 9.0 Married-civ-spouse Machine-op-inspct Husband White 0 3103.0 0.0 48.0 United-States >50K.
16 43 Private 346189.0 Masters 14.0 Married-civ-spouse Exec-managerial Husband White 0 0.0 0.0 50.0 United-States >50K.
17 20 State-gov 444554.0 Some-college 10.0 Never-married Other-service Own-child White 0 0.0 0.0 25.0 United-States <=50K.
18 43 Private 128354.0 HS-grad 9.0 Married-civ-spouse Adm-clerical Wife White 1 0.0 0.0 30.0 United-States <=50K.
19 37 Private 60548.0 HS-grad 9.0 Widowed Machine-op-inspct Unmarried White 1 0.0 0.0 20.0 United-States <=50K.
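The first row of the output above holds the text "|1x3 Cross validator" in the Age column, which suggests the file has a stray line right after the header (the UCI Adult test file, adult.test, begins with such a line). Assuming that layout, and assuming fields are separated by a comma followed by a space, pandas can deal with both problems at load time: `skiprows=[1]` drops the second line of the file and `skipinitialspace=True` discards the space after each delimiter, so no separate strip pass is needed. The sample text and the helper name `load_adult` are hypothetical, for illustration only.

```python
import io

import pandas as pd

# Hypothetical sample mimicking the assumed layout: a header line,
# a stray "|1x3 Cross validator" line, then ", "-separated records.
SAMPLE = """Age, Workclass, Sex
|1x3 Cross validator
25, Private, Male
38, Private, Female
"""

SEX_CODES = {'Male': 0, 'Female': 1}


def load_adult(source):
    """Hypothetical helper: read the CSV and encode Sex as 0/1."""
    df = pd.read_csv(
        source,
        skiprows=[1],           # skip the stray second line (0-indexed)
        skipinitialspace=True,  # drop the space that follows each comma
    )
    # Also strip any leftover whitespace from the column names.
    df.columns = df.columns.str.strip()
    df['Sex'] = df['Sex'].map(SEX_CODES)
    return df


df = load_adult(io.StringIO(SAMPLE))
print(df)
```

If Sex still contains NaN after this, some labels are spelled differently from the dict keys (for example a '?' placeholder) and need their own entry or a fillna.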
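To keep things better organized: since the data also contains NaN values, we can include them in the dict, e.g. m = {'Male': 0, 'Female': 1, np.nan: 0}, so that everything is mapped in one step instead of calling fillna afterwards. Note that any other value missing from the dict still becomes NaN, in which case the column stays float.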
df = pd.read_csv('adult_test.csv')
df = df.apply(lambda x: x.str.strip() if x.dtype == "object" else x)
m = {'Male': 0, 'Female': 1, np.nan: 0}
df['Sex'] = df['Sex'].map(m)
print(df.head(20))
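With the fillna approach, every value missing from the dict ends up as 0, so a label the dict does not cover (a stray space, different capitalization, or a placeholder such as '?') is silently coded as Male. A minimal check, sketched with a hypothetical helper `unmapped_values`, lists the raw values the dict failed to map while ignoring values that were already missing:

```python
import numpy as np
import pandas as pd


def unmapped_values(series, mapping):
    """Hypothetical helper: return raw values that `mapping` does not cover.

    Values that were already NaN are ignored; every other value that
    Series.map turns into NaN is reported once.
    """
    mapped = series.map(mapping)
    unmatched = series.notna() & mapped.isna()
    return series[unmatched].unique().tolist()


m = {'Male': 0, 'Female': 1}
sex = pd.Series([' Male', 'Female', np.nan, '?', 'Male'])
print(unmapped_values(sex, m))  # [' Male', '?']
```

Running this on df['Sex'] before the fillna step shows whether the strip was enough or whether extra dict entries are needed.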
Another Solution with replace:
Alternatively, use replace with the same dict, nested under the column name:
df = pd.read_csv('adult_test.csv')
df = df.apply(lambda x: x.str.strip() if x.dtype == "object" else x)
m = {'Male': 0, 'Female': 1, np.nan: 0}
df = df.replace({'Sex': m})
print(df.head(20))
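One practical difference between the two approaches: map returns NaN for any value missing from the dict, whereas replace leaves such values unchanged. The small sketch below, on a hypothetical Series, shows both behaviours side by side:

```python
import pandas as pd

m = {'Male': 0, 'Female': 1}
sex = pd.Series(['Male', 'Female', '?'])

# map: values not found in the dict become NaN (so the result is float).
mapped = sex.map(m)

# replace: values not found in the dict are left as they are.
replaced = sex.replace(m)

print(mapped.tolist())    # [0.0, 1.0, nan]
print(replaced.tolist())  # 0, 1 and the untouched '?'
```

This is why the map version is followed by fillna, while with replace a stray label such as '?' stays in the column and keeps it from becoming numeric.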
Refer to @jpp's answer to "Replace values in a pandas series via dictionary efficiently".
Answered By - Karn Kumar