Issue
I have a dictionary that maps column names to functions. I have written a function that should capitalize the values in a df column with str.title():
import pandas as pd
data= [["English","john","smith","ohio","united states","","","manufacturing","National","Residental","","",""]]
df= pd.DataFrame(data,columns=['Communication_Language__c','firstName', 'lastName', 'state', 'country', 'company', 'email', 'industry', 'System_Type__c', 'AccountType', 'customerSegment', 'Existing_Customer__c', 'GDPR_Email_Permission__c'])
Communication_Language__c firstName lastName state country company email industry System_Type__c AccountType customerSegment Existing_Customer__c GDPR_Email_Permission__c
0 English john smith ohio united states manufacturing National Residental
def capitalize (column,df_temp):
    if df_temp[column].notna():
        df_temp[column]=df[column].str.title()
    return df_temp
def required ():
    #something
    pass
parsing_map={
    "firstName":[capitalize,required],
    "lastName":capitalize,
    "state":capitalize,
    "country": [capitalize,required],
    "industry":capitalize,
    "System_Type__c":capitalize,
    "AccountType":capitalize,
    "customerSegment":capitalize,
}
I wrote the function below to apply str.title(), but is there a way to apply it to the df columns without naming them all?
def capitalize (column,df_temp):
    if df_temp[column].notna():
        df_temp[column]=df[column].str.title()
    return df_temp
What would be the best way to use the dictionary's function mapping to apply str.title() to all of the contents of the columns mapped to the function "capitalize"?
Desired output:
data= [["English","John","Smith","Ohio","United States","","","Manufacturing","National","Residental","","",""]]
df= pd.DataFrame(data,columns=['Communication_Language__c','firstName', 'lastName', 'state', 'country', 'company', 'email', 'industry', 'System_Type__c', 'AccountType', 'customerSegment', 'Existing_Customer__c', 'GDPR_Email_Permission__c'])
Communication_Language__c firstName lastName state country company email industry System_Type__c AccountType customerSegment Existing_Customer__c GDPR_Email_Permission__c
0 English John Smith Ohio United States Manufacturing National Residental
Solution
Normally you would use apply for this, e.g.
cols_to_capitalize = list(parsing_map.keys())
df[cols_to_capitalize] = df[cols_to_capitalize].apply(lambda x: x.str.title())
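As a side note on the apply approach, .str.title() leaves missing values as NaN, so an explicit notna() check is not needed for it to run. A minimal sketch, assuming a small hypothetical frame (not the question's data) with one missing value:

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; "state" contains a missing value
df = pd.DataFrame({
    "firstName": ["john", "mary ann"],
    "state": ["ohio", np.nan],
    "email": ["a@b.com", "c@d.com"],
})

# Only the listed columns are title-cased; "email" is left untouched
cols_to_capitalize = ["firstName", "state"]
df[cols_to_capitalize] = df[cols_to_capitalize].apply(lambda s: s.str.title())

print(df)
```

The missing value in "state" stays missing, and the columns not listed are unchanged.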
If you want to keep your method dictionary, I would suggest that you write the methods to act on a column, not on the dataframe. Something like this:
data= [["English","john","smith","ohio","united states","","","manufacturing","National","Residental","","",""]]
df= pd.DataFrame(data,columns=['Communication_Language__c','firstName', 'lastName', 'state', 'country', 'company', 'email', 'industry', 'System_Type__c', 'AccountType', 'customerSegment', 'Existing_Customer__c', 'GDPR_Email_Permission__c'])
def capitalize(col):
    # TODO handle nan values
    # Maybe use any() instead of all()?
    # This code ignores any column that has even a single NaN value
    if col.notna().all():
        return col.str.title()
    return col

def required(col):
    # TODO do stuff
    return col

parsing_map={
    "firstName":[capitalize,required],
    "lastName":[capitalize],
    "state":[capitalize],
    "country": [capitalize,required],
    "industry":[capitalize],
    "System_Type__c":[capitalize],
    "AccountType":[capitalize],
    "customerSegment":[capitalize],
}

for col_name, fns in parsing_map.items():
    for fn in fns:
        df[col_name] = fn(df[col_name])
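The dictionary in the question mixes bare functions and lists of functions as values. If you would rather keep that shape than wrap every value in a list, the loop can normalize each entry first. This is only a sketch on a hypothetical two-column frame, with required left as a placeholder:

```python
import pandas as pd

def capitalize(col):
    # Title-case every string in the column; missing values stay missing
    return col.str.title()

def required(col):
    # Placeholder, as in the answer above: returns the column unchanged
    return col

# Mixed values, as in the question: a list in one place, a bare function in another
parsing_map = {
    "firstName": [capitalize, required],
    "lastName": capitalize,
}

df = pd.DataFrame({"firstName": ["john"], "lastName": ["smith"]})

for col_name, fns in parsing_map.items():
    # Wrap a single function in a list so both shapes are handled the same way
    if callable(fns):
        fns = [fns]
    for fn in fns:
        df[col_name] = fn(df[col_name])

print(df)
```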
You could also pass the full df into these methods if they need to access other columns, but still returning only the single column would make the design clearer.
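To illustrate that variant, here is a rough sketch in which the function receives the whole df but still returns a single column. The function capitalize_if_english and its rule (only title-case rows whose Communication_Language__c is "English") are hypothetical and are not part of the question:

```python
import pandas as pd

def capitalize_if_english(df, col_name):
    # Hypothetical rule: read another column to decide which rows to change
    is_english = df["Communication_Language__c"] == "English"
    titled = df[col_name].str.title()
    # Keep the title-cased value where the row is English, else the original
    return titled.where(is_english, df[col_name])

# Hypothetical sample data
df = pd.DataFrame({
    "Communication_Language__c": ["English", "German"],
    "state": ["ohio", "bayern"],
})

# The caller still assigns exactly one column
df["state"] = capitalize_if_english(df, "state")
print(df)
```

Because the function returns one Series, the calling loop stays the same as before; only the function signature changes.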
But you should think carefully about whether you really need to reinvent the .apply functionality.
Answered By - w-m