Issue
I have a pandas DataFrame of mixed types: some columns are strings and some are numbers. I would like to replace the NaN values in string columns with '.', and the NaN values in float columns with 0.
Consider this small fictitious example:
import numpy as np
import pandas as pd

df = pd.DataFrame({'Name': ['Jack', 'Sue', np.nan, 'Bob', 'Alice', 'John'],
                   'A': [1, 2.1, np.nan, 4.7, 5.6, 6.8],
                   'B': [.25, np.nan, np.nan, 4, 12.2, 14.4],
                   'City': ['Seattle', 'SF', 'LA', 'OC', np.nan, np.nan]})
Now, I can do it in 3 lines:
df['Name'] = df['Name'].fillna('.')
df['City'] = df['City'].fillna('.')
df = df.fillna(0)
Since this is a small DataFrame, 3 lines is probably OK. In my real example (which I cannot share here due to data confidentiality reasons), I have many more string columns and numeric columns, so I end up writing many lines just for fillna. Is there a concise way of doing this?
Solution
You could use apply on the DataFrame and check each column's dtype.kind to see whether the column is numeric or not:
res = df.apply(lambda x: x.fillna(0) if x.dtype.kind in 'biufc' else x.fillna('.'))
print(res)
     A      B     City   Name
0  1.0   0.25  Seattle   Jack
1  2.1   0.00       SF    Sue
2  0.0   0.00       LA      .
3  4.7   4.00       OC    Bob
4  5.6  12.20        .  Alice
5  6.8  14.40        .   John
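As an alternative sketch (an assumption on my part, not the approach used in the answer above), you could build a per-column fill dictionary with select_dtypes and pass it to a single fillna call. The names numeric_cols and fill_values are only illustrative. Note that include='number' does not match boolean columns, whereas the 'biufc' check above does.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'Name': ['Jack', 'Sue', np.nan, 'Bob', 'Alice', 'John'],
    'A': [1, 2.1, np.nan, 4.7, 5.6, 6.8],
    'B': [.25, np.nan, np.nan, 4, 12.2, 14.4],
    'City': ['Seattle', 'SF', 'LA', 'OC', np.nan, np.nan],
})

# Columns that pandas classifies as numeric (ints, floats, complex).
numeric_cols = df.select_dtypes(include='number').columns

# Map every column to its fill value: 0 for numeric columns, '.' otherwise.
fill_values = {col: (0 if col in numeric_cols else '.') for col in df.columns}

# fillna accepts a dict keyed by column name and returns a new DataFrame.
res = df.fillna(value=fill_values)
print(res)
```

This keeps the rule for choosing a fill value in one place, so adding more columns to the DataFrame does not require extra fillna lines.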
Answered By - Anton Protopopov