Issue
I am trying to preprocess the 911.csv dataset from Kaggle (911 Calls), and I found missing values in the zip code (zip) column. After some preprocessing, I found out which townships (twp) have missing zip values: 6 towns, to be specific.
The idea is to take the Township column and, for the rows whose zip value is NaN, assign the respective township's zip code in the zip column.
It sounds simple, but I've been banging my head against the wall over this for a couple of hours now.
Please help. Thank you in advance!
Solution
You can group by the twp column and fill each missing zip with the first non-null zip found for the same township.
df["zip"] = df["zip"].fillna(df.groupby("twp")["zip"].transform("first"))
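To see what that one-liner does, here is a minimal sketch on a tiny made-up frame. The township names and zip values are hypothetical sample data, not rows taken from 911.csv; only the twp and zip column names come from the question.

```python
import numpy as np
import pandas as pd

# Hypothetical sample rows; the real 911.csv has many more rows and columns.
df = pd.DataFrame({
    "twp": ["LOWER MERION", "LOWER MERION", "NORRISTOWN", "NORRISTOWN", "HATFIELD"],
    "zip": [19003.0, np.nan, np.nan, 19401.0, np.nan],
})

# First non-null zip within each township, aligned row by row with df.
zip_per_twp = df.groupby("twp")["zip"].transform("first")

# Fill only the missing zips; existing values are left untouched.
df["zip"] = df["zip"].fillna(zip_per_twp)

print(df)
```

Note that a township with no zip at all in the data (HATFIELD in this sample) stays NaN, because there is nothing in its group to copy from. For such towns you would likely need to supply the zip codes yourself, for example through a hand-written twp-to-zip dictionary.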
Answered By - Jason Baker