Issue
I have a dataframe with rows like the ones below. I want to go through the rows and check whether any column value contains " ; ". If it does, I want to split that row. If several columns in that row contain the same number of " ; ", the row should be split just once, with each new row taking the respective values. For columns that do not contain " ; ", I just want to duplicate the value.
I have tried this so far, but it leaves me with NaN instead of duplicating the value from the original row when a column has no delimiter.
import pandas as pd

data = {'ID': [34, 35],
        'Name': ['Alt-Tempelhof Ecke Tempelhofer Damm', 'Alt-Wittenau'],
        'Type': ['bus', 'bus'],
        'Lines': ['A77,A68,A76', 'A62 ; A15,A21'],
        'Coordinates': ['52.465964306830664, 13.38558297633417', '52.58972877186178, 13.334169215342472 ; 52.59166508975595, 13.326326895395114'],
        'Extra': [None, 'Alt-Wittenau Ecke Oranienburger Straße ; Alt-Wittenau Ecke Eichborndamm']}
df = pd.DataFrame(data)
split_df = pd.concat([df[col].astype(str).str.split(';', expand=True).stack().str.strip() for col in df.columns], axis=1, keys=df.columns)
split_df = split_df.apply(lambda col: col.fillna(df[col.name]))
split_df.reset_index(drop=True, inplace=True)
split_df
Solution
IIUC, you can just group split_df by the level 0 index and ffill:
split_df = pd.concat([df[col].astype(str).str.split(';', expand=True).stack().str.strip() for col in df.columns], axis=1, keys=df.columns)
split_df = split_df.groupby(level=0).ffill().reset_index(drop=True)
Output:
   ID                                 Name  Type        Lines  \
0  34  Alt-Tempelhof Ecke Tempelhofer Damm   bus  A77,A68,A76
1  35                         Alt-Wittenau   bus          A62
2  35                         Alt-Wittenau   bus      A15,A21

                             Coordinates  \
0  52.465964306830664, 13.38558297633417
1  52.58972877186178, 13.334169215342472
2  52.59166508975595, 13.326326895395114

                                    Extra
0                                    None
1  Alt-Wittenau Ecke Oranienburger Straße
2          Alt-Wittenau Ecke Eichborndamm
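Why the grouped forward fill works: stack() gives every column a two-level index of (original row, piece number), so after concat a column with no delimiter has a value only at piece 0 and NaN at any later piece. groupby(level=0).ffill() fills those gaps from an earlier piece of the same original row, and never from a different row. A minimal sketch on a hand-built frame, where the index names and toy values are made up for illustration:

```python
import pandas as pd

# Hand-built stand-in for split_df: the index is (original row, piece number).
idx = pd.MultiIndex.from_tuples(
    [(0, 0), (1, 0), (1, 1), (2, 0)], names=["row", "piece"]
)
toy = pd.DataFrame(
    {"Name": ["x", "y", None, None], "Lines": ["a", "b", "c", "d"]},
    index=idx,
)

# A plain ffill leaks "y" from row 1 into row 2, whose Name is really missing.
plain = toy.ffill().reset_index(drop=True)

# The grouped ffill only fills within one original row, so row 2 stays empty.
grouped = toy.groupby(level=0).ffill().reset_index(drop=True)

print(plain["Name"].tolist())
print(grouped["Name"].tolist())
```

In the question's data every first piece is present, so the difference does not show up there, but grouping keeps the fill from crossing row boundaries if it ever does.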
Answered By - Nick
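One side effect of the code above: because of astype(str), the ID values come back as strings and a missing Extra becomes the literal text 'None' (row 0 of the output). If that matters, here is a rough alternative sketch that is not part of the original answer. It uses a plain Python loop instead of stack/ffill, and the helper name split_rows and its sep parameter are hypothetical. It assumes that all delimited columns in a row contain the same number of pieces, as the question describes, and raises an error otherwise.

```python
import pandas as pd


def split_rows(df, sep=";"):
    """Hypothetical helper: one output row per sep-separated piece."""

    def to_pieces(value):
        # Only strings are split; numbers and missing values stay as one piece.
        if isinstance(value, str):
            return [part.strip() for part in value.split(sep)]
        return [value]

    records = []
    for row in df.to_dict("records"):
        pieces = {key: to_pieces(value) for key, value in row.items()}
        # Piece counts of the columns that actually contained the delimiter.
        counts = {len(p) for p in pieces.values() if len(p) > 1}
        if len(counts) > 1:
            raise ValueError(f"columns disagree on the number of pieces: {row}")
        n = max(counts, default=1)
        for i in range(n):
            # Single-piece cells are duplicated; split cells give piece i.
            records.append(
                {key: (p[i] if len(p) > 1 else p[0]) for key, p in pieces.items()}
            )
    return pd.DataFrame(records, columns=df.columns)


# Cut-down version of the question's data.
sample = pd.DataFrame({
    "ID": [34, 35],
    "Lines": ["A77,A68,A76", "A62 ; A15,A21"],
    "Extra": [None, "Alt-Wittenau Ecke Oranienburger Straße ; Alt-Wittenau Ecke Eichborndamm"],
})
result = split_rows(sample)
print(result)
```

Here ID stays numeric and the missing Extra stays missing instead of turning into a string. The loop will be slower than the vectorised version on large frames.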