Monday, December 11, 2023

[FIXED] Splitting dataframe rows where multiple columns have the same delimiter

Issue

I have a dataframe with rows like the ones below. I want to go through the rows and check whether any column value contains " ; ". If it does, I want to split that row. If several columns in that row contain the same number of " ; ", the row should be split just once, with each new row taking the respective values. For columns that do not contain " ; ", I just want to duplicate the value.

I have tried this so far, but it leaves me with NaN instead of duplicating the value from the original row when a column has no delimiter.


import pandas as pd

data = {'ID': [34, 35],
        'Name': ['Alt-Tempelhof Ecke Tempelhofer Damm', 'Alt-Wittenau'],
        'Type': ['bus', 'bus'],
        'Lines': ['A77,A68,A76', 'A62 ; A15,A21'],
        'Coordinates': ['52.465964306830664, 13.38558297633417', '52.58972877186178, 13.334169215342472 ; 52.59166508975595, 13.326326895395114'],
        'Extra': [None, 'Alt-Wittenau Ecke Oranienburger Straße ; Alt-Wittenau Ecke Eichborndamm']}

df = pd.DataFrame(data)


split_df = pd.concat([df[col].astype(str).str.split(';', expand=True).stack().str.strip() for col in df.columns], axis=1, keys=df.columns)

split_df = split_df.apply(lambda col: col.fillna(df[col.name]))

split_df.reset_index(drop=True, inplace=True)

split_df

Solution

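If I understand correctly, you can just group split_df by the level 0 index and ffill. After stack(), split_df is indexed by (original row, piece number), so grouping on level 0 collects all pieces of one original row, and ffill copies the single value of an unsplit column down to the remaining pieces: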

split_df = pd.concat([df[col].astype(str).str.split(';', expand=True).stack().str.strip() for col in df.columns], axis=1, keys=df.columns)
split_df = split_df.groupby(level=0).ffill().reset_index(drop=True)

Output:

   ID                                 Name Type        Lines
0  34  Alt-Tempelhof Ecke Tempelhofer Damm  bus  A77,A68,A76  \
1  35                         Alt-Wittenau  bus          A62
2  35                         Alt-Wittenau  bus      A15,A21

                             Coordinates
0  52.465964306830664, 13.38558297633417  \
1  52.58972877186178, 13.334169215342472
2  52.59166508975595, 13.326326895395114

                                    Extra
0                                    None
1  Alt-Wittenau Ecke Oranienburger Straße
2          Alt-Wittenau Ecke Eichborndamm
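
For reuse, the same idea can be wrapped in a small function. The following is only a sketch under a few assumptions: the name split_rows and its sep parameter are made up for illustration, and the extra dropna() and sort_index() calls are not part of the answer above. They are added defensively, because whether stack() drops missing pieces and how concat orders the combined index can depend on the pandas version.

```python
import pandas as pd


def split_rows(df, sep=";"):
    """Hypothetical helper: split rows on `sep`, repeating unsplit values."""
    parts = [
        # One Series per column, indexed by (original row, piece number).
        # dropna() removes the padding that expand=True adds for rows
        # with fewer pieces than the widest row.
        df[col].astype(str).str.split(sep, expand=True).stack().dropna().str.strip()
        for col in df.columns
    ]
    # Align the columns on the (row, piece) index; gaps become NaN.
    stacked = pd.concat(parts, axis=1, keys=df.columns).sort_index()
    # Within each original row, copy the last seen value downwards.
    return stacked.groupby(level=0).ffill().reset_index(drop=True)


data = {'ID': [34, 35],
        'Name': ['Alt-Tempelhof Ecke Tempelhofer Damm', 'Alt-Wittenau'],
        'Type': ['bus', 'bus'],
        'Lines': ['A77,A68,A76', 'A62 ; A15,A21'],
        'Coordinates': ['52.465964306830664, 13.38558297633417',
                        '52.58972877186178, 13.334169215342472 ; 52.59166508975595, 13.326326895395114'],
        'Extra': [None, 'Alt-Wittenau Ecke Oranienburger Straße ; Alt-Wittenau Ecke Eichborndamm']}

df = pd.DataFrame(data)
result = split_rows(df)
print(result)
```

Note that astype(str) turns every column into strings, so ID comes back as text and would need converting again if numeric values are required.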

Answered By - Nick

This answer was collected from Stack Overflow and tested by PythonFixing community admins. It is licensed under CC BY-SA 2.5, CC BY-SA 3.0 and CC BY-SA 4.0.
