Thursday, June 9, 2022

[FIXED] duplicating rows by splitting comma separated multiple values in another column pandas

June 09, 2022 pandas, python, split No comments

Issue

I found code that should work from NameError: name 'Series' is not defined

But I get an error "name 'Series' is not defined". It worked fine in the example, but this error did come up for other users as well. Does anyone know how to make it work?

Any help would be appreciated!

original_df = DataFrame([{'country': 'a', 'title': 'title1'},
               {'country': 'a,b,c', 'title': 'title2'},
               {'country': 'd,e,f', 'title': 'title3'},
               {'country': 'e', 'title': 'title4'}])

desired_df = DataFrame([{'country': 'a', 'title': 'title1'},
               {'country': 'a', 'title': 'title2'},
               {'country': 'b', 'title': 'title2'},
               {'country': 'c', 'title': 'title2'},
               {'country': 'd', 'title': 'title3'},
               {'country': 'e', 'title': 'title3'},
               {'country': 'f', 'title': 'title3'},
               {'country': 'e', 'title': 'title4'}])

#Code I used:
desired_df = pd.concat(
    [
        Series(row["title"], row["country"].split(","))
        for _, row in original_df.iterrows()
    ]
).reset_index()

Solution

First split the column on commas to get a list and then you can explode that Series of lists. Move 'title' to the index so it gets repeated for each element in 'country'. The last two parts just clean up the names and remove title from the index.

(df.set_index('title')['country']
   .str.split(',')
   .explode()
   .rename('country')
   .reset_index())

    title country
0  title1       a
1  title2       a
2  title2       b
3  title2       c
4  title3       d
5  title3       e
6  title3       f
7  title4       e

Also, your original code is logically fine, but you need to properly create your object. I would recommend importing the module instead of individual classes/methods, so you create a Series with pd.Series not Series

import pandas as pd
                
desired_df = pd.concat([pd.Series(row['title'], row['country'].split(','))              
                        for _, row in original_df.iterrows()]).reset_index()

Answered By - ALollz

This Answer collected from stackoverflow and tested by PythonFixing community admins, is licensed under cc by-sa 2.5 , cc by-sa 3.0 and cc by-sa 4.0

Thursday, June 9, 2022

[FIXED] duplicating rows by splitting comma separated multiple values in another column pandas

Issue

Solution

0 comments:

Post a Comment

Popular Posts

Labels