Issue
I found code that should do what I need, but when I run it I get the error "NameError: name 'Series' is not defined". It worked fine in the example I took it from, though other users have run into the same error. Does anyone know how to make it work?
Any help would be appreciated!
original_df = DataFrame([{'country': 'a', 'title': 'title1'},
                         {'country': 'a,b,c', 'title': 'title2'},
                         {'country': 'd,e,f', 'title': 'title3'},
                         {'country': 'e', 'title': 'title4'}])

desired_df = DataFrame([{'country': 'a', 'title': 'title1'},
                        {'country': 'a', 'title': 'title2'},
                        {'country': 'b', 'title': 'title2'},
                        {'country': 'c', 'title': 'title2'},
                        {'country': 'd', 'title': 'title3'},
                        {'country': 'e', 'title': 'title3'},
                        {'country': 'f', 'title': 'title3'},
                        {'country': 'e', 'title': 'title4'}])
# Code I used:
desired_df = pd.concat(
    [
        Series(row["title"], row["country"].split(","))
        for _, row in original_df.iterrows()
    ]
).reset_index()
Solution
First split the column on commas to get a list, and then you can explode that Series of lists. Move 'title' to the index so it gets repeated for each element in 'country'. The last two parts just clean up the names and remove 'title' from the index.
(original_df.set_index('title')['country']
            .str.split(',')
            .explode()
            .rename('country')
            .reset_index())
    title country
0  title1       a
1  title2       a
2  title2       b
3  title2       c
4  title3       d
5  title3       e
6  title3       f
7  title4       e
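To make each step of that chain easier to follow, here is a minimal self-contained sketch that breaks it into intermediate variables. The variable names (countries, as_lists, exploded, result) are only illustrative, and it assumes pandas 0.25 or later, where Series.explode is available.

```python
import pandas as pd

original_df = pd.DataFrame([
    {'country': 'a', 'title': 'title1'},
    {'country': 'a,b,c', 'title': 'title2'},
    {'country': 'd,e,f', 'title': 'title3'},
    {'country': 'e', 'title': 'title4'},
])

# Step 1: move 'title' to the index and keep only the 'country' column.
countries = original_df.set_index('title')['country']

# Step 2: split each string on commas, giving a Series of lists.
as_lists = countries.str.split(',')

# Step 3: give every list element its own row; the index value
# ('title') is repeated for each element of the list.
exploded = as_lists.explode()

# Step 4: turn the 'title' index back into a regular column.
result = exploded.reset_index()
print(result)
```

The Series keeps the name 'country' through these steps, so the final frame has the columns 'title' and 'country' without any further renaming.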
Also, your original code is logically fine, but you need to properly create your object. I would recommend importing the module instead of individual classes/methods, so you create a Series with pd.Series, not Series.
import pandas as pd

desired_df = pd.concat([pd.Series(row['title'], row['country'].split(','))
                        for _, row in original_df.iterrows()]).reset_index()
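One thing to watch with the concat approach: after a plain reset_index() the columns come out named 'index' and 0 rather than 'country' and 'title'. The sketch below is one possible way to name them before resetting the index; the rename_axis and rename calls are an addition for illustration, not part of the original code, and pieces is just an illustrative variable name.

```python
import pandas as pd

original_df = pd.DataFrame([
    {'country': 'a', 'title': 'title1'},
    {'country': 'a,b,c', 'title': 'title2'},
    {'country': 'd,e,f', 'title': 'title3'},
    {'country': 'e', 'title': 'title4'},
])

# One Series per row: the title is repeated as the value,
# and the individual countries become the index.
pieces = [
    pd.Series(row['title'], index=row['country'].split(','))
    for _, row in original_df.iterrows()
]

desired_df = (
    pd.concat(pieces)
    .rename_axis('country')   # name the index so it becomes the 'country' column
    .rename('title')          # name the values so they become the 'title' column
    .reset_index()
)
print(desired_df)
```

Writing pd.Series here works because only the module was imported as pd. Using the bare name Series would additionally need from pandas import Series, which is what the NameError was pointing at.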
Answered By - ALollz