Issue
I have the following DataFrame:
df = pd.DataFrame({'sports': ["['soccer', 'men tennis']", "['soccer']", "['baseball', 'women tennis']"]})
print(df)
sports
0 ['soccer', 'men tennis']
1 ['soccer']
2 ['baseball', 'women tennis']
I need to extract all the unique sport names and put them into a list. I'm trying the following code:
out = pd.DataFrame(df['sports'].str.split(',').tolist()).stack()
out.value_counts().index
However, it's returning NaN values.
Desired output:
['soccer', 'men tennis', 'baseball', 'women tennis']
What would be the smartest way of doing it? Any suggestions would be appreciated. Thanks!
Solution
If these are lists, then you could use explode + unique:
out = df['sports'].explode().unique().tolist()
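As a minimal sketch of this case, assuming the column holds real Python lists (the DataFrame below is rebuilt that way for illustration and is not the one from the question):

```python
import pandas as pd

# Assumption: each cell is an actual list, not a string that looks like one
df_lists = pd.DataFrame({'sports': [['soccer', 'men tennis'],
                                    ['soccer'],
                                    ['baseball', 'women tennis']]})

# explode() gives each list element its own row;
# unique() drops duplicates and keeps the order of first appearance
out = df_lists['sports'].explode().unique().tolist()
print(out)  # ['soccer', 'men tennis', 'baseball', 'women tennis']
```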
If these are strings, then you could use ast.literal_eval first to parse them:
import ast
out = df['sports'].apply(ast.literal_eval).explode().unique().tolist()
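If it is unclear which of the two cases applies, checking type(df['sports'].iloc[0]) should tell. The sketch below wraps both cases in one function; unique_items is a hypothetical helper name introduced here for illustration, not something pandas provides:

```python
import ast
import pandas as pd

def unique_items(col):
    """Hypothetical helper: return the unique items of a Series whose cells
    are either lists or string representations of lists."""
    # Parse only the cells that are strings; leave real lists untouched
    parsed = col.apply(lambda v: ast.literal_eval(v) if isinstance(v, str) else v)
    return parsed.explode().unique().tolist()

# The string-valued column from the question
df_str = pd.DataFrame({'sports': ["['soccer', 'men tennis']", "['soccer']",
                                  "['baseball', 'women tennis']"]})
# The same data stored as real lists (an assumption, for comparison)
df_list = pd.DataFrame({'sports': [['soccer', 'men tennis'], ['soccer'],
                                   ['baseball', 'women tennis']]})

from_strings = unique_items(df_str['sports'])
from_lists = unique_items(df_list['sports'])
print(from_strings)
print(from_lists)
```

Both calls should give the same list, since the strings are parsed into lists before exploding.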
or use ast.literal_eval in a set comprehension and unpack:
out = [*{x for lst in df['sports'].tolist() for x in ast.literal_eval(lst)}]
Output (the set-based version may return the names in a different order):
['soccer', 'men tennis', 'baseball', 'women tennis']
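If the order of first appearance matters for the pure-Python variant, one option is to swap the set for dict.fromkeys, which also removes duplicates but keeps insertion order (Python 3.7+). This is a sketch of an alternative, not part of the original answer; rows is a stand-in for df['sports'].tolist():

```python
import ast

# Stand-in for df['sports'].tolist() from the question
rows = ["['soccer', 'men tennis']", "['soccer']", "['baseball', 'women tennis']"]

# dict keys are unique and keep insertion order, unlike set elements
ordered = list(dict.fromkeys(x for s in rows for x in ast.literal_eval(s)))
print(ordered)  # ['soccer', 'men tennis', 'baseball', 'women tennis']
```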
Answered By - enke