Issue
I have a third column in my data frame where I want to be able to create a fourth column that looks almost the same, except it has no double quotes and there is a 'user/' prefix before each ID in the list. Also, sometimes it is just a single ID vs. list of IDs (as shown in example DF).
original
col1 col2 col3
01 01 "ID278, ID289"
02 02 "ID275"
desired
col1 col2 col3 col4
01 01 "ID278, ID289" user/ID278, user/ID289
02 02 "ID275" user/ID275
Solution
Given:
col1 col2 col3
0 1.0 1.0 "ID278, ID289"
1 2.0 2.0 "ID275"
2 2.0 1.0 NaN
Doing:
df['col4'] = (df.col3.str.strip('"') # Remove " from both ends.
.str.split(', ') # Split into lists on ', '.
.apply(lambda x: ['user/' + i for i in x if i] # Apply this list comprehension,
if isinstance(x, list) # If it's a list.
else x)
.str.join(', ')) # Join them back together.
print(df)
Output:
col1 col2 col3 col4
0 1.0 1.0 "ID278, ID289" user/ID278, user/ID289
1 2.0 2.0 "ID275" user/ID275
2 2.0 1.0 NaN NaN
Answered By - BeRT2me
0 comments:
Post a Comment
Note: Only a member of this blog may post a comment.