Issue
Imagine I have a dataframe like this, where each cell holds a list of elements in a single comma-separated string:
import pandas as pd

data = {'Col1': ["apple, banana, orange", "dog, cat", "python, java, c++"],
        'Col2': ["banana, lemon, blueberry", "bird, cat", "R, fortran"]
}
df = pd.DataFrame(data)
df
How can I create a Col3 with the intersection of the elements in Col1 and Col2?
Expected output:
data = {'Col1': ["apple, banana, orange", "dog, cat", "python, java, c++"],
        'Col2': ["banana, lemon, blueberry", "bird, cat", "R, fortran"],
        'Col3': ["banana", "cat", pd.NA]
}
df = pd.DataFrame(data)
df
Solution
Using a list comprehension and set intersection:
df['Col3'] = [', '.join(set(a.split(', ')) & set(b.split(', ')))
              for a, b in zip(df['Col1'], df['Col2'])]
Output:
                    Col1                      Col2    Col3
0  apple, banana, orange  banana, lemon, blueberry  banana
1               dog, cat                 bird, cat     cat
2      python, java, c++                R, fortran
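One thing to keep in mind: a set has no defined order, so when a row shares more than one element, the order of the joined string is not guaranteed. If the order matters, a minimal sketch along these lines keeps the order from Col1. The helper name ordered_intersection is hypothetical, and Col2 in the first row is changed from the question's data so that two elements overlap:

```python
import pandas as pd

# Sample data: first row of Col2 altered (assumption) so two items are shared
df = pd.DataFrame({
    "Col1": ["apple, banana, orange", "dog, cat", "python, java, c++"],
    "Col2": ["orange, banana, lemon", "bird, cat", "R, fortran"],
})

def ordered_intersection(a, b, sep=", "):
    """Return the items of `a` that also appear in `b`, in `a`'s order."""
    in_b = set(b.split(sep))
    # dict.fromkeys drops duplicates while keeping first-seen order
    return sep.join(x for x in dict.fromkeys(a.split(sep)) if x in in_b)

df["Col3"] = [ordered_intersection(a, b)
              for a, b in zip(df["Col1"], df["Col2"])]
```

Here the first row gives "banana, orange" (Col1's order), and a row with nothing in common still gives an empty string.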
If you want NAs on empty intersections (the := assignment expression requires Python 3.8+):
df['Col3'] = [x if (x:=', '.join(set(a.split(', ')) & set(b.split(', '))))
              else pd.NA
              for a, b in zip(df['Col1'], df['Col2'])]
Output:
                    Col1                      Col2    Col3
0  apple, banana, orange  banana, lemon, blueberry  banana
1               dog, cat                 bird, cat     cat
2      python, java, c++                R, fortran    <NA>
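If the walrus operator is not available, or the one-liner feels dense, the same logic can probably be moved into a small named function. This is only a sketch: intersect_or_na is a hypothetical helper, and sorting the shared elements is an added choice to make the output reproducible, not part of the answer above:

```python
import pandas as pd

def intersect_or_na(a, b, sep=", "):
    """Join the elements shared by `a` and `b`, or return pd.NA if none."""
    common = set(a.split(sep)) & set(b.split(sep))
    # sorted() gives a stable order (assumption: alphabetical is acceptable)
    return sep.join(sorted(common)) if common else pd.NA

df = pd.DataFrame({
    "Col1": ["apple, banana, orange", "dog, cat", "python, java, c++"],
    "Col2": ["banana, lemon, blueberry", "bird, cat", "R, fortran"],
})
df["Col3"] = [intersect_or_na(a, b)
              for a, b in zip(df["Col1"], df["Col2"])]
```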
Answered By - mozway