Issue
I have been scratching my head since morning on how to tackle this problem. I need to pass multiple columns from multiple dataframes to a function as arguments.
Example : Df1
A | B | C |
---|---|---|
1 | 11 | 111,333 |
2 | 22 | 222 |
3 | 33 | nan |
Df2
D | E |
---|---|
a | 111 |
b | 333 |
Now, I want all the rows from Df2(col E) which are not present in the Df1 (Col C) along with DF1(A,B) concatenated.
The output I want is below:
newcol | C |
---|---|
2_22 | 222 |
I have written below code snippet to do the same but I'm stuck at how to apply it to the dataframes.
def get_wrong_ls(a,b,c,d):
wrong_l_list = []
wrong_l_dict = {}
if c is not None:
c_list = str(c).split(",")
for ele in c_list:
if ele not in d:
wrong_l_dict[str(a)+"_"+str(b)] = c
wrong_l_list.append(wrong_l_dict)
return pd.DataFrame(wrong_l_list)
it returns correct output when the strings are passed. But I'm stuck at how to pass the dataframes directly. The below code behaves weirdly and also the "nan" in the col C doesn't get filtered out.
fd = get_wrong_ls(df1['A'],df1['B'],df1['C'],df2['E'])
Please help.
Solution
import pandas as pd
import numpy as np
df1 = pd.DataFrame({'A': [1, 2, 3], 'B': [11, 22, 33], 'C': ['111,333', '222', np.nan]})
df2 = pd.DataFrame({'D': ['a', 'b'], 'E': [111, 333]})
def get_wrong_ls(df1, df2):
df1.dropna(inplace=True)
df1['C'] = df1['C'].str.split(',')
df1 = df1.explode('C').astype(int)
out = df1[~df1['C'].isin(df2['E'])].copy()
out['newcol'] = out['A'].astype(str) + '_' + out['B'].astype(str)
return out[['newcol','C']]
get_wrong_ls(df1,df2)
Output
newcol C
1 2_22 222
Answered By - Chris
0 comments:
Post a Comment
Note: Only a member of this blog may post a comment.