Issue
I have two example dataframes:
import pandas as pd
df1 = pd.DataFrame()
df1['a1'] = ['ABC','ACC','BCC','ABC']
df1['b1'] = ['ACC','AAC','BAC','ACC']
df2 = pd.DataFrame()
df2['a2'] = ['ACC','BCC','ABC']
df2['b2'] = ['AAC','BAC','ACC']
df2['types'] = ['t1','t2','t3']
>>> df2
a2 b2 types
0 ACC AAC t1
1 BCC BAC t2
2 ABC ACC t3
>>> df1
a1 b1
0 ABC ACC
1 ACC AAC
2 BCC BAC
3 ABC ACC
I want to take each row of df1 and search df2 for matches. If a1 matches a2 AND b1 matches b2, I want to count the matching row's type, so that I can calculate the probability of each type. For example, the first row of df1 matches the third row of df2, so I add 1 to the count for t3. I am looking for an approach that stays efficient as the data grows.
I tried:
for ind in df1:
    compare_item1 = df1['a1'][ind]
    compare_item2 = df1['b1'][ind]
    for i in df2:
        count = 0
        if compare_item1 == df2['a2'][i] and compare_item2 == df2['b2'][i]:
            df1['t_{}'.format(i)] = count + 1
My idea was to create a dummy variable t_i on each iteration, then use those variables for the counts and further calculations. However, I don't get the expected df1 with the dummy variables. Any suggestions on how to fix it? Or is there a more efficient way to find the probabilities?
Thanks!
Solution
IIUC, merge the two DataFrames on both column pairs, then count the rows per type:
df = df1.merge(df2, left_on=['a1','b1'], right_on=['a2','b2'])
print (df)
a1 b1 a2 b2 types
0 ABC ACC ABC ACC t3
1 ABC ACC ABC ACC t3
2 ACC AAC ACC AAC t1
3 BCC BAC BCC BAC t2
df = df.groupby(['a1','b1','types']).size().reset_index(name='count')
print (df)
a1 b1 types count
0 ABC ACC t3 2
1 ACC AAC t1 1
2 BCC BAC t2 1
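The question also asks for the probability of each type. One possible way to get there from the merge above is sketched below; it assumes "probability" means each type's share of the matched df1 rows, which the question does not spell out. The names merged and probs are illustrative only.

```python
import pandas as pd

# Example data from the question
df1 = pd.DataFrame({'a1': ['ABC', 'ACC', 'BCC', 'ABC'],
                    'b1': ['ACC', 'AAC', 'BAC', 'ACC']})
df2 = pd.DataFrame({'a2': ['ACC', 'BCC', 'ABC'],
                    'b2': ['AAC', 'BAC', 'ACC'],
                    'types': ['t1', 't2', 't3']})

# Inner merge keeps only the df1 rows that have a match in df2
merged = df1.merge(df2, left_on=['a1', 'b1'], right_on=['a2', 'b2'])

# Assumption: probability = share of matched rows carrying each type
probs = merged['types'].value_counts(normalize=True)
print(probs)
```

Because the merge is an inner join, df1 rows without a match in df2 are dropped before the shares are computed. If unmatched rows should count toward the denominator, dividing the per-type counts by len(df1) instead would be one option.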
Answered By - jezrael