Issue
I got two dataframes looking like this:
df1
:
omg | odds | outcome | |
---|---|---|---|
0 | 2625 | 27 | 1 |
1 | 2625 | 22 | 1 |
2 | 2626 | 10 | 0 |
3 | 2628 | 15 | 1 |
4 | 2628 | 20 | 1 |
df2
:
omgangar | |
---|---|
0 | 2625 |
1 | 2626 |
2 | 2627 |
3 | 2628 |
4 | 2629 |
In df2
I'd like to add a new column that counts the occurrences of df2[omgangar] in df1[omg]
together with some other conditions (odds between two values and outcome equal to 1).
I've tried this code but it throws an error:
df2['outcome'] =
sum((df1['omg'] == df2['omgangar']) & (df1['odds'].between(20,30)) & (df1['outcome'] == 1))
Error:
ValueError: Can only compare identically-labeled Series objects
The desired output of df2
is:
omgangar | outcome | |
---|---|---|
0 | 2625 | 2 |
1 | 2626 | 0 |
2 | 2627 | 0 |
3 | 2628 | 1 |
4 | 2629 | 0 |
Solution
Use DataFrame.loc
for filter by mask with column omg
and get counts by Series.value_counts
, for new column use Series.map
with replace missing not matched values to 0
:
s = df1.loc[df1['odds'].between(20,30) & (df1['outcome'] == 1), 'omg'].value_counts()
df2['outcome'] = df2['omgangar'].map(s).fillna(0).astype(int)
print (df2)
omgangar outcome
0 2625 2
1 2626 0
2 2627 0
3 2628 1
4 2629 0
Answered By - jezrael
0 comments:
Post a Comment
Note: Only a member of this blog may post a comment.