Issue
I have two dataframes, df1 and df2. df1 has columns id, dt, date. df2 has columns id, date. I want to do the following merge:
merged = pd.merge(df1, df2, on=['id', 'date'], how='left')
However, I also want to make sure that every dt in df1 has a row for each id in df2.id.unique(). If an id is missing for a given dt, I want to insert a row for it.
Is there a way to achieve this behavior using merge?
Example
# Note: dt and date are really Python datetime and date objects, respectively, but that doesn't matter for this example, so I will use plain numbers
df1 = pd.DataFrame({
    "id": [1, 2, 1, 2],
    "dt": [10.1, 10.1, 11.1, 11.1],
    "date": [10, 10, 11, 11]
})
# id = 3 doesn't exist in df1
df2 = pd.DataFrame({
    "id": [1, 2, 3, 1, 2, 3],
    "date": [10, 10, 10, 11, 11, 11],
    "some_other_col": [1.1, 1.2, 1.3, 1.4, 1.5, 1.6]
})
# After the merge I want to have:
merged = pd.DataFrame({
    "id": [1, 2, 3, 1, 2, 3],
    "dt": [10.1, 10.1, 10.1, 11.1, 11.1, 11.1],
    "date": [10, 10, 10, 11, 11, 11],
    "some_other_col": [1.1, 1.2, 1.3, 1.4, 1.5, 1.6]
})
Solution
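One option is to do an outer merge (instead of a left merge) and then fill in the missing dt values with .loc. Note that this works here because the rows with a missing dt line up, in order, with df1.dt.unique():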
merged = pd.merge(df1, df2, on=['id', 'date'], how='outer')
merged.loc[merged.dt.isna(), 'dt'] = df1.dt.unique()
merged
   id    dt  date  some_other_col
0   1  10.1    10             1.1
1   2  10.1    10             1.2
2   1  11.1    11             1.4
3   2  11.1    11             1.5
4   3  10.1    10             1.3
5   3  11.1    11             1.6
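As an alternative to assigning the unique dt values by position, the missing dt can be looked up from date. The following is only a sketch: it assumes each date corresponds to exactly one dt in df1, which is true for this example but may not hold for real datetime data. The name date_to_dt is made up for illustration.

```python
import pandas as pd

df1 = pd.DataFrame({
    "id": [1, 2, 1, 2],
    "dt": [10.1, 10.1, 11.1, 11.1],
    "date": [10, 10, 11, 11],
})
df2 = pd.DataFrame({
    "id": [1, 2, 3, 1, 2, 3],
    "date": [10, 10, 10, 11, 11, 11],
    "some_other_col": [1.1, 1.2, 1.3, 1.4, 1.5, 1.6],
})

merged = pd.merge(df1, df2, on=["id", "date"], how="outer")

# Assumption: one dt per date in df1, so date can serve as a lookup key.
date_to_dt = df1.drop_duplicates("date").set_index("date")["dt"]

# Fill only the missing dt values, matching on date rather than on row position.
merged["dt"] = merged["dt"].fillna(merged["date"].map(date_to_dt))
```

Because the fill is keyed on date, it does not depend on the row order that the outer merge happens to produce.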
Another option is to introduce the missing rows into df1 before merging. This uses the complete function from pyjanitor:
# pip install pyjanitor
import pandas as pd
import janitor
(df1
 .complete({'id': df2.id.unique()}, ('dt', 'date'))
 .merge(df2, on=['id', 'date'], how='left')
)
   id    dt  date  some_other_col
0   1  10.1    10             1.1
1   1  11.1    11             1.4
2   2  10.1    10             1.2
3   2  11.1    11             1.5
4   3  10.1    10             1.3
5   3  11.1    11             1.6
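If you would rather not add a dependency, the same idea can be sketched in plain pandas: build every combination of (dt, date) pair and id with a cross join, then left-merge df2 onto it. This is a sketch rather than a drop-in replacement for complete. It assumes pandas 1.2 or later (for how="cross") and that df1 has no columns beyond id, dt and date. The names slots, ids and completed are illustrative.

```python
import pandas as pd

df1 = pd.DataFrame({
    "id": [1, 2, 1, 2],
    "dt": [10.1, 10.1, 11.1, 11.1],
    "date": [10, 10, 11, 11],
})
df2 = pd.DataFrame({
    "id": [1, 2, 3, 1, 2, 3],
    "date": [10, 10, 10, 11, 11, 11],
    "some_other_col": [1.1, 1.2, 1.3, 1.4, 1.5, 1.6],
})

# Every distinct (dt, date) pair that appears in df1.
slots = df1[["dt", "date"]].drop_duplicates()

# Every id that df2 knows about, including ids missing from df1.
ids = pd.DataFrame({"id": df2["id"].unique()})

# Cross join: one row per (dt, date, id) combination.
completed = slots.merge(ids, how="cross")

# Left merge keeps every completed row and pulls in df2's columns.
result = (
    completed
    .merge(df2, on=["id", "date"], how="left")
    .sort_values(["dt", "id"], ignore_index=True)
)
result = result[["id", "dt", "date", "some_other_col"]]
```

With the example data, result has one row per (dt, id) combination, in the same row order as the desired frame in the question. If df1 carried extra columns, they would need to be merged back in from df1 separately.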
Answered By - sammywemmy