Tuesday, January 9, 2024

[FIXED] left merge df2 onto df1 using id and date, but also porting over all id in df2 in df1


Issue

I have two DataFrames, df1 and df2. df1 has columns id, dt, and date; df2 has columns id and date. I want to do the following merge:

merged = pd.merge(df1, df2, on=['id', 'date'], how='left')

However, I also want every dt in df1 to have a row for each id in df2.id.unique(). Where an id is missing for a given dt, I want to insert the corresponding row.

Is there a way to achieve this behavior using merge?

Example

# Note: dt and date are Python datetime and date objects, respectively, but that doesn't matter for this example, so I use plain numbers
df1 = pd.DataFrame({
  "id": [1, 2, 1, 2],
  "dt": [10.1, 10.1, 11.1, 11.1],
  "date": [10, 10, 11, 11]
})

# id = 3 doesn't exist in df1
df2 = pd.DataFrame({
  "id": [1, 2, 3, 1, 2, 3],
  "date": [10, 10, 10, 11, 11, 11],
  "some_other_col": [1.1, 1.2, 1.3, 1.4, 1.5, 1.6]
})

# After the merge I want to have:
merged = pd.DataFrame({
  "id": [1, 2, 3, 1, 2, 3],
  "dt": [10.1, 10.1, 10.1, 11.1, 11.1, 11.1],
  "date": [10, 10, 10, 11, 11, 11],
  "some_other_col": [1.1, 1.2, 1.3, 1.4, 1.5, 1.6]
})
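
To make the gap concrete, here is a small sketch (reusing the example frames above; the names left_only and missing_ids are only illustrative) showing that the plain left merge keeps just the key combinations already present in df1, so id 3 never appears:

```python
import pandas as pd

df1 = pd.DataFrame({
    "id": [1, 2, 1, 2],
    "dt": [10.1, 10.1, 11.1, 11.1],
    "date": [10, 10, 11, 11],
})
df2 = pd.DataFrame({
    "id": [1, 2, 3, 1, 2, 3],
    "date": [10, 10, 10, 11, 11, 11],
    "some_other_col": [1.1, 1.2, 1.3, 1.4, 1.5, 1.6],
})

# A left merge only looks up df2 values for the (id, date) pairs found in df1.
left_only = pd.merge(df1, df2, on=["id", "date"], how="left")

# ids that exist in df2 but are absent from the left-merge result
missing_ids = sorted(set(df2["id"].tolist()) - set(left_only["id"].tolist()))

print(len(left_only), missing_ids)
```

The result has the same four rows as df1, and missing_ids contains only 3, which is the id the desired output needs to gain.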

Solution

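One option is to do an outer merge and then fill in the missing dt values with .loc. Note that this assignment is positional: it works here because there is exactly one missing row per unique dt, in the same order as df1.dt.unique().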

merged = pd.merge(df1, df2, on=['id', 'date'], how='outer')
merged.loc[merged['dt'].isna(), 'dt'] = df1['dt'].unique()
merged
   id    dt  date  some_other_col
0   1  10.1    10             1.1
1   2  10.1    10             1.2
2   1  11.1    11             1.4
3   2  11.1    11             1.5
4   3  10.1    10             1.3
5   3  11.1    11             1.6
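
If the number or order of missing rows might not line up with df1.dt.unique(), a possibly safer variant is to look up dt by each row's own date. The sketch below is not part of the original answer; it assumes every date in df1 corresponds to exactly one dt, and date_to_dt is a name introduced here for illustration:

```python
import pandas as pd

df1 = pd.DataFrame({
    "id": [1, 2, 1, 2],
    "dt": [10.1, 10.1, 11.1, 11.1],
    "date": [10, 10, 11, 11],
})
df2 = pd.DataFrame({
    "id": [1, 2, 3, 1, 2, 3],
    "date": [10, 10, 10, 11, 11, 11],
    "some_other_col": [1.1, 1.2, 1.3, 1.4, 1.5, 1.6],
})

merged = pd.merge(df1, df2, on=["id", "date"], how="outer")

# Assumption: each date in df1 maps to a single dt, so the first one is enough.
date_to_dt = df1.drop_duplicates("date").set_index("date")["dt"]

# Fill only the missing dt values, using the date of the row being filled.
merged["dt"] = merged["dt"].fillna(merged["date"].map(date_to_dt))

# Sort explicitly, since the row order of an outer merge differs across pandas versions.
merged = merged.sort_values(["date", "id"]).reset_index(drop=True)
print(merged)
```

Because the fill is keyed on date rather than on position, it does not depend on how many rows were added or in what order the merge returned them. If one date can carry several dt values, this lookup is not enough and the missing rows should be built before merging instead.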

Another option is to add the missing rows to df1 before merging, so that a plain left merge is enough. This uses the complete function from pyjanitor:

# pip install pyjanitor
import pandas as pd
import janitor
(
    df1
    .complete({'id': df2.id.unique()}, ('dt', 'date'))
    .merge(df2, on=['id', 'date'], how='left')
)
   id    dt  date  some_other_col
0   1  10.1    10             1.1
1   1  11.1    11             1.4
2   2  10.1    10             1.2
3   2  11.1    11             1.5
4   3  10.1    10             1.3
5   3  11.1    11             1.6
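
The same "complete first, then left merge" idea can also be sketched with pandas alone, for cases where adding pyjanitor is not an option. This is an illustrative alternative rather than the answerer's code: the names slots, ids, and full are made up here, how="cross" needs pandas 1.2 or later, and the sketch assumes df1 carries no columns beyond id, dt, and date.

```python
import pandas as pd

df1 = pd.DataFrame({
    "id": [1, 2, 1, 2],
    "dt": [10.1, 10.1, 11.1, 11.1],
    "date": [10, 10, 11, 11],
})
df2 = pd.DataFrame({
    "id": [1, 2, 3, 1, 2, 3],
    "date": [10, 10, 10, 11, 11, 11],
    "some_other_col": [1.1, 1.2, 1.3, 1.4, 1.5, 1.6],
})

# Every distinct (dt, date) pair that occurs in df1.
slots = df1[["dt", "date"]].drop_duplicates()

# Every id seen in either frame, so no id already in df1 is lost.
ids = pd.DataFrame({"id": pd.concat([df1["id"], df2["id"]]).unique()})

# Cross join: one row per (dt, date) pair and id.
full = slots.merge(ids, how="cross")

# With the grid complete, a plain left merge brings in the df2 columns.
merged = (
    full.merge(df2, on=["id", "date"], how="left")
    .sort_values(["date", "id"])
    .reset_index(drop=True)
    [["id", "dt", "date", "some_other_col"]]
)
print(merged)
```

Building the grid from distinct (dt, date) pairs keeps dt and date tied together, which plays the same role as the ('dt', 'date') tuple passed to complete above.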

Answered By - sammywemmy

This answer was collected from Stack Overflow and tested by PythonFixing community admins. It is licensed under CC BY-SA 2.5, CC BY-SA 3.0 and CC BY-SA 4.0.
