Issue
I want to merge two data frames on a Date_Time column with a date-time dtype. The Date_Time columns contain both shared and differing values. I am unable to merge them so that every unique date-time row ends up in the result, with NA in the columns that only exist in the other frame. Instead, I am getting NAs in the Date_Time column for the second data frame. I have tried this in both R and Python.
Python code:
df=pd.merge(df_met, df_so2, how='left', on='Date_Time')
In R, the Date_Time column is converted to date-time using as.POSIXct:
df_2<-join(so2, met_km, type="inner")
df3 <- merge(so2, met_km, all = TRUE)
df_4 <- merge(so2, met_km, by.x = "Date_Time", by.y = "Date_Time")
df_so2:
X POC Datum Date_Time Date_GMT Sample.Measurement MDL
1 2 WGS84 2015-01-01 3:00 01/01/2015 09:00 2.3 0.2
2 2 WGS84 2015-01-01 4:00 01/01/2015 10:00 2.5 0.2
3 2 WGS84 2015-01-01 5:00 01/01/2015 11:00 2.1 0.2
4 2 WGS84 2015-01-01 6:00 01/01/2015 12:00 2.3 0.2
5 2 WGS84 2015-01-01 7:00 01/01/2015 13:00 1.1 0.2
df_met:
X Date_Time air_temp_set_1 dew_point_temperature_set_1
1 2015-01-01 1:00 35.6 35.6
2 2015-01-01 2:00 35.6 35.6
3 2015-01-01 3:00 35.6 35.6
4 2015-01-01 4:00 33.8 33.8
5 2015-01-01 5:00 33.2 33.2
6 2015-01-01 6:00 33.8 33.8
7 2015-01-01 7:00 33.8 33.8
Expected Output:
X POC Datum Date_Time Date_GMT Sample.Measurement MDL
1 1.0 2 WGS84 2015-01-01 3:00 01/01/2015 09:00 2.3 0.2
2 2.0 2 WGS84 2015-01-01 4:00 01/01/2015 10:00 2.5 0.2
3 NaN NaN 2015-01-01 1:00 NaN NaN NaN
4 NaN NaN 2015-01-01 2:00 NaN NaN NaN
Solution
An outer merge should return all of them. From the pandas.DataFrame.merge documentation:
outer: use union of keys from both frames, similar to a SQL full outer join; sort keys lexicographically.
- Based on your comment, you want all the dates, not just those shown in Expected Output.
- Add the parameter sort=True if you want the result sorted by date.
df_exp = pd.merge(df_so2, df_met, on='Date_Time', how='outer')
X_x POC Datum Date_Time Date_GMT Sample.Measurement MDL X_y air_temp_set_1 dew_point_temperature_set_1
1.0 2.0 WGS84 2015-01-01 3:00 01/01/2015 09:00 2.3 0.2 3 35.6 35.6
2.0 2.0 WGS84 2015-01-01 4:00 01/01/2015 10:00 2.5 0.2 4 33.8 33.8
3.0 2.0 WGS84 2015-01-01 5:00 01/01/2015 11:00 2.1 0.2 5 33.2 33.2
4.0 2.0 WGS84 2015-01-01 6:00 01/01/2015 12:00 2.3 0.2 6 33.8 33.8
5.0 2.0 WGS84 2015-01-01 7:00 01/01/2015 13:00 1.1 0.2 7 33.8 33.8
NaN NaN NaN 2015-01-01 1:00 NaN NaN NaN 1 35.6 35.6
NaN NaN NaN 2015-01-01 2:00 NaN NaN NaN 2 35.6 35.6
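One caveat with sorting: if Date_Time is still stored as text, the keys are sorted as strings, so "2015-01-01 10:00" would come before "2015-01-01 3:00". The following is a minimal sketch, assuming the column holds strings like those in the question. The shortened sample frames are made up for illustration and only keep one value column each. It converts the key to datetime in both frames before the outer merge so that the dtypes match and the sort is chronological.

```python
import pandas as pd

# Shortened, illustrative versions of the frames from the question
df_so2 = pd.DataFrame({
    "Date_Time": ["2015-01-01 3:00", "2015-01-01 4:00", "2015-01-01 5:00"],
    "Sample.Measurement": [2.3, 2.5, 2.1],
})
df_met = pd.DataFrame({
    "Date_Time": ["2015-01-01 1:00", "2015-01-01 2:00",
                  "2015-01-01 3:00", "2015-01-01 4:00"],
    "air_temp_set_1": [35.6, 35.6, 35.6, 33.8],
})

# Give the key column the same datetime dtype in both frames
for frame in (df_so2, df_met):
    frame["Date_Time"] = pd.to_datetime(frame["Date_Time"])

# Outer merge keeps the union of keys; sort=True orders the result by the key
merged = pd.merge(df_so2, df_met, on="Date_Time", how="outer", sort=True)
print(merged)
```

Rows that exist in only one frame get NaN in the other frame's columns, while Date_Time itself is filled for every row.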
Without the columns from df_met:
df_exp.drop(columns=['X_y', 'air_temp_set_1', 'dew_point_temperature_set_1'], inplace=True)
df_exp.rename(columns={'X_x': 'X'}, inplace=True)
X POC Datum Date_Time Date_GMT Sample.Measurement MDL
1.0 2.0 WGS84 2015-01-01 3:00 01/01/2015 09:00 2.3 0.2
2.0 2.0 WGS84 2015-01-01 4:00 01/01/2015 10:00 2.5 0.2
3.0 2.0 WGS84 2015-01-01 5:00 01/01/2015 11:00 2.1 0.2
4.0 2.0 WGS84 2015-01-01 6:00 01/01/2015 12:00 2.3 0.2
5.0 2.0 WGS84 2015-01-01 7:00 01/01/2015 13:00 1.1 0.2
NaN NaN NaN 2015-01-01 1:00 NaN NaN NaN
NaN NaN NaN 2015-01-01 2:00 NaN NaN NaN
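To check which frame each row of an outer merge came from, pandas.merge accepts indicator=True, which adds a _merge column with the values left_only, right_only, or both. A small sketch with made-up two-row frames (the column names so2 and air_temp are placeholders, not the real column names from the question):

```python
import pandas as pd

# Hypothetical two-row frames that share one timestamp
left = pd.DataFrame({
    "Date_Time": pd.to_datetime(["2015-01-01 03:00", "2015-01-01 04:00"]),
    "so2": [2.3, 2.5],
})
right = pd.DataFrame({
    "Date_Time": pd.to_datetime(["2015-01-01 01:00", "2015-01-01 03:00"]),
    "air_temp": [35.6, 35.6],
})

# indicator=True adds a "_merge" column describing the origin of each row
checked = pd.merge(left, right, on="Date_Time", how="outer", indicator=True)
print(checked[["Date_Time", "_merge"]])
```

If every row shows up as left_only or right_only even though the timestamps look identical, the key columns probably differ in dtype or formatting between the two frames.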
Answered By - Trenton McKinney