Issue
The goal is to merge three different dataframes having different timesteps (10min, 15min and 30min. The code must recognize what timestep to consider firstly and identify the next available next timestep. in This example 2019/04/02 10:40:00 does not exist in the dataframes dataset. Therefore the next timestep to consider after 2019/04/02 10:30:00 would be 2019/04/02 10:45:00.
df1:
Timestamp | data1 |
---|---|
2019/04/02 10:00:00 | 1 |
2019/04/02 10:10:00 | 1 |
2019/04/02 10:20:00 | 1 |
2019/04/02 10:30:00 | 1 |
df2:
Timestamp | data2 |
---|---|
2019/04/02 10:00:00 | 2 |
2019/04/02 10:15:00 | 22 |
2019/04/02 10:30:00 | 222 |
2019/04/02 10:45:00 | 2222 |
2019/04/02 11:00:00 | 22222 |
df3:
Timestamp | data3 |
---|---|
2019/04/02 10:00:00 | 3 |
2019/04/02 10:30:00 | 33 |
2019/04/02 11:00:00 | 333 |
2019/04/02 11:30:00 | 3333 |
desired result:
Timestamp | data1 | data2 | data3 |
---|---|---|---|
2019/04/02 10:00:00 | 1 | 2 | 3 |
2019/04/02 10:10:00 | 1 | NaN | NaN |
2019/04/02 10:15:00 | NaN | 22 | NaN |
2019/04/02 10:20:00 | 1 | NaN | NaN |
2019/04/02 10:30:00 | 1 | 222 | 33 |
2019/04/02 10:45:00 | NaN | 2222 | NaN |
2019/04/02 11:00:00 | NaN | 22222 | 333 |
2019/04/02 11:30:00 | NaN | NaN | 3333 |
I used the python concat function and the merge function but did not deliver he desired result.
Solution
# Convert Timestamp columns to datetime
df1['Timestamp'] = pd.to_datetime(df1['Timestamp'])
df2['Timestamp'] = pd.to_datetime(df2['Timestamp'])
df3['Timestamp'] = pd.to_datetime(df3['Timestamp'])
# Sort the DataFrames based on Timestamp
df1 = df1.sort_values('Timestamp')
df2 = df2.sort_values('Timestamp')
df3 = df3.sort_values('Timestamp')
# Merge using merge_asof
result = pd.merge_asof(df1, df2, on='Timestamp', direction='nearest')
result = pd.merge_asof(result, df3, on='Timestamp', direction='nearest')
EDIT: IF YOU WANT TO KEEP NAN VALUES
# Convert Timestamp columns to datetime
df1['Timestamp'] = pd.to_datetime(df1['Timestamp'])
df2['Timestamp'] = pd.to_datetime(df2['Timestamp'])
df3['Timestamp'] = pd.to_datetime(df3['Timestamp'])
# Merge using merge with how='outer'
result = pd.merge(df1, df2, on='Timestamp', how='outer')
result = pd.merge(result, df3, on='Timestamp', how='outer')
# Sort the result based on Timestamp
result = result.sort_values('Timestamp')
Answered By - Cem Koçak
0 comments:
Post a Comment
Note: Only a member of this blog may post a comment.