Issue
I have three different seismic catalogs whose origin times were calculated using different methods. Naturally, the calculated values aren't exactly the same; they disagree by around 5 seconds.
Catalog_1
Index Time
0 2022-05-01T08:16:55
1 2022-05-01T09:54:01
2 2022-05-01T10:25:49
3 2022-05-01T12:01:55
4 2022-05-01T18:17:23
Catalog_2
Index Time
0 2022-05-01T08:16:58.444
1 2022-05-01T10:25:46.939
2 2022-05-01T20:37:17.491
3 2022-05-01T23:34:22.539
Catalog_3
Index Time
0 2022-05-01T10:25:48
1 2022-05-01T23:34:20
2 2022-05-02T07:21:51
I want to combine these three dataframes into a single dataframe that automatically matches origin times when they agree within the acceptable error.
Combined_catalog
Index Time_1 Time_2 Time_3
0 2022-05-01T08:16:55 2022-05-01T08:16:58.444 N/A
1 2022-05-01T09:54:01 N/A N/A
2 2022-05-01T10:25:49 2022-05-01T10:25:46.939 2022-05-01T10:25:48
3 2022-05-01T12:01:55 N/A N/A
4 2022-05-01T18:17:23 N/A N/A
5 N/A 2022-05-01T20:37:17.491 N/A
6 N/A 2022-05-01T23:34:22.539 2022-05-01T23:34:20
7 N/A N/A 2022-05-02T07:21:51
Is there a way to get a result like this without using loops and ifs?
The catalogs sometimes span up to 5 years of data, so a different approach might be better.
Solution
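The code below assumes that "acceptable error" means times falling in the same minute: seconds are ignored when matching.
The code is a simple outer merge of all the dataframes, keyed on the Time columns floored to the minute. Because this bins times by clock minute rather than measuring their difference, two times a few seconds apart on either side of a minute boundary (e.g. 10:25:59 and 10:26:01) will not be matched.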
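To see how the minute keys behave, the snippet below floors a few origin times: two from the example catalogs (Catalog_1 and Catalog_3) plus two hypothetical times that sit 2 seconds apart on either side of a minute boundary. It only inspects the keys; it is not part of the merge itself.

```python
import pandas as pd

# Two example times plus two hypothetical times straddling 10:26:00
times = pd.to_datetime(pd.Series([
    '2022-05-01T10:25:49',  # Catalog_1
    '2022-05-01T10:25:48',  # Catalog_3
    '2022-05-01T10:25:59',  # hypothetical
    '2022-05-01T10:26:01',  # hypothetical
]))

# The merge key: each time floored to the start of its minute
keys = times.dt.floor('min')
for t, k in zip(times, keys):
    print(t, '->', k)
```

The first three times all share the 10:25 key (even though 10:25:48 and 10:25:59 are 11 seconds apart), while the last two, only 2 seconds apart, get different keys.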
Main Code
import pandas as pd

Catalog_1 = pd.DataFrame({'Time': ['2022-05-01T08:16:55', '2022-05-01T09:54:01', '2022-05-01T10:25:49',
                                   '2022-05-01T12:01:55', '2022-05-01T18:17:23']})
Catalog_2 = pd.DataFrame({'Time': ['2022-05-01T08:16:58.444', '2022-05-01T10:25:46.939', '2022-05-01T20:37:17.491',
                                   '2022-05-01T23:34:22.539']})
Catalog_3 = pd.DataFrame({'Time': ['2022-05-01T10:25:48', '2022-05-01T23:34:20', '2022-05-02T07:21:51']})

# Merge keys: each origin time floored to the minute
key_1 = pd.to_datetime(Catalog_1['Time']).dt.floor('min')
key_2 = pd.to_datetime(Catalog_2['Time']).dt.floor('min')
key_3 = pd.to_datetime(Catalog_3['Time']).dt.floor('min')

Combined_catalog = pd.merge(
    # Outer-merge catalogs 1 and 2; pandas stores the array key as column 'key_0'
    pd.merge(Catalog_1, Catalog_2, left_on=key_1, right_on=key_2, how='outer'),
    Catalog_3,  # then outer-merge the third catalog on the same key
    left_on='key_0',
    right_on=key_3,
    how='outer'
).drop('key_0', axis=1).rename(columns={'Time_x': 'Time_1', 'Time_y': 'Time_2', 'Time': 'Time_3'})
print(Combined_catalog)
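The same floor-and-merge idea extends to any number of catalogs and any binning interval. A minimal sketch, assuming a hypothetical helper `merge_on_floor` that builds an explicit `key` column per catalog (instead of relying on the generated `key_0` name) and chains outer merges with `functools.reduce`:

```python
from functools import reduce

import pandas as pd


def merge_on_floor(catalogs, freq='min'):
    """Outer-merge catalogs on origin times floored to `freq` (hypothetical helper).

    Each catalog is a DataFrame with a 'Time' column of ISO-8601 strings; the
    times from the i-th catalog end up in column 'Time_<i>'.
    """
    keyed = [
        pd.DataFrame({
            'key': pd.to_datetime(cat['Time']).dt.floor(freq).to_numpy(),
            f'Time_{i}': cat['Time'].to_numpy(),
        })
        for i, cat in enumerate(catalogs, start=1)
    ]
    # Chain outer merges on the explicit 'key' column; sort=True orders rows by key
    combined = reduce(
        lambda left, right: pd.merge(left, right, on='key', how='outer', sort=True),
        keyed,
    )
    return combined.drop(columns='key')


Catalog_1 = pd.DataFrame({'Time': ['2022-05-01T08:16:55', '2022-05-01T09:54:01', '2022-05-01T10:25:49',
                                   '2022-05-01T12:01:55', '2022-05-01T18:17:23']})
Catalog_2 = pd.DataFrame({'Time': ['2022-05-01T08:16:58.444', '2022-05-01T10:25:46.939',
                                   '2022-05-01T20:37:17.491', '2022-05-01T23:34:22.539']})
Catalog_3 = pd.DataFrame({'Time': ['2022-05-01T10:25:48', '2022-05-01T23:34:20', '2022-05-02T07:21:51']})

print(merge_on_floor([Catalog_1, Catalog_2, Catalog_3]))            # minute bins
print(merge_on_floor([Catalog_1, Catalog_2, Catalog_3], freq='h'))  # hour bins
```

If one catalog has two events in the same bin, the merge pairs each of them with every matching row from the other catalogs, so duplicate keys are worth checking before merging.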
If you need hour-level matching only, floor to the hour instead (in all three keys), e.g.
pd.to_datetime(Catalog_1['Time']).dt.floor('h')
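If the 5-second tolerance itself should drive the matching, rather than a fixed minute or hour grid, one loop-free alternative (a different technique from the answer above) is to pool all times, sort them, and start a new event wherever the gap to the previous time exceeds the tolerance. The helper name `group_by_tolerance` and the `'5s'` default are assumptions for illustration. Sorting dominates the cost, so this should stay practical for multi-year catalogs.

```python
import pandas as pd


def group_by_tolerance(catalogs, tol='5s'):
    """Group origin times from several catalogs into events (hypothetical helper).

    All times are pooled and sorted; a new event starts wherever the gap to the
    previous time exceeds `tol`. Returns one row per event and one column
    'Time_<i>' per catalog, with NaT where a catalog has no matching time.
    """
    pooled = pd.concat(
        [pd.DataFrame({'Time': pd.to_datetime(cat['Time']), 'cat': f'Time_{i}'})
         for i, cat in enumerate(catalogs, start=1)],
        ignore_index=True,
    ).sort_values('Time', ignore_index=True)
    # Vectorized event numbering: True marks a gap larger than tol, cumsum counts them
    pooled['event'] = (pooled['Time'].diff() > pd.Timedelta(tol)).cumsum()
    # One column per catalog; if a catalog has several times in one event, keep the first
    return pooled.groupby(['event', 'cat'])['Time'].first().unstack('cat')


Catalog_1 = pd.DataFrame({'Time': ['2022-05-01T08:16:55', '2022-05-01T09:54:01', '2022-05-01T10:25:49',
                                   '2022-05-01T12:01:55', '2022-05-01T18:17:23']})
Catalog_2 = pd.DataFrame({'Time': ['2022-05-01T08:16:58.444', '2022-05-01T10:25:46.939',
                                   '2022-05-01T20:37:17.491', '2022-05-01T23:34:22.539']})
Catalog_3 = pd.DataFrame({'Time': ['2022-05-01T10:25:48', '2022-05-01T23:34:20', '2022-05-02T07:21:51']})

print(group_by_tolerance([Catalog_1, Catalog_2, Catalog_3]))
```

Because gaps are chained, a run of times each within 5 s of the next is treated as one event even if the whole run spans more than 5 s; whether that matters depends on how densely the catalogs are populated.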
Answered By - Abhishek