Issue
My objective is to convert a list of dates to the correct UTC format, i.e.: taking into consideration winter and summer time. Dates were scrapped from a chat.
I am Switzerland based which uses (CET) UTC+01:00 during the winter and (CEST) Central European Summer Time during the summer i.e. UTC+02:00.
Here are the dates:
import pandas as pd
df = pd.DataFrame()
df['dates_raw'] = [
'2022-01-20 01:12:15',
'2022-06-22 12:00:00',
'2022-10-29 05:57:02',
'2022-12-18 09:34:17',
'2023-01-12 06:36:10',
'2023-02-17 20:23:10',
'2023-04-12 02:02:24',
'2023-09-12 15:57:35',]
And here the desired result:
df['dates_converted'] = [
'2022-01-20 00:12:15',
'2022-06-22 10:00:00',
'2022-10-29 03:57:02',
'2022-12-18 08:34:17',
'2023-01-12 05:36:10',
'2023-02-17 19:23:10',
'2023-04-12 00:02:24',
'2023-09-12 13:57:35',]
And finally, here are the dates changes from summer to winter to summer... etc. etc.:
dates_changes = {
'st_2022' : '2022-03-27 02:00:00', # UTC + 2 (we gain one hour)
'wt_2022' : '2022-10-30 03:00:00', # UTC + 1 (we loose one hour)
'st_2023' : '2023-03-26 02:00:00', # UTC + 2 (we gain one hour)
'wt_2023' : '2023-10-29 03:00:00', # UTC + 1 (we loose one hour)
'st_2024' : '2024-03-31 02:00:00', # UTC + 2
'wt_2024' : '2024-10-27 03:00:00', # UTC + 1
}
As the dates changes look arbitrary, I don't know if there is any built-in function to make the conversion.
Many thanks in advance!
Solution
This can be done by first using pd.to_datetime
to convert the strings to naive datetimes, then tz_localize
to interpret those naive datetimes as aware datetimes in your desired timezone, and finally tz_convert
to convert to UTC.
df['dates_converted'] = pd.to_datetime(df['dates_raw']).dt.tz_localize('Europe/Zurich').dt.tz_convert('UTC')
print(df['dates_converted'])
result:
0 2022-01-20 00:12:15+00:00
1 2022-06-22 10:00:00+00:00
2 2022-10-29 03:57:02+00:00
3 2022-12-18 08:34:17+00:00
4 2023-01-12 05:36:10+00:00
5 2023-02-17 19:23:10+00:00
6 2023-04-12 00:02:24+00:00
7 2023-09-12 13:57:35+00:00
Name: dates_converted, dtype: datetime64[ns, UTC]
This code as it stands will give an exception with some ambiguous datetimes. For example, what is 2022-10-30 02:30:00
? Is it 00:30 UTC or 01:30 UTC? - you can't tell. If your data contains such times, you can handle it by passing the ambiguous
arg to tz_localize
: check the Pandas documentation for the options here and choose the one that will best deal with your data.
Answered By - slothrop
0 comments:
Post a Comment
Note: Only a member of this blog may post a comment.