Issue
I connected to a table on a db where there are two columns with dates.
I had no problem to parse the column with values formatted as this: 2017-11-03
But I don't find a way to parse the other column with dates formatted as this: 2017-10-03 05:06:52.840 +02:00
My attempts
If I parse a single value through the strptime method
dt.datetime.strptime("2017-12-14 22:16:24.037 +02:00", "%Y-%m-%d %H:%M:%S.%f %z")
I get the correct output
datetime.datetime(2017, 12, 14, 22, 16, 24, 37000, tzinfo=datetime.timezone(datetime.timedelta(seconds=7200)))
but if I try to use the same code format while parsing the table to the dataframe, the column dtype is an object:
Licenze_FromGY = pd.read_sql(query, cnxn, parse_dates={"EndDate":"%Y-%m-%d", "LastUpd":"%Y-%m-%d %H:%M:%S.%f %z"})
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1000 entries, 0 to 999
Data columns (total 10 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 Tenant 1000 non-null int64
1 IdService 1000 non-null object
2 Code 1000 non-null object
3 Aggregate 1000 non-null object
4 Bundle 991 non-null object
5 Status 1000 non-null object
6 Value 1000 non-null int64
7 EndDate 258 non-null datetime64[ns]
8 Trial 1000 non-null bool
9 LastUpd 1000 non-null object
I also tried to change the code format either in the read_sql method or in the pd.to_datetime() method, but then all the values become NaT:
Licenze_FromGY["LastUpd"] = pd.to_datetime(Licenze_FromGY["LastUpd"], format="%Y-%m-%d %H:%M:%S.%fZ", errors="coerce")
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1000 entries, 0 to 999
Data columns (total 10 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 Tenant 1000 non-null int64
1 IdService 1000 non-null object
2 Code 1000 non-null object
3 Aggregate 1000 non-null object
4 Bundle 991 non-null object
5 Status 1000 non-null object
6 Value 1000 non-null int64
7 EndDate 258 non-null datetime64[ns]
8 Trial 1000 non-null bool
9 LastUpd 0 non-null datetime64[ns]
dtypes: bool(1), datetime64[ns](2), int64(2), object(5)
memory usage: 71.4+ KB
None
Anyone can help?
Solution
pandas cannot handle mixed UTC offsets in one Series (column). Assuming you have data like this
import pandas as pd
df = pd.DataFrame({"datetime": ["2017-12-14 22:16:24.037 +02:00",
"2018-08-14 22:16:24.037 +03:00"]})
if you just parse to datetime,
df["datetime"] = pd.to_datetime(df["datetime"])
df["datetime"]
0 2017-12-14 22:16:24.037000+02:00
1 2018-08-14 22:16:24.037000+03:00
Name: datetime, dtype: object
you get dtype object. The elements of the series are of the Python datetime.datetime dtype. That limits the datetime functionality, compared to the pandas datetime dtype.
You can get that e.g. by parsing to UTC:
df["datetime"] = pd.to_datetime(df["datetime"], utc=True)
df["datetime"]
0 2017-12-14 20:16:24.037000+00:00
1 2018-08-14 19:16:24.037000+00:00
Name: datetime, dtype: datetime64[ns, UTC]
You might set an appropriate time zone to re-create the UTC offset:
df["datetime"] = pd.to_datetime(df["datetime"], utc=True).dt.tz_convert("Europe/Athens")
df["datetime"]
0 2017-12-14 22:16:24.037000+02:00
1 2018-08-14 22:16:24.037000+03:00
Name: datetime, dtype: datetime64[ns, Europe/Athens]
Answered By - FObersteiner
0 comments:
Post a Comment
Note: Only a member of this blog may post a comment.