Issue
I created a DatetimeIndex from a "date" column:
sales.index = pd.DatetimeIndex(sales["date"])
Now the index looks as follows:
DatetimeIndex(['2003-01-02', '2003-01-03', '2003-01-04', '2003-01-06',
'2003-01-07', '2003-01-08', '2003-01-09', '2003-01-10',
'2003-01-11', '2003-01-13',
...
'2016-07-22', '2016-07-23', '2016-07-24', '2016-07-25',
'2016-07-26', '2016-07-27', '2016-07-28', '2016-07-29',
'2016-07-30', '2016-07-31'],
dtype='datetime64[ns]', name='date', length=4393, freq=None)
As you see, the freq
attribute is None. I suspect that errors down the road are caused by the missing freq
. However, if I try to set the frequency explicitly:
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-148-30857144de81> in <module>()
1 #### DEBUG
----> 2 sales_train = disentangle(df_train)
3 sales_holdout = disentangle(df_holdout)
4 result = sarima_fit_predict(sales_train.loc[5002, 9990]["amount_sold"], sales_holdout.loc[5002, 9990]["amount_sold"])
<ipython-input-147-08b4c4ecdea3> in disentangle(df_train)
2 # transform sales table to disentangle sales time series
3 sales = df_train[["date", "store_id", "article_id", "amount_sold"]]
----> 4 sales.index = pd.DatetimeIndex(sales["date"], freq="d")
5 sales = sales.pivot_table(index=["store_id", "article_id", "date"])
6 return sales
/usr/local/lib/python3.6/site-packages/pandas/util/_decorators.py in wrapper(*args, **kwargs)
89 else:
90 kwargs[new_arg_name] = new_arg_value
---> 91 return func(*args, **kwargs)
92 return wrapper
93 return _deprecate_kwarg
/usr/local/lib/python3.6/site-packages/pandas/core/indexes/datetimes.py in __new__(cls, data, freq, start, end, periods, copy, name, tz, verify_integrity, normalize, closed, ambiguous, dtype, **kwargs)
399 'dates does not conform to passed '
400 'frequency {1}'
--> 401 .format(inferred, freq.freqstr))
402
403 if freq_infer:
ValueError: Inferred frequency None from passed dates does not conform to passed frequency D
So apparently a frequency has been inferred, but is stored neither in the freq
nor inferred_freq
attribute of the DatetimeIndex - both are None. Can someone clear up the confusion?
Solution
You have a couple options here:
pd.infer_freq
pd.tseries.frequencies.to_offset
I suspect that errors down the road are caused by the missing freq.
You are absolutely right. Here's what I use often:
def add_freq(idx, freq=None):
"""Add a frequency attribute to idx, through inference or directly.
Returns a copy. If `freq` is None, it is inferred.
"""
idx = idx.copy()
if freq is None:
if idx.freq is None:
freq = pd.infer_freq(idx)
else:
return idx
idx.freq = pd.tseries.frequencies.to_offset(freq)
if idx.freq is None:
raise AttributeError('no discernible frequency found to `idx`. Specify'
' a frequency string with `freq`.')
return idx
An example:
idx=pd.to_datetime(['2003-01-02', '2003-01-03', '2003-01-06']) # freq=None
print(add_freq(idx)) # inferred
DatetimeIndex(['2003-01-02', '2003-01-03', '2003-01-06'], dtype='datetime64[ns]', freq='B')
print(add_freq(idx, freq='D')) # explicit
DatetimeIndex(['2003-01-02', '2003-01-03', '2003-01-06'], dtype='datetime64[ns]', freq='D')
Using asfreq
will actually reindex (fill) missing dates, so be careful of that if that's not what you're looking for.
The primary function for changing frequencies is the
asfreq
function. For aDatetimeIndex
, this is basically just a thin, but convenient wrapper aroundreindex
which generates adate_range
and callsreindex
.
Answered By - Brad Solomon
0 comments:
Post a Comment
Note: Only a member of this blog may post a comment.