Issue
My data comes in as a string data type, and from that string I want to extract some date parts that are relevant at prediction time.
An example would be:
Date | City |
---|---|
2015-07-12 | Barcelona |
2015-07-13 | Brussels |
And I want it to be
Month | Day |
---|---|
7 | 12 |
7 | 13 |
etc.
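import pandas as pd
from sklearn.base import BaseEstimator, TransformerMixin

class DateTransformer(BaseEstimator, TransformerMixin):
    def fit(self, X, y=None):
        # Nothing to learn; the transformer is stateless
        return self

    def transform(self, X, y=None):
        # Work on a copy so the caller's DataFrame is not modified
        X = X.copy()
        X["date"] = pd.to_datetime(X["date"])
        X["year"] = X.date.dt.year
        X["month"] = X.date.dt.month
        X["day"] = X.date.dt.day
        X["dow"] = X.date.dt.dayofweek
        X["quarter"] = X.date.dt.quarter
        X = X.drop("date", axis=1)
        # Cast to str so the encoder treats every column as categorical
        X = X.astype(str)
        return X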
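It is used in the following pipeline:
from sklearn.linear_model import RidgeCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder

naiveBaseline2 = Pipeline([
    ('dates', DateTransformer()),
    ('onehot', OneHotEncoder()),
    ('regression', RidgeCV())
])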
When used like this the Pipeline actually works:
naiveBaseline2.fit(X_train,y_train)
naiveBaseline2.predict(X_test)
But ideally I would use the following function to benchmark the performance of multiple models:
from sklearn.model_selection import TimeSeriesSplit, cross_validate

def evaluate(model, X, y, cv):
    cv_results = cross_validate(
        model,
        X,
        y,
        cv=cv,
        scoring=["neg_mean_absolute_error", "neg_root_mean_squared_error"],
    )
    # Scorers return negated errors, so flip the sign back
    mae = -cv_results["test_neg_mean_absolute_error"]
    rmse = -cv_results["test_neg_root_mean_squared_error"]
    print(
        f"Mean Absolute Error: {mae.mean():.3f} +/- {mae.std():.3f}\n"
        f"Root Mean Squared Error: {rmse.mean():.3f} +/- {rmse.std():.3f}"
    )
If I run the pipeline through this evaluate function like so: evaluate(naiveBaseline2, X, y, TimeSeriesSplit())
I simply get NaNs for both metrics. I've been going over it for hours and can't work out what is going wrong. Would any of you know?
Solution
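Solved it. The issue was with the OneHotEncoder: with TimeSeriesSplit() and walk-forward cross-validation, the encoder doesn't get to see all date labels when it is fitted (e.g. 2020 is missing from the training data of the first split), so it threw an error on the unseen labels in the test fold, which cross_validate reports as NaN scores instead of raising. The OneHotEncoder needs the parameter handle_unknown="ignore".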
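For anyone who wants to reproduce this, here is a minimal sketch of the failure and the fix. The synthetic data (three years of daily rows, two cities, random targets) and the helper name build_pipeline are assumptions made up for illustration, not part of the original question. It relies on cross_validate's default error_score=np.nan, which turns a failed scoring step into NaN; passing error_score="raise" instead would surface the underlying exception directly.

```python
import numpy as np
import pandas as pd
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.linear_model import RidgeCV
from sklearn.model_selection import TimeSeriesSplit, cross_validate
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder


class DateTransformer(BaseEstimator, TransformerMixin):
    """Expand a string "date" column into categorical date parts."""

    def fit(self, X, y=None):
        return self

    def transform(self, X, y=None):
        dates = pd.to_datetime(X["date"])
        out = X.drop(columns="date").copy()
        out["year"] = dates.dt.year
        out["month"] = dates.dt.month
        out["day"] = dates.dt.day
        out["dow"] = dates.dt.dayofweek
        out["quarter"] = dates.dt.quarter
        return out.astype(str)


def build_pipeline(encoder):
    # Hypothetical helper: same pipeline as in the question, encoder swapped in
    return Pipeline([
        ("dates", DateTransformer()),
        ("onehot", encoder),
        ("regression", RidgeCV()),
    ])


# Synthetic data (assumption): daily rows for 2018-2020, sorted by time
rng = np.random.default_rng(0)
days = pd.date_range("2018-01-01", "2020-12-31", freq="D")
X = pd.DataFrame({
    "date": days.strftime("%Y-%m-%d"),
    "city": rng.choice(["Barcelona", "Brussels"], size=len(days)),
})
y = rng.normal(size=len(days))

cv = TimeSeriesSplit()
scoring = ["neg_mean_absolute_error", "neg_root_mean_squared_error"]

# Default encoder: early folds have not seen later years or months, so
# transforming the test fold fails and that fold's score becomes NaN
default_results = cross_validate(
    build_pipeline(OneHotEncoder()), X, y, cv=cv, scoring=scoring
)

# handle_unknown="ignore": unseen categories are encoded as all zeros
fixed_results = cross_validate(
    build_pipeline(OneHotEncoder(handle_unknown="ignore")),
    X, y, cv=cv, scoring=scoring,
)

print(-default_results["test_neg_mean_absolute_error"])
print(-fixed_results["test_neg_mean_absolute_error"])
```

Note that with handle_unknown="ignore" an unseen year simply contributes no signal to the prediction, so this removes the error but does not make the model extrapolate to new years.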
Answered By - Zestar75