Issue
I trained a set of LinearRegression models using the following GridSearchCV setup:
import pickle

from sklearn.compose import TransformedTargetRegressor
from sklearn.feature_selection import SelectKBest, f_regression, mutual_info_regression
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import PowerTransformer, StandardScaler

# house_df is the preprocessed housing DataFrame (loaded elsewhere)
MAX_COLUMNS = list(range(2, len(house_df.columns)))
X = house_df.drop(columns=['SalePrice'])
y = house_df.loc[:, 'SalePrice']
column_list = MAX_COLUMNS

# Box-cox transform the target
reg_strategy = TransformedTargetRegressor()
bcox_transformer = PowerTransformer(method='box-cox')

model_pipeline = Pipeline([("std_scaler", StandardScaler()),
                           ('feature_selector', SelectKBest()),
                           ('regress', reg_strategy)])

parameter_grid = [{'feature_selector__k': column_list,
                   'feature_selector__score_func': [f_regression, mutual_info_regression],
                   'regress__regressor': [LinearRegression()],
                   'regress__regressor__fit_intercept': [True],
                   'regress__transformer': [None, bcox_transformer]}]

score_types = {'MSE': 'neg_mean_squared_error', 'r2': 'r2'}

gs = GridSearchCV(estimator=model_pipeline, param_grid=parameter_grid,
                  scoring=score_types, refit='MSE', cv=5, n_jobs=5, verbose=1)
gs.fit(X, y)

# Persist the fitted grid search
PATH = './datasets/processed_data/'
gridsearch_result_filename = 'pfY_np10_nt2_rfS_ct0_8_st1_orY_ccY_LR1_GS.pkl'
full_path = PATH + gridsearch_result_filename
with open(full_path, 'wb') as file:
    pickle.dump(gs, file)
I then load the trained GridSearch and can make predictions using the best estimator as follows:
with open(MODEL_PATH, 'rb') as file:
    gs_results = pickle.load(file)
predictions = gs_results.predict(test_df)
The problem I am facing is that, since the Box-Cox transform was applied during the grid search, all of my predictions appear to be in the Box-Cox-transformed domain (huge values).
I need to use the PowerTransformer's inverse_transform() method on my predictions, but I am not sure how to access it.
I can get the entire pipeline for the best estimator like this:
gs_results.best_estimator_
I can then access the TransformedTargetRegressor inside the pipeline like this:
Taking a step further, we get all the way to the PowerTransformer inside the TransformedTargetRegressor like this:
After making it here, I foolishly thought I had made it where I needed to be, and simply needed to use the PowerTransformer's inverse_transform method to get predictions back in the original units. However, much to my disappointment, an error is thrown:
The error seems pretty clear, telling me I cannot use the inverse_transform method because the PowerTransformer has not been fit.
This is where I am stumped. It doesn't make sense to say the PowerTransformer has not been fit, when clearly it was fit during the GridSearch process.
This makes me think I am simply accessing the PowerTransformer incorrectly, which is my current question.
Based on the setup above, does anyone know the correct way to take the inverse transform of my predictions so they are in the original units rather than the Box-Cox-transformed units?
I have been banging my head against the wall for this and have searched all over for a similar question. Thank you so much in advance!
-Braden
Solution
Much like here, the attribute transformer is the unfitted initialization attribute; you need the fitted transformer_ attribute.
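A minimal sketch of the difference between the two attributes. It assumes synthetic data in place of house_df and a direct pipeline fit in place of the grid search; with a fitted GridSearchCV, the same access path would presumably start from gs_results.best_estimator_ instead of pipe. The step names mirror the question, everything else is illustrative.

```python
import numpy as np
from sklearn.compose import TransformedTargetRegressor
from sklearn.exceptions import NotFittedError
from sklearn.feature_selection import SelectKBest, f_regression
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import PowerTransformer, StandardScaler

# Synthetic, strictly positive target (Box-Cox requires y > 0);
# it stands in for SalePrice and is an assumption of this sketch.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = np.exp(1.0 + 0.5 * X[:, 0] + 0.1 * rng.normal(size=200))

pipe = Pipeline([
    ("std_scaler", StandardScaler()),
    ("feature_selector", SelectKBest(score_func=f_regression, k=3)),
    ("regress", TransformedTargetRegressor(
        regressor=LinearRegression(),
        transformer=PowerTransformer(method="box-cox"),
    )),
])
pipe.fit(X, y)

# Reach the TransformedTargetRegressor by its step name.
ttr = pipe.named_steps["regress"]

# `transformer` is the constructor argument; fit() clones it, so the
# original object stays unfitted and refuses to inverse-transform.
try:
    ttr.transformer.inverse_transform(np.ones((3, 1)))
    raised = False
except NotFittedError:
    raised = True

# `transformer_` (trailing underscore) is the fitted clone and carries
# the learned Box-Cox lambda.
fitted_lambda = ttr.transformer_.lambdas_
print(raised, fitted_lambda)
```

The trailing underscore follows the usual scikit-learn convention: attributes learned during fit() end in `_`, while constructor parameters do not.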
However, I'm not sure why predict doesn't already do what you want; the documentation for TransformedTargetRegressor.predict says:
Predict using the base regressor, applying inverse. The regressor is used to predict and the inverse_func or inverse_transform is applied before returning the prediction.
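One way to check this behaviour is to compare predict against a manual inverse transform of the inner regressor's raw output. This is a sketch on synthetic data (an assumption, not the question's dataset), using the fitted regressor_ and transformer_ attributes of TransformedTargetRegressor.

```python
import numpy as np
from sklearn.compose import TransformedTargetRegressor
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PowerTransformer

# Synthetic, strictly positive target so that Box-Cox is applicable.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = np.exp(1.0 + 0.5 * X[:, 0] + 0.1 * rng.normal(size=200))

ttr = TransformedTargetRegressor(
    regressor=LinearRegression(),
    transformer=PowerTransformer(method="box-cox"),
)
ttr.fit(X, y)

# predict() on the wrapper: expected to be in the original units of y.
auto = ttr.predict(X)

# Manual route: raw predictions in the transformed domain, then the
# fitted transformer's inverse_transform (which expects a 2D array).
raw = ttr.regressor_.predict(X)
manual = ttr.transformer_.inverse_transform(raw.reshape(-1, 1)).ravel()

same = np.allclose(auto, manual)
print(same)
```

If the two arrays agree, the wrapper's predictions are already back in the original units, and applying inverse_transform a second time would distort them.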
Answered By - Ben Reiniger