Saturday, January 29, 2022

[FIXED] How to use inverse_transform for a Scikit-Learn PowerTransformer() set as transformer param in TransformedTargetRegressor in a pipe in GridSearchCV

Tags: machine-learning, pandas, python, scikit-learn

Issue

I trained a set of LinearRegression models using the following GridSearchCV:

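import pickle

from sklearn.compose import TransformedTargetRegressor
from sklearn.feature_selection import SelectKBest, f_regression, mutual_info_regression
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import PowerTransformer, StandardScaler

# Candidate values of k for SelectKBest (house_df is the DataFrame holding the data)
MAX_COLUMNS = list(range(2, len(house_df.columns)))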

X = house_df.drop(columns=['SalePrice'])
y = house_df.loc[:, 'SalePrice']

column_list = MAX_COLUMNS

# Box-Cox transform the target
reg_strategy = TransformedTargetRegressor()
bcox_transformer = PowerTransformer(method='box-cox')


model_pipeline = Pipeline([("std_scaler", StandardScaler()),
                           ('feature_selector', SelectKBest()),
                           ('regress', reg_strategy)])


parameter_grid = [{'feature_selector__k' : column_list,
                   'feature_selector__score_func' : [f_regression, mutual_info_regression],
                   'regress__regressor' : [LinearRegression()],
                   'regress__regressor__fit_intercept' : [True],
                   'regress__transformer' : [None, bcox_transformer]}]


score_types = {'MSE' : 'neg_mean_squared_error', 'r2' : 'r2'}

gs = GridSearchCV(estimator=model_pipeline, param_grid=parameter_grid, scoring=score_types, refit='MSE', cv=5, n_jobs=5, verbose=1)

gs.fit(X, y)

PATH = './datasets/processed_data/'
gridsearch_result_filename = 'pfY_np10_nt2_rfS_ct0_8_st1_orY_ccY_LR1_GS.pkl'
full_path = PATH + gridsearch_result_filename
with open(full_path, 'wb') as file:
    pickle.dump(gs, file)

I then load the trained GridSearch and can make predictions using the best estimator as follows:

with open(MODEL_PATH, 'rb') as file:
    gs_results = pickle.load(file)


predictions = gs_results.predict(test_df)

The problem I am facing is that, since the Box-Cox transform was applied during the grid search, all of my predictions appear to be in the Box-Cox-transformed domain (huge values).

I need to use the PowerTransformer's inverse_transform() method on my predictions, but I am not sure how to access it.

I can get the entire pipeline for the best estimator like this:

gs_results.best_estimator_

I can then access the TransformedTargetRegressor inside the pipeline like this:

Taking a step further, we get all the way to the PowerTransformer inside the TransformedTargetRegressor like this:

After making it here, I foolishly thought I had reached the object I needed, and simply had to call the PowerTransformer's inverse_transform method to bring my predictions back to the original units. However, much to my disappointment, an error is thrown:

The error seems pretty clear, telling me I cannot use the inverse_transform method because the PowerTransformer has not been fit.

This is where I am stumped. It doesn't make sense to say the PowerTransformer has not been fit, when clearly it was fit during the GridSearch process.

This makes me think I am simply accessing the PowerTransformer incorrectly, which is my current question.

Based on the setup above, does anyone know the correct way to take the inverse transform of my predictions so they are in the original units rather than the units of the Box-Cox-transformed distribution?

I have been banging my head against the wall for this and have searched all over for a similar question. Thank you so much in advance!

-Braden

Solution

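The attribute transformer is the unfitted object passed at initialization; you need the fitted transformer_ attribute (with the trailing underscore). TransformedTargetRegressor fits a clone of transformer and stores that clone as transformer_, so the original object is never fitted.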
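The difference between the two attributes can be seen in a minimal sketch. It uses synthetic data and a shortened pipeline without GridSearchCV (both are assumptions made for illustration, not the asker's setup); only the step name "regress" is taken from the question. With a fitted grid search, the same object should be reachable as gs_results.best_estimator_.named_steps['regress'].transformer_.

```python
import numpy as np
from sklearn.compose import TransformedTargetRegressor
from sklearn.exceptions import NotFittedError
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import PowerTransformer, StandardScaler

# Hypothetical, strictly positive target (Box-Cox requires y > 0).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = np.exp(0.5 * X[:, 0] + 0.1 * rng.normal(size=200)) * 1000.0

pipe = Pipeline([
    ("std_scaler", StandardScaler()),
    ("regress", TransformedTargetRegressor(
        regressor=LinearRegression(),
        transformer=PowerTransformer(method="box-cox"),
    )),
])
pipe.fit(X, y)

ttr = pipe.named_steps["regress"]

# `transformer` is the constructor argument; it is cloned, never fitted itself.
try:
    ttr.transformer.inverse_transform(np.zeros((1, 1)))
    unfitted_raised = False
except NotFittedError:
    unfitted_raised = True

# `transformer_` is the fitted clone, so a round trip recovers the target.
fitted = ttr.transformer_
y_boxcox = fitted.transform(y.reshape(-1, 1))
y_back = fitted.inverse_transform(y_boxcox).ravel()
```

Note that PowerTransformer expects 2-D input, hence the reshape(-1, 1) before transform and the ravel() afterwards.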

However, I'm not sure why predict doesn't already do what you want; the documentation for TransformedTargetRegressor.predict says

Predict using the base regressor, applying inverse.

The regressor is used to predict and the inverse_func or inverse_transform is applied before returning the prediction.
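The documented behaviour can be checked with a small sketch on synthetic data (the data and variable names are assumptions for illustration). It compares predict on the TransformedTargetRegressor against manually applying the fitted transformer_'s inverse_transform to the inner regressor_'s output; if the two agree, the predictions are already in the original units and no extra inverse step is needed.

```python
import numpy as np
from sklearn.compose import TransformedTargetRegressor
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PowerTransformer

# Hypothetical, strictly positive target (Box-Cox requires y > 0).
rng = np.random.default_rng(1)
X = rng.normal(size=(300, 2))
y = np.exp(1.0 + 0.4 * X[:, 0] - 0.2 * X[:, 1] + 0.05 * rng.normal(size=300))

ttr = TransformedTargetRegressor(
    regressor=LinearRegression(),
    transformer=PowerTransformer(method="box-cox"),
).fit(X, y)

# predict() applies the inverse transform before returning.
pred = ttr.predict(X)

# Manual equivalent: the inner regressor predicts in the transformed space,
# then the fitted transformer maps the values back to the original units.
pred_transformed = ttr.regressor_.predict(X)
pred_manual = ttr.transformer_.inverse_transform(
    pred_transformed.reshape(-1, 1)
).ravel()

same_as_manual = np.allclose(pred, pred_manual)
```

Applying inverse_transform a second time to the output of predict would therefore distort the values rather than restore them.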

Answered By - Ben Reiniger

This answer was collected from Stack Overflow and tested by PythonFixing community admins; it is licensed under CC BY-SA 2.5, CC BY-SA 3.0 and CC BY-SA 4.0.
