Saturday, May 21, 2022

[FIXED] Sklearn inverse_transform return only one column when fit to many

Labels: python, scikit-learn

Issue

Is there a way to inverse_transform a single column with sklearn when the transformer was originally fit on the whole data set? Below is an example of what I am after.

import pandas as pd
import numpy as np
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler

# Setting up a dummy pipeline
pipes = []
pipes.append(('scaler', MinMaxScaler()))
transformation_pipeline = Pipeline(pipes)

# Random data.
df = pd.DataFrame(
    {'data1': [1, 2, 3, 1, 2, 3],
     'data2': [1, 1, 1, 2, 2, 2],
     'Y': [1, 4, 1, 2, 2, 2]
    }
)

# Fitting the transformation pipeline
test = transformation_pipeline.fit_transform(df)

# Pulling the scaler function from the pipeline.
scaler = transformation_pipeline.named_steps['scaler']

# This is what I thought may work.
predicted_transformed = scaler.inverse_transform(test['Y'])

# The desired output would look something like this:
# essentially ignoring that the scaler was fit on 3 variables and
# inverse-transforming only the last one, or any column I need.
predicted_transformed = [1, 4, 1, 2, 2, 2]

I need to fit the scaler on the whole dataset as part of a data prep process. Later, however, I load the scaler into another instance with joblib (sklearn.externals.joblib). In this new instance, the predicted values are the only data that exist, so I need to apply the inverse scaling for just the Y column to get back the original values.

I am aware that I could fit one transformer for the X variables and another for the Y variable. However, I would like to avoid this, since it would add to the complexity of moving the scalers around and maintaining both of them in future projects.
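The persistence workflow described above can be sketched roughly as follows. This is only an illustration under assumptions: it uses the standalone joblib package rather than sklearn.externals.joblib (which recent scikit-learn releases no longer ship), and the file name scaler.joblib and the temporary directory are hypothetical placeholders.

```python
import os
import tempfile

import joblib
import pandas as pd
from sklearn.preprocessing import MinMaxScaler

# Same dummy data as in the question.
df = pd.DataFrame(
    {'data1': [1, 2, 3, 1, 2, 3],
     'data2': [1, 1, 1, 2, 2, 2],
     'Y': [1, 4, 1, 2, 2, 2]}
)

# "Data prep" side: fit the scaler on all three columns.
scaler = MinMaxScaler().fit(df)

with tempfile.TemporaryDirectory() as tmp_dir:
    # Hypothetical file name, used only for this illustration.
    path = os.path.join(tmp_dir, 'scaler.joblib')
    joblib.dump(scaler, path)

    # "New instance" side: only the persisted scaler is available here.
    loaded_scaler = joblib.load(path)

# The reloaded scaler still expects as many columns as it was fit on,
# which is why passing a lone Y column to inverse_transform fails.
n_expected = loaded_scaler.n_features_in_
```

The reloaded object carries the same fitted parameters as the original, so any approach to inverting a single column has to work with a scaler that still expects three features.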

Solution

A bit late but I think this code does what you are looking for:

import numpy as np
import pandas as pd

# - scaler   = the fitted scaler object (it needs an inverse_transform method)
# - data     = the data to be inverse transformed as a Series, ndarray, ...
#              (a 1d object you can assign to a df column)
# - colName  = the name of the column to which the data belongs
# - colNames = all column names of the data on which scaler was fit
#              (necessary because scaler only accepts input with the same
#              number of columns as the data it was fit on)
def invTransform(scaler, data, colName, colNames):
    # Build a placeholder frame with the expected columns, filled with zeros.
    dummy = pd.DataFrame(np.zeros((len(data), len(colNames))), columns=colNames)
    dummy[colName] = data
    # Invert all columns, then keep only the one of interest.
    dummy = pd.DataFrame(scaler.inverse_transform(dummy), columns=colNames)
    return dummy[colName].values
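
As a quick check, here is how the function might be called on the data from the question. This is a sketch, not part of the original answer: the function is repeated so the snippet runs on its own, and the names scaled_y and restored_y are introduced only for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import MinMaxScaler


# Repeated from the answer so this snippet is self-contained.
def invTransform(scaler, data, colName, colNames):
    dummy = pd.DataFrame(np.zeros((len(data), len(colNames))), columns=colNames)
    dummy[colName] = data
    dummy = pd.DataFrame(scaler.inverse_transform(dummy), columns=colNames)
    return dummy[colName].values


# Same dummy data as in the question.
df = pd.DataFrame(
    {'data1': [1, 2, 3, 1, 2, 3],
     'data2': [1, 1, 1, 2, 2, 2],
     'Y': [1, 4, 1, 2, 2, 2]}
)

# Fit on all three columns, as in the question.
scaler = MinMaxScaler().fit(df)

# transform returns a NumPy array, so select the Y column by position.
scaled = scaler.transform(df)
scaled_y = scaled[:, df.columns.get_loc('Y')]

# Recover the original Y values from the scaled column alone.
restored_y = invTransform(scaler, scaled_y, 'Y', list(df.columns))
```

Up to floating-point rounding, restored_y should match the original Y column.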

Note that you need to provide enough information for the function to call the scaler's inverse_transform method behind the scenes: the fitted scaler, the data, the name of its column, and all column names the scaler was fit on.
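
For a MinMaxScaler specifically, a possible alternative (not part of the original answer) is to skip the placeholder frame and invert the column by hand. A fitted MinMaxScaler stores per-feature scale_ and min_ arrays and transforms with X * scale_ + min_, so the inverse for one column is (value - min_[i]) / scale_[i]. The helper name inverse_minmax_column below is hypothetical, and the sketch assumes you know the position of the column in the data the scaler was fit on.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import MinMaxScaler


def inverse_minmax_column(scaler, values, col_index):
    """Invert MinMaxScaler scaling for a single column (hypothetical helper).

    col_index is the position of the column in the data the scaler was fit on.
    """
    values = np.asarray(values, dtype=float)
    # Forward transform is X * scale_ + min_, so undo it in reverse order.
    return (values - scaler.min_[col_index]) / scaler.scale_[col_index]


# Same dummy data as in the question.
df = pd.DataFrame(
    {'data1': [1, 2, 3, 1, 2, 3],
     'data2': [1, 1, 1, 2, 2, 2],
     'Y': [1, 4, 1, 2, 2, 2]}
)

scaler = MinMaxScaler().fit(df)
y_index = list(df.columns).index('Y')

# Scaled Y column, standing in for the predicted values.
scaled_y = scaler.transform(df)[:, y_index]
restored = inverse_minmax_column(scaler, scaled_y, y_index)
```

This only applies to scalers whose inverse is a per-column affine map. The dummy-frame approach in the answer is more general, since it works with any transformer that exposes inverse_transform.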

Answered By - Willem

This answer was collected from Stack Overflow and tested by PythonFixing community admins. It is licensed under CC BY-SA 2.5, CC BY-SA 3.0 and CC BY-SA 4.0.
