Issue
Is there a way to inverse_transform a single column with sklearn when the transformer was originally fit on the whole data set? Below is an example of what I am after.
import pandas as pd
import numpy as np
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler
# Setting up a dummy pipeline
pipes = []
pipes.append(('scaler', MinMaxScaler()))
transformation_pipeline = Pipeline(pipes)
# Random data.
df = pd.DataFrame(
    {'data1': [1, 2, 3, 1, 2, 3],
     'data2': [1, 1, 1, 2, 2, 2],
     'Y': [1, 4, 1, 2, 2, 2]
    }
)
# Fitting the transformation pipeline
test = transformation_pipeline.fit_transform(df)
# Pulling the scaler function from the pipeline.
scaler = transformation_pipeline.named_steps['scaler']
# This is what I thought may work.
predicted_transformed = scaler.inverse_transform(test['Y'])
# The output would look something like this,
# essentially ignoring that the scaler was fit on 3 variables and
# inverting only the last one (or whichever one I need).
predicted_transformed = [1, 4, 1, 2, 2, 2]
I need to fit the scaler on the whole dataset as part of a data prep process, but I later import that scaler into another instance with joblib (sklearn.externals.joblib). In this new instance the predicted values are the only data that exist, so I need to apply the inverse scaling for just the Y column to get back the original values.
I am aware that I could fit one transformer for the X variables and a separate one for the Y variable. However, I would like to avoid this, because it would add to the complexity of moving the scalers around and maintaining both of them in future projects.
Solution
A bit late but I think this code does what you are looking for:
# - scaler   = the scaler object (it needs an inverse_transform method)
# - data     = the data to be inverse transformed as a Series, ndarray, ...
#              (a 1d object you can assign to a df column)
# - colName  = the name of the column to which the data belongs
# - colNames = all column names of the data on which scaler was fit
#              (necessary because scaler will only accept a df of the
#              same shape as the one it was fit on)
def invTransform(scaler, data, colName, colNames):
    # Build a placeholder frame with the same columns the scaler was fit on.
    dummy = pd.DataFrame(np.zeros((len(data), len(colNames))), columns=colNames)
    # Put the scaled values into the column of interest.
    dummy[colName] = data
    # Inverse transform everything, then keep only the column of interest.
    dummy = pd.DataFrame(scaler.inverse_transform(dummy), columns=colNames)
    return dummy[colName].values
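As a rough usage sketch, here is how the helper might be called on the dummy data from the question. The helper is repeated so the snippet runs on its own, and the scaler is fit directly rather than through the pipeline or a joblib file, which is an assumption made only to keep the example short.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import MinMaxScaler


def invTransform(scaler, data, colName, colNames):
    # Same helper as above, repeated so this snippet is self-contained.
    dummy = pd.DataFrame(np.zeros((len(data), len(colNames))), columns=colNames)
    dummy[colName] = data
    dummy = pd.DataFrame(scaler.inverse_transform(dummy), columns=colNames)
    return dummy[colName].values


# Dummy data from the question.
df = pd.DataFrame({
    'data1': [1, 2, 3, 1, 2, 3],
    'data2': [1, 1, 1, 2, 2, 2],
    'Y': [1, 4, 1, 2, 2, 2],
})

# Fit on all three columns, as in the data prep step.
scaler = MinMaxScaler().fit(df)
scaled = scaler.transform(df)  # plain ndarray, columns in df order

# Pretend only the scaled Y values are available later on.
y_scaled = scaled[:, df.columns.get_loc('Y')]

# Recover the original Y values using only the scaler and the column names.
y_restored = invTransform(scaler, y_scaled, 'Y', list(df.columns))
print(y_restored)
```

The zeros in the other columns are placeholders only; their inverse-transformed values are discarded.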
Note that you need to provide enough information (the full list of column names the scaler was fit on) for invTransform to call the scaler's inverse_transform method behind the scenes.
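If the scaler is known to be a MinMaxScaler, a possible shortcut is to skip the placeholder frame and use the fitted per-column attributes scale_ and min_ directly. This is only a sketch of an alternative, not part of the original answer: the function name inverse_minmax_column is made up for illustration, and it assumes the scaler was fit with the default clip=False.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import MinMaxScaler


def inverse_minmax_column(scaler, values, col_index):
    # Hypothetical helper: MinMaxScaler transforms each column as
    # X * scale_ + min_, so one column is inverted with
    # (X_scaled - min_) / scale_ using that column's fitted parameters.
    values = np.asarray(values, dtype=float)
    return (values - scaler.min_[col_index]) / scaler.scale_[col_index]


# Dummy data from the question.
df = pd.DataFrame({
    'data1': [1, 2, 3, 1, 2, 3],
    'data2': [1, 1, 1, 2, 2, 2],
    'Y': [1, 4, 1, 2, 2, 2],
})

scaler = MinMaxScaler().fit(df)

# Position of Y among the columns the scaler was fit on.
y_index = list(df.columns).index('Y')
y_scaled = scaler.transform(df)[:, y_index]

y_restored = inverse_minmax_column(scaler, y_scaled, y_index)
print(y_restored)
```

This version only needs the column position, but it is tied to MinMaxScaler; other scalers store different attributes, so the generic invTransform approach is the safer default.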
Answered By - Willem