Issue
I have the following DataFrame:
   A   B    c
0  1   1   21
1  2  12  122
2  3   3   23
3  4  14  124
4  5   5   25
My "train set" has been transformed using StandardScaler. The transformed features have the following values:
array([[-1.41421356, -1.17669681, -0.85696442],
       [-0.70710678,  0.98058068,  1.20383097],
       [ 0.        , -0.78446454, -0.81615659],
       [ 0.70710678,  1.37281295,  1.24463879],
       [ 1.41421356, -0.39223227, -0.77534876]])
The scaler is saved as a PKL file and used at inference time to transform features. However, on some occasions I only want certain features, for example only the A column. Because the scaler was fitted on three features, it cannot transform A on its own, and the following error is raised:
ValueError: X has 1 features, but StandardScaler is expecting 3 features as input.
As a result, I'm transforming the data and selecting the feature afterward, as follows:
from sklearn.preprocessing import StandardScaler
import pandas as pd
df = pd.DataFrame({'A':[1,2,3,4,5],'B':[1,12,3,14,5],'c':[21,122,23,124,25]})
scaler = StandardScaler()
scaler.fit_transform(df.values)
# The next line would fail with "ValueError: X has 1 features, but
# StandardScaler is expecting 3 features as input."
# scaler.transform(df[['A']].values)
scaler.transform(df.values).T[0]
Is there a more elegant way to do so?
Update
In some cases I don't even have the entire DF but just the raw column, so I can't use the scaler.
Solution
Looking at the scaler API and its source code, there seems to be no way to apply a fitted StandardScaler to a subset of columns. You could write your own subclass that takes an optional column mask at transform time and scales only those columns. For instance:
from sklearn.preprocessing import StandardScaler

class PartialStandardScaler(StandardScaler):
    def transform(self, X, column_mask=None):
        if column_mask is None:
            return super().transform(X)
        # Scale only the selected columns with their fitted statistics
        # (assumes the defaults with_mean=True and with_std=True).
        return (X[:, column_mask] - self.mean_[column_mask]) / self.scale_[column_mask]
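In your case, with scaler created as PartialStandardScaler() instead of StandardScaler() and fitted on df.values, you could call:
scaler.transform(df.values, column_mask=[True, False, False])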
Since the mask is only used for NumPy indexing, it can also be passed as a list of column indices, for example column_mask=[0].
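For the case in the update, where only the raw column is available and not the full DataFrame, the same idea can be applied directly to the fitted statistics. Below is a minimal sketch, not part of sklearn: scale_columns is a hypothetical helper name, and it assumes the scaler was fitted with the default with_mean=True and with_std=True, so that mean_ and scale_ are both populated.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler


def scale_columns(scaler, X_subset, columns):
    """Scale data that contains only the selected columns.

    Hypothetical helper: `columns` is a boolean mask or a list of integer
    positions referring to the columns the scaler was fitted on.
    Assumes the scaler was fitted with with_mean=True and with_std=True.
    """
    X_subset = np.asarray(X_subset, dtype=float)
    if X_subset.ndim == 1:
        # Treat a raw 1-D column as a single-feature 2-D array.
        X_subset = X_subset.reshape(-1, 1)
    columns = np.asarray(columns)
    return (X_subset - scaler.mean_[columns]) / scaler.scale_[columns]


df = pd.DataFrame({'A': [1, 2, 3, 4, 5],
                   'B': [1, 12, 3, 14, 5],
                   'c': [21, 122, 23, 124, 25]})
scaler = StandardScaler().fit(df.values)

# Reference result: scale everything, then select columns afterwards.
full = scaler.transform(df.values)

# Raw column A only, selected by index.
only_a = scale_columns(scaler, df['A'].values, [0])

# Columns A and c only, selected by boolean mask.
a_and_c = scale_columns(scaler, df[['A', 'c']].values, [True, False, True])
```

The helper only reads mean_ and scale_ from the fitted scaler, so it should also work with a scaler loaded from the PKL file, provided the column positions match those used at fit time.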
Answered By - Learning is a mess