Issue
From what I understand, StandardScaler().fit_transform(X, y) does not change the target (y). Meanwhile, for some algorithms (such as weight-based or distance-based ones) we also need to scale the target. My question is: do we have to implement two StandardScalers, one for the features and another for the target? I imagine we could also apply the scaler before splitting the dataset into X and y, but then I wonder how we would use it in deployment, since we wouldn't have y.
from sklearn.impute import SimpleImputer
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# --- creating pipelines
transformer_x = make_pipeline(
    SimpleImputer(strategy='constant'),
    StandardScaler())
transformer_y = make_pipeline(
    SimpleImputer(strategy='constant'),
    StandardScaler())

# --- development
# (y_train must be 2-D for the transformers, e.g. y_train.reshape(-1, 1))
model.fit(transformer_x.fit_transform(X_train),
          transformer_y.fit_transform(y_train))

# --- sometime later, in deployment
# note: predictions come back in scaled-y units, so they would need
# transformer_y.inverse_transform() to get back to the original scale
saved_model.predict(transformer_x.transform(new_data))
Also as a side question, is there any condition where we might not need to do standardisation for weight/distance-based algorithms?
Thanks!
Solution
- Do we have to implement two StandardScalers, one for the features and another for the target feature?
In general it's not necessary to scale the target. The main case where it can be beneficial is with some neural networks, where a target on a very different scale can make training less stable. Check this link for further information.
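If you do decide to scale the target, you don't need to juggle two scalers by hand: scikit-learn's TransformedTargetRegressor wraps a regressor and a target transformer together. Below is a minimal sketch of that approach; Ridge is just a stand-in regressor and the toy data is made up for illustration, while the feature pipeline mirrors the one in the question:

import numpy as np
from sklearn.compose import TransformedTargetRegressor
from sklearn.impute import SimpleImputer
from sklearn.linear_model import Ridge
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# toy data standing in for X_train / y_train
rng = np.random.default_rng(42)
X_train = rng.normal(size=(100, 3))
y_train = X_train @ [1.0, 2.0, 3.0] + rng.normal(size=100)

# features: impute + scale + regress, as in the question's pipeline
inner = make_pipeline(
    SimpleImputer(strategy='constant'),
    StandardScaler(),
    Ridge())

# TransformedTargetRegressor fits its transformer on y, trains the
# regressor on the scaled target, and inverse-transforms the output
# of predict() -- so deployment needs no access to y at all
model = TransformedTargetRegressor(regressor=inner,
                                   transformer=StandardScaler())
model.fit(X_train, y_train)
print(model.predict(X_train[:3]))  # predictions in the original y units

This also answers the deployment concern: at predict time only the features are transformed, and the target scaling is inverted automatically.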
- Also as a side question, is there any condition where we might not need to do standardization for weight/distance-based algorithms?
Standardization is generally beneficial for the training phase. The main reason to avoid it is if you need to interpret the model's parameters (for example, regression coefficients) in the original units of the features. Again, here I provided an interesting link.
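A small illustration of the interpretability point, as a sketch with made-up data (the coefficients and feature scales are chosen only for demonstration):

import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2)) * [1.0, 100.0]   # features on very different scales
y = 3.0 * X[:, 0] + 0.05 * X[:, 1] + rng.normal(size=200)

# raw features: coefficients keep the original units
# ("y rises by ~3 per unit of feature 0, ~0.05 per unit of feature 1")
raw = LinearRegression().fit(X, y)
print(raw.coef_)      # approx [3.0, 0.05]

# standardized features: coefficients are in standard-deviation units,
# comparable to each other but no longer in the original units
scaled = LinearRegression().fit(StandardScaler().fit_transform(X), y)
print(scaled.coef_)   # approx [3.0, 5.0]

Note that ordinary least squares itself is scale-equivariant, so skipping standardization here costs nothing in fit quality; for genuinely distance-based methods such as k-nearest neighbours, skipping it would change the results.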
Answered By - Alex Serra Marrugat