Issue
Is there something in R similar to scikit-learn's StandardScaler that can be fitted to the training data (so that each feature has mean 0 and standard deviation 1) and then reused to transform the test data? As far as I can tell, scale does not offer a way to transform test data based on the mean and standard deviation of the training data.
Snippet for Python:
from sklearn.preprocessing import StandardScaler
scaler = StandardScaler()
scaler.fit(X_train)
X_train_scaled = scaler.transform(X_train)
X_test_scaled = scaler.transform(X_test)
Since I'm pretty sure this is the right way to do it (it avoids leaking information from the test set into the training set), I guess there is a simple solution that I'm just unable to find.
Solution
I believe that the scale function in R does what you are looking for. For your example, that would just be
X_train_scaled = scale(X_train)
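scale stores the column means and standard deviations it used as the attributes "scaled:center" and "scaled:scale" on its result. You can read them back from the scaled X_train with attr and pass them to scale when transforming the test set: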
X_test_scaled = scale(X_test, center=attr(X_train_scaled, "scaled:center"),
scale=attr(X_train_scaled, "scaled:scale"))
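This gives the test set the same treatment as in the example you posted: it is centered and scaled with the training-set statistics rather than its own. Note that the numbers will not match StandardScaler exactly, because scale divides by the sample standard deviation (n - 1 denominator) while StandardScaler uses the population standard deviation (n denominator). The difference shrinks as the training set grows.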
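To make the comparison concrete, here is a minimal sketch in plain Python (standard library only) of what the two approaches compute. The helper names column_stats and apply_scaling and the toy data are made up for illustration; ddof=1 is meant to mimic R's scale and ddof=0 to mimic StandardScaler, assuming the default settings of both.

```python
import math

def column_stats(rows, ddof):
    """Per-column mean and standard deviation of a list of rows.

    ddof=1 gives the sample sd (what R's sd() and scale() use),
    ddof=0 gives the population sd (what StandardScaler uses).
    """
    n = len(rows)
    cols = list(zip(*rows))
    means = [sum(c) / n for c in cols]
    sds = [
        math.sqrt(sum((x - m) ** 2 for x in c) / (n - ddof))
        for c, m in zip(cols, means)
    ]
    return means, sds

def apply_scaling(rows, means, sds):
    """Center and scale rows with statistics computed elsewhere."""
    return [[(x - m) / s for x, m, s in zip(row, means, sds)] for row in rows]

# Toy data, invented for this example.
X_train = [[1.0, 10.0], [2.0, 20.0], [3.0, 30.0], [4.0, 40.0]]
X_test = [[2.5, 25.0], [5.0, 50.0]]

# "Fit" on the training data only, once per convention.
r_means, r_sds = column_stats(X_train, ddof=1)    # like R's scale
sk_means, sk_sds = column_stats(X_train, ddof=0)  # like StandardScaler

# Transform the test data with the training statistics.
X_test_r = apply_scaling(X_test, r_means, r_sds)
X_test_sk = apply_scaling(X_test, sk_means, sk_sds)

# The two results differ by a constant factor of sqrt(n / (n - 1)).
n = len(X_train)
ratio = X_test_sk[1][0] / X_test_r[1][0]
print(ratio, math.sqrt(n / (n - 1)))
```

If identical numbers are needed on the R side, one option is to multiply the "scaled:scale" values by sqrt((n - 1) / n) before passing them to scale for both the training and the test set.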
Answered By - sacuL