Issue
Assume that I have a training dataset. I split it into train / test sets. For training, I use a StandardScaler to fit_transform the train data and transform the test data. Then I train a model and save it.
train.py:
import joblib
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

data = pd.read_csv("train.csv")
X = data[["X"]]  # double brackets keep X 2-D, as StandardScaler expects
y = data["y"]
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3)
scale = StandardScaler()
X_train_s = scale.fit_transform(X_train)
X_test_s = scale.transform(X_test)
model = LinearRegression()  # the question does not define model; any estimator works here
model.fit(X_train_s, y_train)
y_pred = model.predict(X_test_s)
# save model
filename = "model.joblib"
joblib.dump(model, filename)
Now I load the model in another script, where I have another dataset that is used only for prediction. The question is how to scale the prediction dataset when I don't have the train dataset. Is it correct to fit_transform on the prediction dataset, as below?
prediction.py:
import joblib
import pandas as pd
from sklearn.preprocessing import StandardScaler

data = pd.read_csv("prediction.csv")
X = data[["X"]]
scale = StandardScaler()
X_predict_s = scale.fit_transform(X)  # fitted on the prediction data only
filename = "model.joblib"
loaded_model = joblib.load(filename)
y_pred = loaded_model.predict(X_predict_s)
Or do I have to load the train data into prediction.py and use it to fit_transform the scaler?
Solution
I like using pickle, but the same logic applies to joblib.
In essence, you have to dump your scaler and load it in the new script, just like you did with model and loaded_model. The fitted scaler stores the mean and standard deviation learned from the train data, so saving it is what lets you scale new data the same way later.
In the script where you trained the model:
from pickle import dump
# save model
dump(model, open('model.pkl', 'wb'))
# save scaler
dump(scale, open('scale.pkl', 'wb'))
In the script where you load the model:
from pickle import load
# load model
loaded_model = load(open('model.pkl', 'rb'))
# load scaler
loaded_scale = load(open('scale.pkl', 'rb'))
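Since train.py already uses joblib, the same round trip works there too; a minimal sketch of that variant (the file names are just illustrative):

import joblib
# save model and scaler in the training script
joblib.dump(model, 'model.joblib')
joblib.dump(scale, 'scale.joblib')
# load them back in the prediction script
loaded_model = joblib.load('model.joblib')
loaded_scale = joblib.load('scale.joblib')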
Now you have to transform your data using loaded_scale and predict on the scaled data using loaded_model.
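Putting it all together, a minimal sketch of prediction.py (assuming the pickle files from above and the column names from the question):

from pickle import load
import pandas as pd

data = pd.read_csv("prediction.csv")
X = data[["X"]]
# load the model and the scaler that was fitted on the train data
loaded_model = load(open('model.pkl', 'rb'))
loaded_scale = load(open('scale.pkl', 'rb'))
# transform with the already-fitted scaler; do not call fit_transform here
X_predict_s = loaded_scale.transform(X)
y_pred = loaded_model.predict(X_predict_s)

The key point is that transform reuses the mean and standard deviation learned during training, so the prediction data is scaled exactly the way the data the model was fitted on was scaled.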
Answered By - Arturo Sbr