Issue
Let's suppose I have a train.py file containing the logic for training a model and then saving its parameters into a directory called weights/:
x_train, x_test, y_train, y_test = train_test_split(x, y)
model = compile()
model.fit(x_train, y_train)
model.save_weights("weights/")
Another file, evaluate.py, contains the logic for evaluating the performance of the model, whose parameters are loaded from the weights/ directory:
x_train, x_test, y_train, y_test = train_test_split(x, y)
model = compile()
model.load_weights("weights/")
model.evaluate(x_test, y_test)
My question is: in evaluate.py, is the statement x_train, x_test, y_train, y_test = train_test_split(x, y) correct, or am I supposed to load the same test set that was split off in train.py? In that case, train.py would be:
x_train, x_test, y_train, y_test = train_test_split(x, y)
np.save("x_test", x_test)
np.save("y_test", y_test)
model = compile()
model.fit(x_train, y_train)
model.save_weights("weights/")
while the evaluate.py file would be:
# np.save appends ".npy" to the file name, so load with the extension
x_test = np.load("x_test.npy")
y_test = np.load("y_test.npy")
model = compile()
model.load_weights("weights/")
model.evaluate(x_test, y_test)
Solution
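The model has to be evaluated on data it never saw during training: it learns its weights on the train set, and you check its metrics on the test set. By default train_test_split shuffles the data randomly, so calling it again in evaluate.py without a fixed seed produces a different split, and some training samples end up in the new test set. You therefore should not re-split the data blindly in evaluate.py. Either load the test set saved by train.py, as in your second version, or specify the same random_state in both files so that the split is reproducible (this also requires x and y to be loaded identically, in the same order, in both scripts):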
train_test_split(X, y, random_state=42)
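To make both options concrete, here is a minimal sketch, not a definitive implementation. The toy arrays x and y are hypothetical stand-ins for the real data, and the temporary directory is only there so that the example cleans up after itself. It checks that two calls with the same random_state return the same test set, and that a test set written with np.save can be read back with np.load.

```python
import os
import tempfile

import numpy as np
from sklearn.model_selection import train_test_split

# Toy data standing in for the real x and y (hypothetical).
x = np.arange(40).reshape(20, 2)
y = np.arange(20)

# Option 1: same data + same random_state -> same split, so train.py and
# evaluate.py can each call train_test_split independently.
_, x_test_a, _, y_test_a = train_test_split(x, y, random_state=42)
_, x_test_b, _, y_test_b = train_test_split(x, y, random_state=42)
same_split = (
    np.array_equal(x_test_a, x_test_b) and np.array_equal(y_test_a, y_test_b)
)

# Option 2: persist the test set in train.py and reload it in evaluate.py.
# np.save appends ".npy" when the file name has no extension.
with tempfile.TemporaryDirectory() as tmp:
    np.save(os.path.join(tmp, "x_test"), x_test_a)
    x_test_loaded = np.load(os.path.join(tmp, "x_test.npy"))
round_trip_ok = np.array_equal(x_test_a, x_test_loaded)

print(same_split, round_trip_ok)
```

Saving the test set is the more robust choice if the raw data may change or be reordered between the two runs, since a fixed random_state only reproduces the split when the input is identical.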
Answered By - Anna Andreeva Rogotulka