Issue
I have a Python script that generates predictions using an sklearn Random Forest with a fixed random_state = 0. It always produces the same results on one computer (system), but when I switch to another computer the results are different.
Is there a way to make it deterministic across different systems, so that a second machine gives results identical to the first?
The script is long and complicated, so I won't share the code, but I think the problem is in the Random Forest random_state, because when I tried KNN instead of RF the results were identical.
Solution
sklearn.neighbors.KNeighborsClassifier uses all observations from your training data, while sklearn.ensemble.RandomForestClassifier, as the name suggests, uses the data randomly: each tree is fit on a bootstrap sample of the rows and considers a random subset of the features at each split. So you can expect Random Forest results to vary from run to run unless the random state is fixed. Making it reproducible across different systems is trickier, but you can try the following approach (though I have not tested it yet).
1). Fit a Random Forest model on your data with some random_state, let's say random_state = 0.
2). Import pickle and choose a file name, rf.pkl, for the pickle file; it will be saved in your current working directory.
3). Dump the fitted Random Forest model object into the pickle file:
import pickle

pkl = 'rf.pkl'  # file name of the pickle file in the working directory
with open(pkl, 'wb') as file:
    pickle.dump(rf, file)  # rf is the fitted model from step 1
4). Share the pickle file with the other user/system.
5). Store the pickle file at some location on that system and set that location as the working directory.
6). Open Python on that system and run your Python code to read the data.
7). Instead of creating a new model, load the pickled model using the following lines of code:
import pickle
pkl = 'rf.pkl'  # same file name as on the first system
with open(pkl, 'rb') as file:
    pkl_model = pickle.load(file)
8). Test whether your pickled model works and produces the same results as it did on the first system.
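The steps above can be sketched end to end on a single machine as a quick sanity check. This is only an illustration, not the original script: the synthetic dataset from make_classification, the n_estimators value, and the temporary directory used for rf.pkl are all assumptions made so the example runs on its own.

```python
import os
import pickle
import tempfile

import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Assumption: synthetic data stands in for the real training data.
X, y = make_classification(n_samples=200, n_features=8, random_state=0)

# Step 1: fit the model with a fixed random_state.
rf = RandomForestClassifier(n_estimators=20, random_state=0)
rf.fit(X, y)
original_predictions = rf.predict(X)

# Steps 2-3: dump the fitted model to rf.pkl.
# Assumption: a temporary directory is used here instead of the working directory.
pkl = os.path.join(tempfile.mkdtemp(), "rf.pkl")
with open(pkl, "wb") as file:
    pickle.dump(rf, file)

# Step 7: load the pickled model instead of fitting a new one.
with open(pkl, "rb") as file:
    pkl_model = pickle.load(file)

# Step 8: check that the reloaded model reproduces the original predictions.
restored_predictions = pkl_model.predict(X)
same = np.array_equal(original_predictions, restored_predictions)
print("Predictions identical after reload:", same)
```

On a second machine only the loading and comparison part would run, with rf.pkl copied over. Note that the scikit-learn documentation advises loading a pickled model with the same scikit-learn version that created it, so it is probably worth matching library versions on both systems before comparing results.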
I haven't tested this approach, but I think it is worth a try. Let me know if it works. Cheers!!
Answered By - ManojK