Issue
I've created a model is trained on the titanic dataset, and I want to see an accuracy percentage for my model. I've done this before, but sadly, I do not remember. I looked at the internet, and I couldn't find anything. Either I just entered the wrong words, or there isn't anything their.
# the tts function is `train_test_split` from `sklearn.model_selection`
train_X, val_X, train_y, val_y = tts(X, y, random_state = 0) # y is the state of survival
forest_model = RandomForestRegressor(random_state=0)
forest_model.fit(train_X, train_y)
val_predictions = forest_model.predict(val_X)
How can I calculate the accuracy?
Solution
I wonder why are you using RandomForestRegressor
, as titanic dataset can be formulated as a binary-classification problem. Assuming it is a mistake, to measure accuracy you can of a RandomForestClassifier
, you can do:
>>> from sklearn.metrics import accuracy_score
>>> accuracy_score(val_y, val_predictions)
However, it is often better to use K-fold cross-validation
, which gives you more reliable accuracy:
>>> from sklearn.model_selection import cross_val_score
>>> cross_val_score(forest_model, X, y, cv=10, scoring='accuracy') #10-fold cross validation, notice that I am giving X,y not X_train, y_train
K-fold cross-validation
gives you 10 accuracy values of accuracy, as it divides the data into 10 folds (i.e. parts). Then, you can get a mean
and standard deviation
the accuracy values as follows:
>>> import numpy as np
>>> accuracy = np.array(cross_val_score(forest_model, X, y, cv=10, scoring='accuracy'))
>>> accuracy.mean() #mean of accuracies
0.81
>>> accuracy.std() #standard deviation of accuracies
0.04
You can also use other scoring metric such as F1-score
, Precision
,
Recall
,cohen_kappa_score
etc.
Answered By - Grayrigel
0 comments:
Post a Comment
Note: Only a member of this blog may post a comment.