Issue
I got over 90% accuracy with the Random Forest classifier, but I worry that the other algorithms give much lower results (a table with the results was included). But this is not the main concern. The problem is that when I used user inputs, the predictions were 100 percent wrong. The columns of the user input are in the same order as the columns of the training data set.
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score

model = RandomForestClassifier()
model.fit(X_train, y_train)
prediction = model.predict(X_test)
acc = accuracy_score(y_test, prediction)  # output: 0.91

# predictions for the user-provided compounds
X_test_user = df_user_compounds_1.to_numpy()
user_input_predictions_1 = model.predict(X_test_user)
user_input_predictions_1  # output: array([0, 0, 0, 0, 0], dtype=int64), but it should be: array([1, 1, 1, 1, 1], dtype=int64)
Does anyone have any idea why this is happening?
The dataset is preprocessed: no missing values, no duplicates, no negative values; it was balanced with RandomOverSampler and scaled with MinMaxScaler, and it contains 11 features and about 7K rows.
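For context, a minimal sketch of preprocessing along those lines might look as follows; the file name, target column name, exact order of the steps, and split parameters are all assumptions, since the original post does not show them.

import pandas as pd
from imblearn.over_sampling import RandomOverSampler
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler

df = pd.read_csv("compounds.csv")               # hypothetical file: 11 features plus a target column
X, y = df.drop(columns="target"), df["target"]  # "target" is an assumed column name

X_scaled = MinMaxScaler().fit_transform(X)      # scale all features to [0, 1]
X_balanced, y_balanced = RandomOverSampler(random_state=42).fit_resample(X_scaled, y)  # balance the classes

X_train, X_test, y_train, y_test = train_test_split(
    X_balanced, y_balanced, test_size=0.2, random_state=42)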
...........
Thank you so much @ElvinJafarov. These are parts of df_user_compounds_1 and X_test after following your suggestion.
Since I had already used MinMaxScaler(), I had to add two more rows to df_user_compounds_1, containing the corresponding min and max values, to simulate scaling identical to the original. I found the min and max values through df.describe(include="all"), concatenated these two rows to the user-input data frame, and scaled it.
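A rough sketch of that workaround, reusing the names from the post; feature_columns is a placeholder for the 11 feature names, and the exact concatenation details are assumptions.

import pandas as pd
from sklearn.preprocessing import MinMaxScaler

stats = df.describe(include="all")
min_row = stats.loc[["min"], feature_columns]   # training minima as a 1-row frame
max_row = stats.loc[["max"], feature_columns]   # training maxima as a 1-row frame

# append the min/max rows so a fresh MinMaxScaler reproduces the original [0, 1] range
padded = pd.concat([df_user_compounds_1, min_row, max_row], ignore_index=True)
padded_scaled = MinMaxScaler().fit_transform(padded)

X_test_user = padded_scaled[:-2]                # drop the two helper rows again
user_input_predictions_1 = model.predict(X_test_user)

Keeping the scaler that was fitted on the training data and calling scaler.transform(df_user_compounds_1) would usually be the cleaner way to apply the identical scaling.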
I am happy with the result: the first 5 predictions should all be 1, and 4 out of 5 now are.
Solution
First of all, it is okay that different algorithms give different accuracy rates.
Secondly, in your case, there might be several reasons:
- You have scaled your training data but not df_user_compounds_1 (see the sketch after this list).
- Your model might be overfitted.
- The dataset was preprocessed differently from df_user_compounds_1.
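A minimal sketch of the first point: fit the scaler on the training data only and reuse that same fitted scaler for the user input, so both end up on the same [0, 1] scale. It assumes X_train and X_test are the unscaled splits and that the model is refitted on the scaled training data.

from sklearn.preprocessing import MinMaxScaler

scaler = MinMaxScaler()
X_train_scaled = scaler.fit_transform(X_train)         # learn min/max from the training data only
X_test_scaled = scaler.transform(X_test)               # reuse the fitted scaler, do not refit
X_user_scaled = scaler.transform(df_user_compounds_1)  # user input on the same scale

model.fit(X_train_scaled, y_train)
user_input_predictions_1 = model.predict(X_user_scaled)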
Thirdly, this is not how you should approach choosing a model. You have to try K-fold cross-validation and hyperparameter tuning, for example as sketched below.
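A minimal sketch of both, using scikit-learn; the parameter grid values are only illustrative.

from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, cross_val_score

# 5-fold cross-validation gives a more reliable accuracy estimate than a single split
scores = cross_val_score(RandomForestClassifier(), X_train, y_train, cv=5)
print(scores.mean(), scores.std())

# small grid search for hyperparameter tuning
param_grid = {"n_estimators": [100, 300], "max_depth": [None, 10, 20]}
search = GridSearchCV(RandomForestClassifier(), param_grid, cv=5, scoring="accuracy")
search.fit(X_train, y_train)
print(search.best_params_, search.best_score_)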
Answered By - Elvin Jafarov