Issue
When training different machine learning algorithms on the same dataset, I get identical values for metrics such as accuracy and F-score.
The dataset I trained the algorithms on comes from Kaggle: source.
Here are code snippets from the Jupyter notebook, along with their output:
Imported libraries
IN:
import pandas as pd
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.model_selection import train_test_split
from nltk.corpus import stopwords
from sklearn.metrics import accuracy_score, f1_score
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier
from sklearn.svm import SVC
from sklearn.naive_bayes import GaussianNB
from sklearn.preprocessing import LabelEncoder
from sklearn.metrics import classification_report
import joblib
import tensorflow as tf
import numpy as np
from tensorflow.keras import models, layers
import warnings
warnings.filterwarnings('ignore')
Loading dataset
IN:
df = pd.read_csv("payload_mini.csv",encoding='utf-16')
df.head(10)
Load, process and split the data for further training of the classification model
IN:
df = pd.read_csv("payload_mini.csv",encoding='utf-16')
df = df[(df['attack_type'] == 'sqli') | (df['attack_type'] == 'norm')]
X = df['payload']
y = df['label']
vectorizer = CountVectorizer(min_df = 2, max_df = 0.8, stop_words = stopwords.words('english'))
X = vectorizer.fit_transform(X.values.astype('U')).toarray()
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.2)
print(X_train.shape)
print(y_train.shape)
print(X_test.shape)
print(y_test.shape)
OUT:
(8040, 1585)
(8040,)
(2011, 1585)
(2011,)
Naive Bayes Classifier
IN:
nb_clf = GaussianNB()
nb_clf.fit(X_train, y_train)
y_pred = nb_clf.predict(X_test)
print(f"Accuracy of Naive Bayes on test set : {accuracy_score(y_pred, y_test)}")
print(f"F1 Score of Naive Bayes on test set : {f1_score(y_pred, y_test, pos_label='anom')}")
print("\nClassification Report:")
print(classification_report(y_test, y_pred))
OUT:
Accuracy of Naive Bayes on test set : 0.9806066633515664
F1 Score of Naive Bayes on test set : 0.9735234215885948
Classification Report:
              precision    recall  f1-score   support
        anom       0.97      0.98      0.97       732
        norm       0.99      0.98      0.98      1279
    accuracy                           0.98      2011
   macro avg       0.98      0.98      0.98      2011
weighted avg       0.98      0.98      0.98      2011
Random forest algorithm
IN:
rf_clf = RandomForestClassifier()
rf_clf.fit(X_train, y_train)
y_pred_rf = rf_clf.predict(X_test)
print(f"Accuracy of Random Forest on test set : {accuracy_score(y_pred, y_test)}")
print(f"F1 Score of Random Forest on test set : {f1_score(y_pred, y_test, pos_label='anom')}")
print("\nClassification Report:")
print(classification_report(y_test, y_pred_rf))
OUT:
Accuracy of Random Forest on test set : 0.9806066633515664
F1 Score of Random Forest on test set : 0.9735234215885948
Classification Report:
              precision    recall  f1-score   support
        anom       1.00      0.96      0.98       732
        norm       0.98      1.00      0.99      1279
    accuracy                           0.99      2011
   macro avg       0.99      0.98      0.99      2011
weighted avg       0.99      0.99      0.99      2011
Support vector machine
IN:
svm_clf = SVC(gamma = 'auto')
svm_clf.fit(X_train, y_train)
y_pred = svm_clf.predict(X_test)
print(f"Accuracy of SVM on test set : {accuracy_score(y_pred, y_test)}")
print(f"F1 Score of SVM on test set: {f1_score(y_pred, y_test, pos_label='anom')}")
print("\nClassification Report:")
print(classification_report(y_test, y_pred))
OUT:
Accuracy of SVM on test set : 0.9189457981103928
F1 Score of SVM on test set: 0.8658436213991769
Classification Report:
              precision    recall  f1-score   support
        anom       1.00      0.76      0.87       689
        norm       0.89      1.00      0.94      1322
    accuracy                           0.92      2011
   macro avg       0.95      0.88      0.90      2011
weighted avg       0.93      0.92      0.92      2011
As you can see, Random Forest and the Naive Bayes classifier report exactly the same accuracy and F1 score, even though they are different algorithms. I hope you can help me find a possible bug in the code or improve it in some way.
Solution
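In your Random Forest code, you store the predictions in y_pred_rf but compute accuracy_score and f1_score on y_pred. At that point y_pred still holds the Naive Bayes predictions from the previous cell, so the two printed metrics are the Naive Bayes values repeated. The classification report in the same cell does use y_pred_rf, which is why its numbers differ from the printed accuracy and F1 score. Pass y_pred_rf to both metric calls to get the Random Forest results.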
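One way to avoid this kind of stale-variable mistake in a notebook is to wrap fit, predict and scoring in a single helper, so the predictions being scored are always the ones just produced. The following is only a sketch: the evaluate function is a hypothetical helper, and the synthetic data from make_classification (with integer labels, hence pos_label=1) is an assumption standing in for payload_mini.csv, where pos_label='anom' would be used instead.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, f1_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB


def evaluate(name, clf, X_train, y_train, X_test, y_test, pos_label):
    """Hypothetical helper: fit clf, predict, and score those same predictions."""
    clf.fit(X_train, y_train)
    # The predictions live in a local variable, so a y_pred left over from
    # an earlier notebook cell cannot be scored by accident.
    y_pred = clf.predict(X_test)
    acc = accuracy_score(y_test, y_pred)
    f1 = f1_score(y_test, y_pred, pos_label=pos_label)
    print(f"Accuracy of {name} on test set : {acc}")
    print(f"F1 Score of {name} on test set : {f1}")
    return y_pred, acc, f1


# Synthetic stand-in data (an assumption; the question uses payload_mini.csv).
X, y = make_classification(n_samples=500, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

nb_pred, nb_acc, nb_f1 = evaluate(
    "Naive Bayes", GaussianNB(), X_train, y_train, X_test, y_test, pos_label=1
)
rf_pred, rf_acc, rf_f1 = evaluate(
    "Random Forest", RandomForestClassifier(random_state=0),
    X_train, y_train, X_test, y_test, pos_label=1,
)
```

The sketch also passes arguments in scikit-learn's documented (y_true, y_pred) order. Accuracy and F1 come out the same either way, but precision and recall would be swapped if the order were reversed.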
Answered By - astroChance