Issue
I wrote a very simple program that reads columns of data from a CSV file. Here is a short preview of the file:
,matchId,blue_win,blueGold,blueMinionsKilled,blueJungleMinionsKilled,blueAvgLevel,redGold,redMinionsKilled,redJungleMinionsKilled,redAvgLevel,blueChampKills,blueHeraldKills,blueDragonKills,blueTowersDestroyed,redChampKills,redHeraldKills,redDragonKills,redTowersDestroyed
0,3493250918.0,0,24575.0,349.0,89.0,8.6,25856.0,346.0,80.0,9.2,6.0,1.0,0.0,1.0,12.0,2.0,0.0,1.0
1,3464936341.0,0,27210.0,290.0,36.0,9.0,28765.0,294.0,92.0,9.4,20.0,0.0,0.0,0.0,19.0,2.0,0.0,0.0
2,3428425921.0,1,32048.0,346.0,92.0,9.4,25305.0,293.0,84.0,9.4,17.0,3.0,0.0,0.0,11.0,0.0,0.0,4.0
3,3428347390.0,0,20261.0,223.0,60.0,8.2,30429.0,356.0,107.0,9.4,7.0,0.0,0.0,3.0,16.0,3.0,0.0,0.0
4,3428350940.0,1,30217.0,376.0,110.0,9.8,23889.0,334.0,60.0,8.8,16.0,3.0,0.0,0.0,8.0,0.0,0.0,2.0
5,3494458885.0,1,25470.0,362.0,82.0,9.2,22856.0,319.0,86.0,8.8,9.0,1.0,0.0,0.0,7.0,1.0,0.0,0.0
6,3463320642.0,1,25391.0,350.0,96.0,9.2,23236.0,345.0,80.0,8.6,8.0,2.0,0.0,0.0,5.0,1.0,0.0,1.0
...
I drop the unnecessary columns, hold out 30% of the data as a test set, and try to predict whether the blue team wins the game:
import pandas as pd
import numpy as np
import sklearn
from sklearn import linear_model
df = pd.read_csv('MatchTimelinesFirst15.csv', delimiter=',')
predict = "blue_win"
df = df.drop('Unnamed: 0', axis=1)
df = df.drop('redDragonKills', axis=1)
df = df.drop('blueDragonKills', axis=1)
# print(df.describe())
x = np.array(df.drop([predict], axis=1))
y = np.array(df[predict])
for _ in range(500):
    x_train, x_test, y_train, y_test = sklearn.model_selection.train_test_split(x, y, test_size=0.30)
    # print('{0}, {1}'.format(type(x_train), x_train))
    linear = linear_model.LinearRegression()
    # trains model
    linear.fit(x_train, y_train)
    acc = linear.score(x_test, y_test)
    print('Accuracy: {0}'.format(acc))
But my accuracy won't increase, even though I train the model in a loop 500 times. I keep getting results in the same range:
Accuracy: 0.39030223064480596
Accuracy: 0.3980014684661366
Accuracy: 0.3840247556358104
Accuracy: 0.3939949181269252
Accuracy: 0.38657487661026535
Accuracy: 0.3950506154649621
Accuracy: 0.3925506648304995
...
Any help would be greatly appreciated, as would suggestions for improvement, since I am very new to Python and machine learning.
Solution
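You are not training the model any further by using your loop. Each of the 500 iterations creates a brand-new LinearRegression and fits it from scratch, so nothing carries over from one iteration to the next. The only difference between iterations is the random train-test split, which is why the scores stay within the same narrow range.
As for improving your model, I would steer away from Linear Regression. Regression is not the same thing as classification: classification predicts categorical class labels, while regression predicts a continuous quantity. Note also that the score method of LinearRegression returns the coefficient of determination (R^2), not classification accuracy, so the numbers you print are not the fraction of games predicted correctly.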
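To make this concrete, here is a minimal sketch. It uses synthetic data from make_classification as a stand-in for the match CSV (an assumption, since the real file is not available here). With a fixed split, refitting a fresh model returns the same score every time; only changing the split changes the score.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the match data (assumption: not the real CSV).
X, y = make_classification(n_samples=1000, n_features=10, random_state=0)

# Same split every time: a freshly fitted model gives an identical score.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.30, random_state=0
)
fixed_scores = [
    LinearRegression().fit(X_train, y_train).score(X_test, y_test)
    for _ in range(5)
]

# Different split each time: the score moves, but the model is not "learning more".
varying_scores = []
for seed in range(5):
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, y, test_size=0.30, random_state=seed
    )
    varying_scores.append(LinearRegression().fit(X_tr, y_tr).score(X_te, y_te))

print("fixed split:  ", fixed_scores)
print("random splits:", varying_scores)
```

If you want a single, more stable estimate rather than 500 separate numbers, averaging the scores over the splits (or using cross-validation) is the usual approach.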
Since you want to find out when the blue team wins, you have a binary classification problem. Either the blue team wins or it doesn't.
Try classification models like an SVM.
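As a rough starting point, a classifier could look like the sketch below. It again uses synthetic data in place of the CSV (an assumption); with the real data you would pass in the x and y arrays from the question instead. The StandardScaler step is my own addition, on the assumption that columns such as gold and kill counts sit on very different scales, which SVMs tend to be sensitive to.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic binary-classification data standing in for the match features
# and the blue_win label (assumption: replace with your own x and y).
X, y = make_classification(n_samples=1000, n_features=10, random_state=0)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.30, random_state=0
)

# Scale the features, then fit a support vector classifier.
clf = make_pipeline(StandardScaler(), SVC())
clf.fit(X_train, y_train)

# For a classifier, score() is the fraction of correctly predicted labels.
acc = clf.score(X_test, y_test)
print("Accuracy: {0}".format(acc))
```

Here the printed value really is an accuracy between 0 and 1, so it can be read as the share of test games whose winner was predicted correctly.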
Good luck!
Answered By - Ethan Van den Bleeken