Issue
Dataset: https://docs.google.com/spreadsheets/d/1OBdyMv8yU7EEdlUNqk_Ox9gT2LMItY2DivEiVX4fYWY/edit?usp=sharing
I am trying to apply machine learning to the stats in the dataset, but every time I try to encode or preprocess the data I get:
TypeError: Index does not support mutable operations
Isn't the point of preprocessing to change the values, and isn't that a necessary step before applying machine learning? I don't know how to go about encoding or preprocessing, so any suggestions are appreciated. Thanks!
Code:
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
import numpy as np
import datetime as dt
from sklearn import datasets, metrics
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.preprocessing import OneHotEncoder, LabelEncoder
from sklearn.preprocessing import OrdinalEncoder
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
dbdata = pd.read_excel("C:/Users/Andrew/sportsref_download.xlsx")
print(dbdata)
print(dbdata.describe())
df = dbdata.columns
print(df)
#define x&y
x = dbdata
y = dbdata.PTS
shapes = x.shape, y.shape
print(shapes)
print(dbdata.index)
print('next')
#apply logreg
logreg = LogisticRegression(solver='lbfgs')
cross_val_score(logreg, x, y, cv=2, scoring='accuracy').mean()
print(cross_val_score)
le = LabelEncoder()
df["date_tf"] = le.fit_transform(dbdata.Date)
df["tm_tf"] = le.fit_transform(df.Tm)
df["opp_tf"] = le.fit_transform(df.Opp)
OneHotEncoder().fit_transform(df[['date_tf']]).toarray()
Cols = ["Date","Tm","Opp"]
integer_encoded = OrdinalEncoder().fit_transform(x[Cols])
scaler = StandardScaler()
X_scaled = scaler.fit_transform(x)
print(X_scaled)
ec = OneHotEncoder()
X_encoded = dbdata.apply(lambda col: ec.fit_transform(col.astype(str)), axis=0, result_type='expand')
X_encoded = ec.fit_transform(x.values.reshape(-1,1), y)
print(X_encoded)
X_encoded = ec.fit_transform(x)
Solution
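The TypeError is raised because df = dbdata.columns makes df a pandas Index rather than a DataFrame, and an Index cannot be assigned to, so df["date_tf"] = ... fails. Separately, the model is fit before the non-numeric values have been encoded.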
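The exact TypeError from the question can be reproduced in isolation. Here is a minimal sketch, assuming a tiny made-up frame in place of the spreadsheet (only the column names Date, Tm, Opp and PTS come from the question; the values are invented). It shows that dbdata.columns is a pandas Index, that assigning into it fails, and that assigning into a copy of the DataFrame works.

```python
import pandas as pd

# Hypothetical stand-in for the spreadsheet; only the column names
# come from the question, the values are made up.
dbdata = pd.DataFrame({
    "Date": ["2021-01-01", "2021-01-03"],
    "Tm": ["BOS", "LAL"],
    "Opp": ["NYK", "BOS"],
    "PTS": [24, 31],
})

cols = dbdata.columns  # a pandas Index of column labels, not a DataFrame
try:
    cols["tm_tf"] = [0, 1]  # same pattern as df["tm_tf"] = ... in the question
except TypeError as exc:
    message = str(exc)
    print(message)  # Index does not support mutable operations

df = dbdata.copy()  # a real DataFrame accepts new columns
df["tm_tf"] = df["Tm"].astype("category").cat.codes
print(df)
```

Using category codes here is just one simple way to get integer labels. LabelEncoder, as in the question, would also work once df is an actual DataFrame.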
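You are feeding the model dates and team names as raw values, which sadly will not work.
ML models of this nature only take numeric or binary data, so those columns have to be encoded before the model is fit.
See the Sklearn documentation on preprocessing.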
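One way to make sure the encoding happens before the model is fit is to put both steps in a single pipeline. The sketch below is only an illustration under several assumptions: the data is randomly generated, the FG column is hypothetical, the date is reduced to month and day of week, and PTS is turned into a made-up binary target (above or below the median) because LogisticRegression is a classifier. PTS is also dropped from the features so the model does not see its own target.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Hypothetical data: Date, Tm, Opp and PTS are named in the question,
# FG and all values are invented for this sketch.
rng = np.random.default_rng(0)
n = 40
dbdata = pd.DataFrame({
    "Date": pd.date_range("2021-01-01", periods=n, freq="2D"),
    "Tm": rng.choice(["BOS", "LAL", "NYK"], size=n),
    "Opp": rng.choice(["BOS", "LAL", "NYK"], size=n),
    "FG": rng.integers(2, 15, size=n),
})
dbdata["PTS"] = dbdata["FG"] * 2 + rng.integers(0, 6, size=n)

# Features: everything except the target, with the date turned into numbers.
X = dbdata.drop(columns=["PTS"]).copy()
X["month"] = X["Date"].dt.month
X["dayofweek"] = X["Date"].dt.dayofweek
X = X.drop(columns=["Date"])

# Assumed binary target: was PTS above the median?
y = (dbdata["PTS"] > dbdata["PTS"].median()).astype(int)

# One-hot encode the text columns and scale the numeric ones.
preprocess = ColumnTransformer([
    ("teams", OneHotEncoder(handle_unknown="ignore"), ["Tm", "Opp"]),
    ("numbers", StandardScaler(), ["FG", "month", "dayofweek"]),
])
model = make_pipeline(preprocess, LogisticRegression(solver="lbfgs"))

# The pipeline is refit on each training fold, so encoding always precedes fitting.
scores = cross_val_score(model, X, y, cv=2, scoring="accuracy")
print(scores.mean())
```

If the goal is to predict the actual PTS value rather than a class, a regressor would probably be a better fit than LogisticRegression. The preprocessing part of the pipeline would stay the same.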
Data pre-processing is one of the most important parts of building and deploying a model into production.
If you need help cleaning the data, just drop a message. I would be happy to help.
Otherwise, refer here: Intro article on data cleaning
Answered By - AnalyticSolutions