Thursday, January 20, 2022

[FIXED] Encoding dataset index presents a type-error, why? [Machine Learning]

Labels: encoding, machine-learning, pandas, python, scikit-learn

Issue

Dataset: https://docs.google.com/spreadsheets/d/1OBdyMv8yU7EEdlUNqk_Ox9gT2LMItY2DivEiVX4fYWY/edit?usp=sharing

I am trying to apply machine learning to the stats in the dataset, but every time I try to encode/pre-process the data I get:

TypeError: Index does not support mutable operations

Isn't the point of preprocessing to change the values? And isn't that a necessary precursor to applying machine learning? I don't know how to go about encoding/preprocessing, so any suggestions are appreciated. Thanks!

Code:

import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
import numpy as np
import datetime as dt

from sklearn import datasets, metrics
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.preprocessing import OneHotEncoder, LabelEncoder
from sklearn.preprocessing import OrdinalEncoder
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score



dbdata = pd.read_excel("C:/Users/Andrew/sportsref_download.xlsx")

print(dbdata)
print(dbdata.describe())
df = dbdata.columns
print(df)

#define x&y
x = dbdata
y = dbdata.PTS

shapes = x.shape, y.shape
print(shapes)

print(dbdata.index)
print('next')

#apply logreg
logreg = LogisticRegression(solver='lbfgs')

cross_val_score(logreg, x, y, cv=2, scoring='accuracy').mean()
print(cross_val_score)

le = LabelEncoder()
df["date_tf"] = le.fit_transform(dbdata.Date)
df["tm_tf"] = le.fit_transform(df.Tm)
df["opp_tf"] = le.fit_transform(df.Opp)

OneHotEncoder().fit_transform(df[['date_tf']]).toarray()

Cols = ["Date","Tm","Opp"]
integer_encoded = OrdinalEncoder().fit_transform(x[Cols])



scaler = StandardScaler()
X_scaled = scaler.fit_transform(x)
print(X_scaled)

ec = OneHotEncoder()
X_encoded = dbdata.apply(lambda col: ec.fit_transform(col.astype(str)), axis=0, result_type='expand')
X_encoded = ec.fit_transform(x.values.reshape(-1,1), y)


print(X_encoded)
X_encoded = ec.fit_transform(x)

Solution

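The TypeError itself is raised by df["date_tf"] = ...: df was assigned dbdata.columns, which is a pandas Index of column labels rather than a DataFrame, and an Index is immutable. Add the new columns to dbdata (or a copy of it) instead.
Separately, the model has been fit before the non-numeric values have been encoded. You are feeding the model data in date format, which sadly will not work. scikit-learn estimators such as LogisticRegression expect numeric input, so the encoding has to happen before the fit.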
Sklearn documentation
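
To make the message in the traceback concrete, here is a minimal sketch of how it can arise. The rows are made up for illustration (only the column names Date, Tm, Opp and PTS come from the question), and the category-code step is just one possible way to get integer codes, not necessarily what the original script should use.

```python
import pandas as pd

# Hypothetical stand-in for the spreadsheet; the values are invented.
dbdata = pd.DataFrame({
    "Date": ["2021-10-19", "2021-10-21", "2021-10-23"],
    "Tm": ["BOS", "BOS", "BOS"],
    "Opp": ["NYK", "TOR", "HOU"],
    "PTS": [134, 83, 107],
})

cols = dbdata.columns          # a pandas Index of column labels, not a DataFrame
try:
    cols["date_tf"] = 0        # Index objects are immutable, so item assignment fails
    error_message = None
except TypeError as exc:
    error_message = str(exc)
print(error_message)

# Working on a copy of the DataFrame itself allows new columns to be added.
df = dbdata.copy()
df["tm_tf"] = df["Tm"].astype("category").cat.codes
print(df)
```

The same assignment that fails on the Index succeeds on the DataFrame copy, which suggests the encoder calls in the question would get past this particular error once df refers to the data rather than to its column labels.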

Data pre-processing is one of the most important parts of building and deploying a model into production.
If you need help cleaning the data, just drop a message. I would be happy to help.
Otherwise, refer here: Intro article on data cleaning
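
As a rough sketch of what "encode first, then fit" can look like, the example below puts the preprocessing and the model into one scikit-learn Pipeline so that cross_val_score only ever sees numeric features. Several things here are assumptions rather than facts about the original dataset: the rows are invented, the FG column is a hypothetical numeric stat, the date is reduced to day-of-week and day-of-year, and LinearRegression is swapped in for LogisticRegression on the assumption that PTS is a numeric quantity to predict rather than a class label.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Hypothetical rows; only the column names Date, Tm, Opp and PTS come from the question.
dbdata = pd.DataFrame({
    "Date": pd.to_datetime(["2021-10-19", "2021-10-21", "2021-10-23",
                            "2021-10-25", "2021-10-27", "2021-10-29"]),
    "Tm": ["BOS", "BOS", "NYK", "NYK", "BOS", "NYK"],
    "Opp": ["NYK", "TOR", "BOS", "TOR", "TOR", "BOS"],
    "FG": [40, 35, 42, 38, 36, 41],   # assumed numeric stat column
    "PTS": [110, 95, 118, 104, 99, 112],
})

# Separate the target from the features so PTS is not also used as an input.
y = dbdata["PTS"]
X = dbdata.drop(columns=["PTS"]).copy()

# Turn the date into plain numbers before any estimator sees it.
X["dayofweek"] = X["Date"].dt.dayofweek
X["dayofyear"] = X["Date"].dt.dayofyear
X = X.drop(columns=["Date"])

# One-hot encode the team columns and scale the numeric ones.
preprocess = ColumnTransformer([
    ("cat", OneHotEncoder(handle_unknown="ignore"), ["Tm", "Opp"]),
    ("num", StandardScaler(), ["FG", "dayofweek", "dayofyear"]),
])

# The pipeline refits the preprocessing inside every cross-validation fold.
model = Pipeline([
    ("preprocess", preprocess),
    ("regressor", LinearRegression()),
])

scores = cross_val_score(model, X, y, cv=2, scoring="neg_mean_absolute_error")
print(scores.mean())
```

Keeping the encoders inside the pipeline means they are fit on the training fold only, and handle_unknown="ignore" keeps a team that appears only in the test fold from raising an error.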

Answered By - AnalyticSolutions

This answer was collected from Stack Overflow and tested by PythonFixing community admins. It is licensed under CC BY-SA 2.5, CC BY-SA 3.0 and CC BY-SA 4.0.
