I have read here, and watched a video stating, that you don't need to use StandardScaler for decision tree models.
But in my code the opposite seems to happen. Here's the code I am trying to run.
# importing libraries
import numpy as nm
import matplotlib.pyplot as mpl
import pandas as pd
#importing datasets
data_set= pd.read_csv('Social_Network_Ads.csv')
#Extracting Independent and dependent Variable
x= data_set.iloc[:, [2,3]].values
y= data_set.iloc[:, 4].values
# Splitting the dataset into training and test set.
from sklearn.model_selection import train_test_split
x_train, x_test, y_train, y_test= train_test_split(x, y, test_size= 0.25, random_state=0)
#feature Scaling
from sklearn.preprocessing import StandardScaler
st_x= StandardScaler()
x_train= st_x.fit_transform(x_train)
x_test= st_x.transform(x_test)
#Fitting Decision Tree classifier to the training set
from sklearn.tree import DecisionTreeClassifier
classifier= DecisionTreeClassifier(criterion='entropy', random_state=0)
classifier.fit(x_train, y_train)
My question continues in the part where I try to visualize the data. Here's the code.
#Visualizing the training set result
from matplotlib.colors import ListedColormap
x_set,y_set = x_train, y_train
x1, x2 = nm.meshgrid(nm.arange(start = x_set[:, 0].min() - 1, stop = x_set[:, 0].max() + 1, step = 0.01),
                     nm.arange(start = x_set[:, 1].min() - 1, stop = x_set[:, 1].max() + 1, step = 0.01))
mpl.contourf(x1, x2, classifier.predict(nm.array([x1.ravel(), x2.ravel()]).T).reshape(x1.shape),
             alpha = 0.75, cmap = ListedColormap(('purple', 'green')))
mpl.xlim(x1.min(), x1.max())
mpl.ylim(x2.min(), x2.max())
for i, j in enumerate(nm.unique(y_set)):
    mpl.scatter(x_set[y_set == j, 0], x_set[y_set == j, 1], c = ListedColormap(('purple', 'green'))(i), label = j)
mpl.title('Decision Tree Algorithm (Training set)')
mpl.ylabel('Estimated Salary')
The code succeeds if I run it with the StandardScaler, and the graph is displayed nicely. But when I comment out the StandardScaler part, it raises a MemoryError.
MemoryError Traceback (most recent call last)
<ipython-input-8-1282bf709e27> in <module>
3 x_set,y_set = x_train, y_train
4 x1, x2 = nm.meshgrid(nm.arange(start = x_set[:, 0].min() - 1, stop = x_set[:, 0].max() + 1, step =0.01),
----> 5 nm.arange(start = x_set[:, 1].min() - 1, stop = x_set[:, 1].max() + 1, step = 0.01))
6 mpl.contourf(x1, x2, classifier.predict(nm.array([x1.ravel(), x2.ravel()]).T).reshape(x1.shape),
7 alpha = 0.75, cmap = ListedColormap(('purple','green' )))
~\Anaconda3\lib\site-packages\numpy\lib\ in meshgrid(*xi, **kwargs)
4210 if copy_:
-> 4211 output = [x.copy() for x in output]
4213 return output
~\Anaconda3\lib\site-packages\numpy\lib\ in <listcomp>(.0)
4210 if copy_:
-> 4211 output = [x.copy() for x in output]
4213 return output
The error only occurs in the visualization part; the other parts of the code, such as prediction, work fine without the StandardScaler.
Can a decision tree work without StandardScaler? If yes, how can I fix this?
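A decision tree works both with and without StandardScaler. The important thing to note here is that scaling the data won't affect the performance of a decision tree model, because each split only compares a single feature against a threshold, and rescaling a feature just rescales that threshold.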
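One way to check this for yourself is to fit the same tree on raw and on standardized features and compare the test predictions. The following is only a minimal sketch: the data is synthetic (it is not Social_Network_Ads.csv), and the value ranges and the labelling rule are assumptions made up for illustration.

```python
import numpy as nm  # same alias as the question's code
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the two features (assumed ranges, not the real dataset).
rng = nm.random.default_rng(0)
age = rng.integers(18, 61, size=400)
salary = rng.integers(15_000, 150_001, size=400)
x = nm.column_stack([age, salary]).astype(float)
y = ((age > 40) | (salary > 90_000)).astype(int)  # hypothetical labelling rule

x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.25, random_state=0)

# Tree fitted on the raw, unscaled features.
raw_tree = DecisionTreeClassifier(criterion='entropy', random_state=0)
raw_tree.fit(x_train, y_train)
raw_pred = raw_tree.predict(x_test)

# Same settings, fitted on standardized features.
scaler = StandardScaler().fit(x_train)
scaled_tree = DecisionTreeClassifier(criterion='entropy', random_state=0)
scaled_tree.fit(scaler.transform(x_train), y_train)
scaled_pred = scaled_tree.predict(scaler.transform(x_test))

# Fraction of test points on which the two trees agree.
agreement = float(nm.mean(raw_pred == scaled_pred))
print("fraction of identical test predictions:", agreement)
```

The agreement should be 1.0 or very close to it; any rare disagreement would come from floating-point rounding at a split threshold, not from the scaling itself.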
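If you are plotting the data afterwards, though, I imagine you want to plot the original data rather than the scaled data; hence your problem. Unscaled, the Estimated Salary axis spans a far larger range, so a meshgrid with step 0.01 needs vastly more points than it does on the scaled data.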
The simplest solution I can think of for doing this is to pass sparse=True as an argument to numpy.meshgrid, as that seems to be what's throwing the error in your traceback. There's some detail on that in a past question here.
So applied to your question, that would mean you change this line:
x1, x2 = nm.meshgrid(
    nm.arange(start=x_set[:, 0].min() - 1, stop=x_set[:, 0].max() + 1, step=0.01),
    nm.arange(start=x_set[:, 1].min() - 1, stop=x_set[:, 1].max() + 1, step=0.01),
)
to this:
x1, x2 = nm.meshgrid(
    nm.arange(start=x_set[:, 0].min() - 1, stop=x_set[:, 0].max() + 1, step=0.01),
    nm.arange(start=x_set[:, 1].min() - 1, stop=x_set[:, 1].max() + 1, step=0.01),
    sparse=True,
)
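One caveat with sparse=True is that x1 and x2 then have different shapes, so the later nm.array([x1.ravel(), x2.ravel()]) call in the question would need adjusting. An alternative that leaves the rest of the plotting code unchanged is to fix the number of grid points per axis instead of the step, so the grid size no longer depends on the feature scale. The sketch below is one possible way to do that; the helper names grid_cells_for_step and make_plot_grid, the 300 points per axis, and the sample values are all hypothetical choices for illustration, not part of the original code.

```python
import numpy as nm  # same alias as the question's code


def grid_cells_for_step(x_set, step, margin=1.0):
    """Count the cells a fixed-step meshgrid would need (nothing is allocated)."""
    spans = x_set.max(axis=0) - x_set.min(axis=0) + 2 * margin
    points_per_axis = nm.ceil(spans / step)
    return int(points_per_axis.prod())


def make_plot_grid(x_set, points_per_axis=300, margin_fraction=0.05):
    """Dense meshgrid with a fixed number of points per axis, whatever the scale."""
    lo = x_set.min(axis=0)
    hi = x_set.max(axis=0)
    pad = (hi - lo) * margin_fraction  # small margin around the data
    axis0 = nm.linspace(lo[0] - pad[0], hi[0] + pad[0], points_per_axis)
    axis1 = nm.linspace(lo[1] - pad[1], hi[1] + pad[1], points_per_axis)
    return nm.meshgrid(axis0, axis1)


# Illustrative unscaled values (assumed ranges, not the real dataset).
x_set = nm.array([[18.0, 15_000.0], [35.0, 60_000.0], [60.0, 150_000.0]])

# With step=0.01 the unscaled grid would need tens of billions of cells.
print("cells with step=0.01:", grid_cells_for_step(x_set, step=0.01))

# With a fixed point count the grid stays small.
x1, x2 = make_plot_grid(x_set)
print("cells with 300 points per axis:", x1.size)
```

With a grid built this way, x1 and x2 are ordinary dense arrays of the same shape, so the contourf and predict calls from the question should work as written on the unscaled data.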
Answered By - osint_alex