Issue
I have read this article https://towardsdatascience.com/do-decision-trees-need-feature-scaling-97809eaa60c6 and watched this Krish Naik video https://www.youtube.com/watch?v=nmBqnKSSKfM&ab_channel=KrishNaik, both of which state that you don't need a Standard Scaler for Decision Tree machine learning.
But what happens in my code is the opposite. Here's the code I am trying to run.
# importing libraries
import numpy as nm
import matplotlib.pyplot as mpl
import pandas as pd
# importing datasets
data_set = pd.read_csv('Social_Network_Ads.csv')
# extracting independent and dependent variables
x = data_set.iloc[:, [2, 3]].values
y = data_set.iloc[:, 4].values
# splitting the dataset into training and test set
from sklearn.model_selection import train_test_split
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.25, random_state=0)
# feature scaling
from sklearn.preprocessing import StandardScaler
st_x = StandardScaler()
x_train = st_x.fit_transform(x_train)
x_test = st_x.transform(x_test)
# fitting Decision Tree classifier to the training set
from sklearn.tree import DecisionTreeClassifier
classifier = DecisionTreeClassifier(criterion='entropy', random_state=0)
classifier.fit(x_train, y_train)
I continue the question with the part where I try to visualize the data. Here's the code.
# visualizing the training set result
from matplotlib.colors import ListedColormap
x_set, y_set = x_train, y_train
x1, x2 = nm.meshgrid(nm.arange(start=x_set[:, 0].min() - 1, stop=x_set[:, 0].max() + 1, step=0.01),
                     nm.arange(start=x_set[:, 1].min() - 1, stop=x_set[:, 1].max() + 1, step=0.01))
mpl.contourf(x1, x2, classifier.predict(nm.array([x1.ravel(), x2.ravel()]).T).reshape(x1.shape),
             alpha=0.75, cmap=ListedColormap(('purple', 'green')))
mpl.xlim(x1.min(), x1.max())
mpl.ylim(x2.min(), x2.max())
for i, j in enumerate(nm.unique(y_set)):
    mpl.scatter(x_set[y_set == j, 0], x_set[y_set == j, 1],
                c=ListedColormap(('purple', 'green'))(i), label=j)
mpl.title('Decision Tree Algorithm (Training set)')
mpl.xlabel('Age')
mpl.ylabel('Estimated Salary')
mpl.legend()
mpl.show()
The output succeeds if I run it with the StandardScaler, and the graph is shown nicely. But when I comment out the StandardScaler part, I get a MemoryError.
MemoryError Traceback (most recent call last)
<ipython-input-8-1282bf709e27> in <module>
3 x_set,y_set = x_train, y_train
4 x1, x2 = nm.meshgrid(nm.arange(start = x_set[:, 0].min() - 1, stop = x_set[:, 0].max() + 1, step =0.01),
----> 5 nm.arange(start = x_set[:, 1].min() - 1, stop = x_set[:, 1].max() + 1, step = 0.01))
6 mpl.contourf(x1, x2, classifier.predict(nm.array([x1.ravel(), x2.ravel()]).T).reshape(x1.shape),
7 alpha = 0.75, cmap = ListedColormap(('purple','green' )))
~\Anaconda3\lib\site-packages\numpy\lib\function_base.py in meshgrid(*xi, **kwargs)
4209
4210 if copy_:
-> 4211 output = [x.copy() for x in output]
4212
4213 return output
~\Anaconda3\lib\site-packages\numpy\lib\function_base.py in <listcomp>(.0)
4209
4210 if copy_:
-> 4211 output = [x.copy() for x in output]
4212
4213 return output
MemoryError:
The error only occurs in the visualization part; the other parts of the code, such as prediction, work fine without the Standard Scaler.
Can the Decision Tree work without the Standard Scaler? If yes, how can I fix this?
Solution
A Decision Tree can work both with and without a Standard Scaler. The important thing to note here is that scaling the data won't affect the performance of a Decision Tree model, because the tree's splits depend only on the ordering of each feature's values, not on their scale.
If you are plotting the data afterwards, though, I imagine you don't want to plot the scaled data but rather the original data; hence your problem. On the unscaled data, EstimatedSalary spans roughly 15,000 to 150,000, so nm.arange(..., step=0.01) produces over ten million points on that axis alone, and the meshgrid of that against the Age axis needs billions of cells. On the scaled data, both axes span only a few units, so the grid stays small.
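To make the first point concrete, here is a small sketch (not from the original post, using synthetic data with made-up Age/Salary-like ranges) showing that standardizing the features leaves a Decision Tree's predictions unchanged:

```python
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

rng = np.random.RandomState(0)
# Synthetic Age- and EstimatedSalary-like columns (assumed ranges, not the real dataset)
X = rng.uniform(low=[18, 15000], high=[60, 150000], size=(200, 2))
y = (X[:, 1] > 70000).astype(int)

# Fit one tree on the raw data and one on standardized data
raw_tree = DecisionTreeClassifier(criterion='entropy', random_state=0).fit(X, y)
X_scaled = StandardScaler().fit_transform(X)
scaled_tree = DecisionTreeClassifier(criterion='entropy', random_state=0).fit(X_scaled, y)

# Standardization is monotonic, so the two trees partition the samples identically
same = bool((raw_tree.predict(X) == scaled_tree.predict(X_scaled)).all())
print(same)
```

So scaling (or not) is purely a plotting concern here, not a modelling one.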
The simplest solution I can think of is to pass sparse=True as an argument to numpy.meshgrid, as that seems to be what's throwing the error in your traceback. There's some detail on that in a past question here.
So applied to your question, that would mean you change this line:
nm.meshgrid(
nm.arange(start=x_set[:, 0].min() - 1, stop=x_set[:, 0].max() + 1, step=0.01),
nm.arange(start=x_set[:, 1].min() - 1, stop=x_set[:, 1].max() + 1, step=0.01),
)
to
nm.meshgrid(
nm.arange(start=x_set[:, 0].min() - 1, stop=x_set[:, 0].max() + 1, step=0.01),
nm.arange(start=x_set[:, 1].min() - 1, stop=x_set[:, 1].max() + 1, step=0.01),
sparse=True,
)
Answered By - osint_alex