Issue
I'm trying to build a sklearn DecisionTreeRegressor, following the steps listed here to create a very simple decision tree.
import numpy as np
from sklearn.tree import DecisionTreeRegressor
X_train = np.array([[100],[500],[1500],[3500]])
y_train = np.array([23, 43, 44, 55])
# create a regressor object
regressor = DecisionTreeRegressor(random_state = 0)
# fit the regressor with X and Y data
regressor.fit(X_train, y_train)
The model works fine when predicting for values inside the X_train range:
y_pred = regressor.predict([[700]])
print(y_pred)
>[43.]
However, for inputs above the X_train range, the model always predicts the same value, the maximum of y_train:
X_test = np.array([[4000], [10000]])
y_pred = regressor.predict(X_test)
print(y_pred)
>[55. 55.]
How can the regression be extended so that, for X_test inputs beyond the X_train range, it predicts values higher than those in y_train, following the trend it finds over the X_train interval?
Solution
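Classical decision tree algorithms can't extrapolate beyond the range of the training data. A fitted tree splits the feature axis at a finite set of thresholds, and each leaf predicts a single constant (with the default criterion, the mean of the training targets that fell into it). Any input larger than the highest threshold ends up in the same right-most leaf, so 4000 and 10000 both get the value learned for 3500. To see this for yourself, plot your decision tree and follow its decision path. If you need predictions that continue the trend, use a model that can extrapolate, such as a linear regression.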
Imports
import numpy as np
from sklearn import tree
from matplotlib import pyplot as plt
from sklearn.tree import DecisionTreeRegressor
from sklearn.linear_model import LinearRegression
Tree model
X_train = np.array([[100],[500],[1500],[3500]])
y_train = np.array([23, 43, 44, 55])
# create a regressor object
regressor = DecisionTreeRegressor(random_state = 0)
# fit the regressor with X and Y data
regressor.fit(X_train, y_train)
Visualized model
fig = plt.figure(figsize=(25, 20))
_ = tree.plot_tree(regressor, filled=True)
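If a plot is inconvenient, the same structure can be inspected as text. The following is only a minimal sketch: it refits the tree from the question, prints it with export_text, and uses apply() to check which leaf a few inputs land in. The feature name "x" and the variable names probe and leaf_ids are made up for this illustration.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor, export_text

# Same toy data as in the question.
X_train = np.array([[100], [500], [1500], [3500]])
y_train = np.array([23, 43, 44, 55])

regressor = DecisionTreeRegressor(random_state=0)
regressor.fit(X_train, y_train)

# Text rendering of the split thresholds and leaf values.
# "x" is just an illustrative label for the single feature.
print(export_text(regressor, feature_names=["x"]))

# apply() returns the index of the leaf each sample falls into.
# 3500 is the largest training input; 4000 and 10000 lie beyond it.
probe = np.array([[3500], [4000], [10000]])
leaf_ids = regressor.apply(probe)
print(leaf_ids)

# All three inputs share one leaf, so they share one prediction.
print(regressor.predict(probe))
```

Because every input above the last threshold follows the same path, the prediction is flat no matter how far past the training range you go.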
Linear model
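X_train = np.array([[100],[500],[1500],[3500]])
y_train = np.array([23, 43, 44, 55])
# fit a straight line to the same training data
reg = LinearRegression().fit(X_train, y_train)
# inputs beyond the largest training value (3500)
x_outside_range = np.array([[4000], [10000]])
plt.plot(X_train, y_train, label='train data')
# the fitted line keeps following its slope outside the training range
plt.plot(x_outside_range, reg.predict(x_outside_range), label='prediction outside train data range')
plt.legend()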
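To compare the two models numerically rather than visually, something like the sketch below could be used. It fits both models on the question's data and predicts for the same out-of-range inputs; the names tree_pred and lin_pred are chosen here for illustration only.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.tree import DecisionTreeRegressor

# Same toy data as in the question.
X_train = np.array([[100], [500], [1500], [3500]])
y_train = np.array([23, 43, 44, 55])

# Inputs beyond the largest training value (3500).
x_outside_range = np.array([[4000], [10000]])

# The tree returns the value of its right-most leaf for both inputs.
tree_model = DecisionTreeRegressor(random_state=0).fit(X_train, y_train)
tree_pred = tree_model.predict(x_outside_range)

# The linear model extends its fitted slope, so predictions keep growing.
lin_model = LinearRegression().fit(X_train, y_train)
lin_pred = lin_model.predict(x_outside_range)

print("tree  :", tree_pred)
print("linear:", lin_pred)
print("slope :", lin_model.coef_[0], "intercept:", lin_model.intercept_)
```

Whether a straight line is the right trend for your data is a modelling decision; the point is only that a parametric model has a functional form to extend, while a tree has none.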
Answered By - Yev Guyduy