Issue
I'm new to data science and have a question about train_test_split.
I have an example that tries to predict iced tea sales from temperature.
My question: when I use train_test_split, my MSE, score, and predicted sales value are different every time (since train_test_split selects a different part of the data each time).
Is this normal? If a user enters the same value, 30 degrees, every time, will they get a different predicted sales value?
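To see how large the variation is, here is a minimal sketch (not from the original post) that repeats the split with 20 different random_state values on the same 14 samples and records the predicted sales at 30 degrees. The number of repetitions and the use of a 1-D y are arbitrary choices made for illustration.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Same data as in the question (y kept 1-D here for simpler indexing)
X = np.array([29, 28, 34, 31, 25, 29, 32, 31, 24, 33, 25, 31, 26, 30]).reshape(-1, 1)
y = np.array([77, 62, 93, 84, 59, 64, 80, 75, 58, 91, 51, 73, 65, 84])

predictions = []
for seed in range(20):
    # Each seed makes train_test_split pick a different 70/30 split of the 14 rows
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.3, random_state=seed
    )
    model = LinearRegression().fit(X_train, y_train)
    # Predicted sales for a user entering 30 degrees
    predictions.append(model.predict([[30]])[0])

predictions = np.array(predictions)
print(f"min={predictions.min():.1f}  max={predictions.max():.1f}  std={predictions.std():.2f}")
```

The spread printed here reflects only the choice of split, since everything else is held fixed.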
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
#1. input to predict for (the user enters 30 degrees)
temperature = np.reshape(np.array([30]), (1, 1))
#2. data
X = np.array([29, 28, 34, 31, 25, 29, 32, 31, 24, 33, 25, 31, 26, 30]) #temperatures
y = np.array([77, 62, 93, 84, 59, 64, 80, 75, 58, 91, 51, 73, 65, 84]) #iced_tea_sales
X = np.reshape(X, (len(X), 1))
y = np.reshape(y, (len(y), 1))
#3. split data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3)
#4. train
lm = LinearRegression()
lm.fit(X_train, y_train)
#5. mse score
y_pred = lm.predict(X_test)
mse = np.mean((y_pred - y_test) ** 2)
r_squared = lm.score(X_test, y_test)
print(f'mse: {mse}')
print(f'score(r_squared): {r_squared}')
#6. predict
sales = lm.predict(temperature)
print(sales) #output, user get their prediction
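A minimal sketch of making a run reproducible, assuming it is acceptable to fix the shuffle: train_test_split accepts a random_state argument, and passing the same integer makes it select the same rows every time. The helper predict_sales and the seed 42 are hypothetical, chosen only for illustration.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

X = np.array([29, 28, 34, 31, 25, 29, 32, 31, 24, 33, 25, 31, 26, 30]).reshape(-1, 1)
y = np.array([77, 62, 93, 84, 59, 64, 80, 75, 58, 91, 51, 73, 65, 84])

def predict_sales(temp, seed=42):
    """Hypothetical helper: split with a fixed seed, fit, and predict for one temperature."""
    # A fixed random_state means the same rows land in the training set on every call
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.3, random_state=seed
    )
    model = LinearRegression().fit(X_train, y_train)
    return model.predict(np.array([[temp]]))[0]

first = predict_sales(30)
second = predict_sales(30)
print(first, second)  # the two calls use the same split, so the values match
```

Fixing the seed only makes the result repeatable; it does not make that particular split any more representative than another one.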
Solution
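This is normal. train_test_split shuffles the data randomly before splitting, so each run trains the model on a different subset of samples and evaluates it on a different test set; the learned weights, the MSE, the score, and the prediction for 30 degrees all change with it. Fitting LinearRegression itself is deterministic (ordinary least squares has a closed-form solution), so the same training data always gives the same weights; models that use random initialization or random sampling can vary even on identical data unless their own random_state is fixed. The predictions should still be close to each other (if you don't have outliers), since all the samples come from the same distribution. To get the same split on every run, pass a fixed integer as random_state to train_test_split.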
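The answer's points can be checked with a short sketch. It assumes the question's data, refits LinearRegression twice on identical data to show the weights match, then uses 5-fold cross-validation (KFold with shuffle=True; the fold count and seed are arbitrary) to average the MSE over several splits instead of relying on one. Fitting a final model on all 14 samples for the user-facing prediction is one common practice, not something the original answer prescribes.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import KFold, cross_val_score

X = np.array([29, 28, 34, 31, 25, 29, 32, 31, 24, 33, 25, 31, 26, 30]).reshape(-1, 1)
y = np.array([77, 62, 93, 84, 59, 64, 80, 75, 58, 91, 51, 73, 65, 84])

# 1. Refitting ordinary least squares on identical data gives identical weights
a = LinearRegression().fit(X, y)
b = LinearRegression().fit(X, y)
print(np.array_equal(a.coef_, b.coef_), a.intercept_ == b.intercept_)

# 2. Average the test error over several splits instead of trusting one random split
cv = KFold(n_splits=5, shuffle=True, random_state=0)
scores = cross_val_score(LinearRegression(), X, y, cv=cv, scoring="neg_mean_squared_error")
mse_per_fold = -scores  # scikit-learn reports MSE negated so that higher is better
print(f"MSE per fold: {np.round(mse_per_fold, 1)}, mean: {mse_per_fold.mean():.1f}")

# 3. For the prediction shown to users, fit once on all available data
final_model = LinearRegression().fit(X, y)
print(final_model.predict(np.array([[30]])))
```

Because the final model in this sketch has no random step, a user entering 30 degrees gets the same prediction every time, while the cross-validated MSE gives an estimate of how reliable that prediction is likely to be.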
Answered By - Mehul Gupta