Issue
I'm new to machine learning and want to understand how to interpret the RMSE when the data is scaled. I used the California housing dataset and trained a LinearSVR on it:
from sklearn.datasets import fetch_california_housing
housing = fetch_california_housing()
X = housing["data"]
y = housing["target"]
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
I then scaled the data for the SVR and trained the model:
from sklearn.preprocessing import StandardScaler
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)
from sklearn.svm import LinearSVR
lin_svr = LinearSVR(random_state=42)
lin_svr.fit(X_train_scaled, y_train)
When I evaluated the RMSE, the result seemed to be scaled, so it didn't make much sense to me:
import numpy as np
from sklearn.metrics import mean_squared_error
y_pred = lin_svr.predict(X_train_scaled)
rmse = np.sqrt(mean_squared_error(y_train, y_pred))
The rmse was 0.976993881287582.
How do I make sense of the result? (the y column is in tens of thousands of dollars)
I tried to unscale y_pred, but the result did not make sense:
y_pred = lin_svr.predict(X_test_scaled)
mse = mean_squared_error(y_test, y_pred)
np.sqrt(mse)
So the question is: how do I interpret the RMSE when the data is scaled, and is there a correct way to unscale it in order to make sense of it?
Thanks!
Solution
You only scaled the features, not the target variable, so the RMSE is in the same units as the target. The target is in units of 100,000 dollars, and RMSE measures the typical difference between observed and predicted values, so rmse = 0.976993881287582 corresponds to an error of about 97,699 dollars.
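As a rough illustration of that conversion, here is a minimal sketch in plain Python. The helper names (rmse, rmse_in_dollars) and the toy values are made up for this example and are not part of scikit-learn; the only assumption taken from the answer is that the target is expressed in units of 100,000 dollars.

```python
import math

# Assumption from the answer: one target unit equals 100,000 dollars.
TARGET_UNIT_IN_DOLLARS = 100_000


def rmse(y_true, y_pred):
    """Root mean squared error, in the same units as the target values."""
    squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


def rmse_in_dollars(rmse_value, unit=TARGET_UNIT_IN_DOLLARS):
    """Convert an RMSE given in target units to dollars (hypothetical helper)."""
    return rmse_value * unit


# Toy targets and predictions (made up), in units of 100,000 dollars.
toy_y_true = [1.0, 2.0, 3.0]
toy_y_pred = [1.5, 2.0, 2.5]
toy_rmse = rmse(toy_y_true, toy_y_pred)

# The RMSE reported in the question, converted to dollars.
reported_rmse_dollars = rmse_in_dollars(0.976993881287582)

print(f"toy RMSE: {toy_rmse:.4f} target units")
print(f"reported RMSE: about {reported_rmse_dollars:,.0f} dollars")
```

Scaling X changes only the inputs the model sees, so no inverse transform is needed here. An inverse transform would only come into play if y itself had been scaled before training.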
Answered By - jhihan