Issue
I have the following code, in which I need to predict 3 different outputs and then calculate the MAE (mean absolute error) for each output. Since Support Vector Machine Regression does not support multioutput regression by itself, unlike other models such as Random Forest and Linear Regression, I found that this can be done with the MultiOutputRegressor class, which essentially fits a separate model for each output.
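For reference, here is a minimal sketch of the idea with made-up toy data (not my real dataset), just to check that the wrapper fits one SVR per target column:
import numpy as np
from sklearn.svm import SVR
from sklearn.multioutput import MultiOutputRegressor
# Made-up toy data: 100 samples, 5 features, 3 targets
X_toy = np.random.rand(100, 5)
y_toy = np.random.rand(100, 3)
# MultiOutputRegressor fits one independent SVR per target column
multi_svr = MultiOutputRegressor(SVR())
multi_svr.fit(X_toy, y_toy)
print(len(multi_svr.estimators_))  # 3 -> one fitted SVR per output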
In the code below, X holds my features for both training and testing, and y holds my targets.
1) First I wanted to show that effectively my targets (y) have 3 values
print(X.shape, X_test.shape,y.shape,y_test.shape)
(10845, 2116) (4648, 2116) (10845, 3) (4648, 3)
2) Then I have the following code to calculate the mean absolute error (MAE) as well as to train a model and evaluate it on the dataset:
import numpy as np
# Function to calculate mean absolute error
def mae(y_true, y_pred):
    return np.mean(abs(y_true - y_pred))
# Function to take in a model, train it and evaluate it on the test set
def fit_and_evaluate2(model):
    # Train the model with the training dataset for features (X) and target (y)
    model.fit(X, y)
    # Make predictions for the test dataset and compare them with the test targets
    model_pred = model.predict(X_test)
    model_mae = mae(y_test, model_pred)
    # Return the performance metric
    return model_mae
3) When I call this function for my Support Vector Machine Regression, the output given by model_pred is in fact 3 values, but the MAE model_mae is only 1 value:
from sklearn.svm import SVR
from sklearn.multioutput import MultiOutputRegressor
svm = SVR(C=1000, gamma=0.1)
wrapper = MultiOutputRegressor(svm)
svm_mae = fit_and_evaluate2(wrapper)
print('Support Vector Machine Regression Performance on the test set is')
svm_mae
Support Vector Machine Regression Performance on the test set is
0.19932177495538966
I don't understand why model_mae shows only one value, since as shown above my target y effectively has 3 values and model_pred also shows 3 values. Is there something I am doing wrong? I tried this with Random Forest and both the predictions and the MAE show 3 values.
Solution
The reason is the default axis=None, which is used in np.mean when no axis argument is specified; from the docs:
axis: None or int or tuple of ints, optional
Axis or axes along which the means are computed. The default is to compute the mean of the flattened array.
That is, np.mean first flattens the array (so there are no longer 3 separate outputs) and then computes the MAE over all values, which is why you end up with a single number.
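A quick way to see the difference, using a small made-up error matrix purely for illustration:
import numpy as np
# Made-up absolute errors: 2 samples x 3 outputs (hypothetical values)
errors = np.array([[0.1, 0.2, 0.3],
                   [0.3, 0.4, 0.5]])
print(np.mean(errors))          # ~0.3 -> array is flattened, one overall number
print(np.mean(errors, axis=0))  # ~[0.2 0.3 0.4] -> one mean per output column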
You should change the definition of your mae function to:
def mae(y_true, y_pred):
    return np.mean(abs(y_true - y_pred), axis=0)
Let's confirm that it will work with some dummy data:
import numpy as np
# 2-output data
y_true = np.array([[0.5, 1], [-1, 1], [7, -6]])
y_pred = np.array([[0, 2], [-1, 2], [8, -5]])
mae(y_true, y_pred)
# array([0.5, 1. ])
i.e. a 2-valued MAE output, as required.
We can actually confirm this result using scikit-learn's mean_absolute_error with the appropriate argument multioutput='raw_values' (docs):
from sklearn.metrics import mean_absolute_error
mean_absolute_error(y_true, y_pred, multioutput='raw_values')
# array([0.5, 1. ])
Arguably, since you are already using scikit-learn, you would be better off using its built-in MAE function instead of rolling your own.
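For example, a minimal sketch of how the helper function from the question could be adapted (assuming X, y, X_test and y_test are defined as in the question):
from sklearn.metrics import mean_absolute_error
def fit_and_evaluate2(model):
    # Train the model on the training features (X) and targets (y)
    model.fit(X, y)
    # Predict on the test set and return one MAE per output column
    model_pred = model.predict(X_test)
    return mean_absolute_error(y_test, model_pred, multioutput='raw_values')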
Answered By - desertnaut