Issue
I am using scikit-learn to train a multilayer perceptron regressor (MLPRegressor) on 12 features and one output. A StandardScaler() is fit to the training data and applied to all input data. After a training period with architectural optimization, I get a model that is seemingly quite accurate (<10% error). I now need to extract the weights and biases in order to implement the prediction in real time on a system that interacts with a person. I am extracting them with my_model.coefs_ for the weights and my_model.intercepts_ for the biases. The weights are appropriately shaped for the number of nodes in my model and the biases have the appropriate lengths for each layer.
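For context, a minimal sketch of the setup and extraction described above (the data variables are illustrative placeholders, not my actual data; the layer sizes match the architecture below):

import numpy as np
from sklearn.neural_network import MLPRegressor
from sklearn.preprocessing import StandardScaler

# Illustrative only: X_train is (n_samples, 12), y_train is (n_samples,)
X_train = np.random.rand(500, 12)
y_train = np.random.rand(500)

scaler = StandardScaler().fit(X_train)          # fit on the training data only
my_model = MLPRegressor(hidden_layer_sizes=(11, 10), max_iter=2000)
my_model.fit(scaler.transform(X_train), y_train)

# One weight matrix and one bias vector per layer transition
for W, b in zip(my_model.coefs_, my_model.intercepts_):
    print(W.shape, b.shape)   # (12, 11) (11,), then (11, 10) (10,), then (10, 1) (1,)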
The problem is that when I implement the matrix algebra in MATLAB, I get wildly different predictions from what my_model.predict() yields.
My reconstruction process for a two-hidden-layer MLP (with 11 nodes in the first hidden layer and 10 in the second):
% scale(): elementwise subtract each feature's mean and divide by its stdev
scaled_obs = scale(raw_obs);
% Up to this point MATLAB results == sklearn results
w1 = [12x11]  % weights from the input layer to the first hidden layer
w2 = [11x10]
w3 = [10x1]
b1 = [11x1]   % bias added to the first hidden layer after w1 has been applied
b2 = [10x1]
b3 = [1x1]
my_prediction = ((( scaled_obs * w1 + b1') * w2 + b2') * w3 + b3);
I also tried
my_prediction2 = ((( scaled_obs * w1 .* b1') * w2 .* b2') * w3 .* b3); % because nothing worked...
For my specific data:
Sklearn prediction = 1.731
my_prediction = -50.347
my_prediction2 = -3.2075
Is there another weight/bias that I am skipping when extracting relevant params from my_model? Is my order of operations in the reconstruction flawed?
Solution
In my opinion my_prediction = ((( scaled_obs * w1 + b1') * w2 + b2') * w3 + b3);
is structurally correct, but one part is missing: the activation function. Which activation function did you pass to the model? By default, MLPRegressor uses relu as the activation function for every hidden layer. The output layer has a separate activation, which is the identity function, basically f(x) = x, so you don't have to do anything extra for that final step.
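If unsure, a quick check is to inspect the fitted model's attributes (a minimal sketch, assuming my_model is the fitted MLPRegressor):

print(my_model.activation)        # hidden-layer activation, 'relu' by default
print(my_model.out_activation_)   # output activation, 'identity' for MLPRegressor
print(my_model.n_layers_)         # total layer count: input + hidden layers + output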
If you selected relu, or if you did not specify an activation at all (relu is the default), then you have to apply it yourself after each hidden layer, e.g. in numpy as np.maximum(0, your_layer1_calculation); I am not sure how this is done in MATLAB.
So the final formula would be:
layer1 = np.dot(scaled_inputs, weight0) + bias0
layer2 = np.dot(np.maximum(0, layer1), weight1) + bias1
...
layer(n-1) = np.dot(np.maximum(0, layer(n-2)), weight(n-1)) + bias(n-1)
layer(n) = layer(n-1)  # identity function on the output layer
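Applied to the two-hidden-layer network in the question, a minimal numpy sketch of the corrected reconstruction (assuming scaler is the fitted StandardScaler and raw_obs is the 12-element observation; in MATLAB, max(0, x) is the elementwise equivalent of np.maximum(0, x)):

import numpy as np

# Manual forward pass using the extracted parameters (my_model is the fitted MLPRegressor)
W = my_model.coefs_         # shapes: (12, 11), (11, 10), (10, 1)
b = my_model.intercepts_    # shapes: (11,), (10,), (1,)

scaled_obs = scaler.transform(raw_obs.reshape(1, -1))   # same scaling as in training

layer1 = np.maximum(0, scaled_obs @ W[0] + b[0])   # relu on the first hidden layer
layer2 = np.maximum(0, layer1 @ W[1] + b[1])       # relu on the second hidden layer
my_prediction = layer2 @ W[2] + b[2]               # identity on the output layer

# Should agree with the library's own forward pass
print(my_prediction.ravel(), my_model.predict(scaled_obs))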
Answered By - Yash Patel