Issue
I have written MLP ANN code for a binary classification dataset and am getting 0.88 (88%) accuracy on my training dataset, while my testing dataset gives only 0.37–0.55 accuracy.
I noticed this was due to my parameters not being updated by the update_parameters method, shown below:
def update_parameters(parameters, grads, lr):
    param1 = parameters
    L = len(parameters) // 2
    for l in range(L):
        parameters["W" + str(l+1)] = parameters["W" + str(l+1)] - lr * grads["dW" + str(l+1)]
        parameters["b" + str(l+1)] = parameters["b" + str(l+1)] - lr * grads["db" + str(l+1)]
    print(param1 == parameters)
    return parameters
The print in the above function gave me True for every comparison of the initial and updated values. The update_parameters function is called in the following function:
def ann(X, Y, dimensions, lr, lr_decay, batch_size, epochs, loss, activations, gradient_alg):
    L = len(dimensions)  # number of layers in the neural network
    m = X.shape[1]
    costs = []           # to keep track of the cost
    parameters = initialize_parameters(dimensions)
    param1 = parameters
    if gradient_alg == "b":
        batch_size = X.shape[1]
    for i in range(epochs):
        minibatches = random_mini_batches(X, Y, batch_size)
        cost_total = 0
        for minibatch in minibatches:
            (minibatch_X, minibatch_Y) = minibatch
            last_A, caches = forward_prop_layers(minibatch_X, parameters, activations)
            cost_total += compute_cost(last_A, minibatch_Y, loss)
            gradients = backward_prop_layers(last_A, minibatch_Y, caches, activations)
            parameters = update_parameters(parameters, gradients, lr)
        cost_avg = cost_total / m
        if i % 10 == 0:
            print("Cost after epoch %i: %f" % (i, cost_avg))
            costs.append(cost_avg)
    plt.plot(costs)
    plt.ylabel('cost')
    plt.xlabel('epochs')
    plt.title("Learning rate = " + str(lr))
    plt.show()
    parameters1 = [parameters, param1, dimensions, activations, costs, lr, batch_size]
    return parameters1
Is my function not being called properly? Where exactly am I going wrong in my implementation?
Solution
Oh yeah, here’s why it’s returning True. First you assign param1 to parameters, then you update parameters. But param1 = parameters doesn’t copy anything: in Python, variables are just references to objects, so param1 and parameters point to the same dictionary in memory. Even after updating parameters, param1 still refers to that same (now updated) object, which is why the comparison always prints True — your parameters actually are being updated. Try printing out some parameters before and after updating and check manually whether they change, or create a copy of parameters using deepcopy, which copies everything in parameters to a separate memory location.
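Here is a minimal, self-contained sketch of that aliasing (the shapes and values are made up for illustration, not taken from the post):

import numpy as np

# Assignment binds a second name to the SAME dict; nothing is copied.
parameters = {"W1": np.ones((2, 2)), "b1": np.zeros((2, 1))}
param1 = parameters                        # alias, not a snapshot

parameters["W1"] = parameters["W1"] - 0.1  # rebind key "W1" in the shared dict

print(param1 is parameters)                # True: one object, two names
print(param1["W1"][0, 0])                  # 0.9 -- the "old" values changed too

With that in mind, here is your function with a real snapshot taken up front: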
import numpy as np
from copy import deepcopy

def update_parameters(parameters, grads, lr):
    param1 = deepcopy(parameters)  # independent copy of the old values
    L = len(parameters) // 2
    for l in range(L):
        parameters["W" + str(l+1)] = parameters["W" + str(l+1)] - lr * grads["dW" + str(l+1)]
        parameters["b" + str(l+1)] = parameters["b" + str(l+1)] - lr * grads["db" + str(l+1)]
    # Note: printing param1 == parameters would now raise ValueError, because ==
    # on two distinct NumPy arrays is elementwise; compare each array explicitly.
    changed = any(not np.array_equal(param1[k], parameters[k]) for k in parameters)
    print("parameters changed:", changed)
    return parameters
Also try printing out the loss after each iteration: if it is changing, then the parameters are getting updated; if not, then your parameters aren't getting updated properly.
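If you want a more direct check than watching the loss, something like the sketch below works; report_param_change is a hypothetical helper (not from the post), and the commented-out usage assumes the post's training loop and variable names:

import numpy as np
from copy import deepcopy

def report_param_change(before, after):
    # Print how far each parameter moved in one update step.
    for k in sorted(before):
        print(k, "||new - old|| =", np.linalg.norm(after[k] - before[k]))

# Inside the training loop (assumes the post's functions and variables):
# before = deepcopy(parameters)
# parameters = update_parameters(parameters, gradients, lr)
# report_param_change(before, parameters)

A norm of exactly zero for every parameter on every step would mean the updates aren't being applied, for example because the gradients or the learning rate are zero.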
Answered By - Pranav