Issue
I'm trying to compare the convergence rates of the SGD and GD algorithms for neural networks. In PyTorch, we often use the SGD optimizer as follows.
train_dataloader = torch.utils.data.DataLoader(train_dataset, batch_size=64, shuffle=True)
optimizer = torch.optim.SGD(model.parameters(), lr=0.001)

for epoch in range(epochs):
    running_loss = 0
    for input_batch, labels_batch in train_dataloader:
        y_hat = model(input_batch)       # forward pass on one batch
        L = loss(y_hat, labels_batch)    # compute the batch loss
        optimizer.zero_grad()            # clear gradients from the previous step
        L.backward()                     # backpropagate
        optimizer.step()                 # update the weights
        running_loss += L.item()
My understanding is that the SGD optimizer here actually performs the mini-batch gradient descent algorithm, because we feed the optimizer one batch of data at a time. So if we set the batch_size parameter to the size of the entire dataset, the code actually performs gradient descent for the neural network.
Is my understanding correct?
Solution
Your understanding is correct. SGD is just updating weights based on the gradient computed by backpropagation. The flavor of gradient descent that it performs is therefore determined by the data loader.
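To make that concrete: with the default arguments (momentum=0, weight_decay=0), optimizer.step() for torch.optim.SGD reduces to the classic update rule w <- w - lr * dL/dw. Here is a minimal sketch of an equivalent manual update, not PyTorch's actual implementation:

# Rough manual equivalent of optimizer.step() for plain SGD,
# using the lr from the question (lr = 0.001).
with torch.no_grad():
    for param in model.parameters():
        if param.grad is not None:
            param -= lr * param.grad  # w <- w - lr * dL/dw

The update rule itself never changes; what differs between the three variants is only how much data contributes to each gradient: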
- Gradient descent (aka batch gradient descent): batch size equal to the size of the entire training dataset.
- Stochastic gradient descent: batch size equal to one and shuffle=True.
- Mini-batch gradient descent: any other batch size and shuffle=True. By far the most common in practical applications. (See the sketch below for all three configurations.)
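Concretely, here is how you could configure the loader for each variant; a minimal sketch assuming a map-style train_dataset as in the question (the loader names are just for illustration):

from torch.utils.data import DataLoader

# Stochastic gradient descent: one sample per parameter update.
sgd_loader = DataLoader(train_dataset, batch_size=1, shuffle=True)

# Mini-batch gradient descent: a modest batch per update (most common).
minibatch_loader = DataLoader(train_dataset, batch_size=64, shuffle=True)

# (Batch) gradient descent: the entire dataset in one batch, so the
# inner training loop runs once per epoch and each optimizer.step()
# uses the gradient of the loss over all training data.
gd_loader = DataLoader(train_dataset, batch_size=len(train_dataset))

Note that the full-batch loader requires the whole dataset, plus the activations of the forward pass over it, to fit in memory at once, which is usually the practical limit on true gradient descent.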
Answered By - jodag