Issue
I understand that autograd is used to imply automatic differentiation. But what exactly is tape-based autograd in PyTorch, and why are there so many discussions that either affirm or deny it?
For example, this:
"In pytorch, there is no traditional sense of tape"
and this:
"We don’t really build gradient tapes per se. But graphs."
but not this:
"Autograd is now a core torch package for automatic differentiation. It uses a tape based system for automatic differentiation."
And for further reference, please compare it with GradientTape in TensorFlow.
Solution
There are different types of automatic differentiation, e.g. forward-mode, reverse-mode, and hybrids (more explanation). The tape-based autograd in PyTorch simply refers to the use of reverse-mode automatic differentiation (source). Reverse-mode auto diff is simply a technique for computing gradients efficiently, and it happens to be the technique used by backpropagation (source).
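To make the distinction concrete, here is a minimal sketch using torch.autograd.functional (the function f and the input x below are made up purely for illustration): forward mode pushes a tangent vector through the computation and yields a Jacobian-vector product, while reverse mode pulls a cotangent vector backwards and yields a vector-Jacobian product, which is what gradient computation needs when there are many inputs and few outputs.
# Minimal sketch: Jacobian-vector product (forward-style) vs.
# vector-Jacobian product (reverse-style) in PyTorch.
import torch
from torch.autograd.functional import jvp, vjp

def f(x):
    # toy vector-valued function R^3 -> R^2 (made up for illustration)
    return torch.stack([x[0] * x[1], x[1] + x[2] ** 2])

x = torch.tensor([1.0, 2.0, 3.0])

# Jacobian-vector product: push the tangent vector e_0 through f
_, jvp_out = jvp(f, (x,), (torch.tensor([1.0, 0.0, 0.0]),))
print(jvp_out)   # first column of the Jacobian: tensor([2., 0.])

# vector-Jacobian product: pull the cotangent vector e_0 back through f
_, vjp_out = vjp(f, (x,), torch.tensor([1.0, 0.0]))
print(vjp_out)   # first row of the Jacobian (as a 1-tuple): (tensor([2., 1., 0.]),)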
Now, in PyTorch, Autograd is the core torch package for automatic differentiation. It uses a tape-based system for automatic differentiation: in the forward phase, the autograd tape records all the operations it executes, and in the backward phase, it replays those operations in reverse.
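As a minimal sketch of that record-and-replay behaviour (the tensor names here are arbitrary):
# Minimal sketch of PyTorch's record-and-replay autograd.
import torch

x = torch.tensor(3.0, requires_grad=True)
y = x ** 2 + 2 * x   # forward phase: operations are recorded on the graph ("tape")
y.backward()         # backward phase: recorded operations are replayed in reverse
print(x.grad)        # dy/dx = 2*x + 2 = tensor(8.)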
The same goes for TensorFlow: to differentiate automatically, it also needs to remember what operations happen in what order during the forward pass; then, during the backward pass, it traverses this list of operations in reverse order to compute gradients. TensorFlow provides the tf.GradientTape API for automatic differentiation, that is, for computing the gradient of a computation with respect to some inputs, usually tf.Variables. TensorFlow records relevant operations executed inside the context of a tf.GradientTape onto a tape, and then uses that tape to compute the gradients of the recorded computation using reverse-mode differentiation.
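For instance, a minimal sketch of the equivalent computation with tf.GradientTape (again, the variable names are arbitrary):
# Minimal sketch of the equivalent computation with tf.GradientTape.
import tensorflow as tf

x = tf.Variable(3.0)
with tf.GradientTape() as tape:
    y = x ** 2 + 2 * x        # operations inside the scope are recorded onto the tape
dy_dx = tape.gradient(y, x)   # the tape is traversed in reverse to compute the gradient
print(dy_dx)                  # tf.Tensor(8.0, shape=(), dtype=float32)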
So, from a high-level viewpoint, both are doing the same thing. However, in a custom training loop, the forward pass and the loss calculation are more explicit in TensorFlow, since they happen inside the tf.GradientTape scope, whereas in PyTorch the recording is implicit. On the other hand, PyTorch requires gradient tracking to be disabled temporarily while the training parameters (weights and biases) are updated in place, and for that it uses the torch.no_grad context explicitly. In other words, TensorFlow's tape.gradient(...) call plays the same role as PyTorch's loss.backward(). Below is a simplistic form of the above statements in code.
# TensorFlow
[w, b] = tf_model.trainable_variables
for epoch in range(epochs):
    with tf.GradientTape() as tape:
        # forward pass and loss calculation
        # within the explicit tape scope
        predictions = tf_model(x)
        loss = squared_error(predictions, y)
    # compute gradients (grad)
    w_grad, b_grad = tape.gradient(loss, tf_model.trainable_variables)
    # update training variables
    w.assign(w - w_grad * learning_rate)
    b.assign(b - b_grad * learning_rate)
# PyTorch
[w, b] = torch_model.parameters()
for epoch in range(epochs):
    # forward pass and loss calculation
    # implicit tape-based AD
    y_pred = torch_model(inputs)
    loss = squared_error(y_pred, labels)
    # compute gradients (grad)
    loss.backward()
    # update training variables / parameters
    with torch.no_grad():
        w -= w.grad * learning_rate
        b -= b.grad * learning_rate
        w.grad.zero_()
        b.grad.zero_()
FYI, in the above, the trainable variables (w, b) are updated manually in both frameworks, but we would generally use an optimizer (e.g. Adam) to do the job.
# TensorFlow
# ....
# update training variables
optimizer.apply_gradients(zip([w_grad, b_grad], tf_model.trainable_variables))
# PyTorch
# ....
# update training variables / parameters
optimizer.step()
optimizer.zero_grad()
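For completeness, here is a minimal sketch of how the PyTorch loop above might look with a built-in optimizer (torch.optim.Adam is just one choice for illustration; torch_model, squared_error, inputs, labels, learning_rate, and epochs are the same assumed names as in the earlier snippets):
# PyTorch: the same loop, letting torch.optim handle the parameter updates.
import torch

optimizer = torch.optim.Adam(torch_model.parameters(), lr=learning_rate)

for epoch in range(epochs):
    y_pred = torch_model(inputs)
    loss = squared_error(y_pred, labels)
    optimizer.zero_grad()   # clear gradients accumulated from the previous step
    loss.backward()         # compute gradients via the autograd graph
    optimizer.step()        # update the parameters using the computed gradients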
Answered By - M.Innat