Issue
I have been trying to debug a model that uses the torch.einsum operator in a layer that is repeated several times.
While analyzing the model's GPU memory usage during training, I noticed that a particular einsum operation dramatically increases the memory usage. I am dealing with multi-dimensional tensors. The operation is torch.einsum('b q f n, b f n d -> b q f d', A, B).
It is also worth mentioning that:
- x was previously assigned a tensor of the same shape, and the result of this einsum overwrites it.
- In every layer (they are all identical), GPU memory increases linearly after this operation and is not deallocated until the end of the model iteration.
I have been wondering why this operation uses so much memory, and why the memory stays allocated after every iteration over that layer type.
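For concreteness, here is a minimal sketch that reproduces the pattern; the shapes, the number of layers, and the single shared weight W are made up for illustration and are not the actual model:

import torch

# Hypothetical shapes; d == n so each layer's output has the same shape as its input.
b, q, f, n = 4, 64, 8, 32
d = n
device = "cuda"

x = torch.randn(b, q, f, n, device=device)
W = torch.randn(b, f, n, d, device=device, requires_grad=True)

for layer in range(6):  # identical layers repeated a few times
    # The name x is rebound to the new result each iteration,
    # yet allocated memory keeps growing layer after layer.
    x = torch.einsum('b q f n, b f n d -> b q f d', x, W)
    print(f"after layer {layer}: "
          f"{torch.cuda.memory_allocated() / 2**20:.1f} MiB allocated")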
Solution
Variable "x
" is indeed overwritten, but the tensor data is kept in memory (also called the layer's activation) for later usage in the backward pass.
So in turn you are effectively allocating new memory data for the result of torch.einsum
, but you won't be replacing x
's memory even if it has been seemingly overwritten.
To put this to the test, you can run the forward pass under the torch.no_grad() context manager (in which those activations are not kept in memory) and compare the memory usage against a standard forward pass.
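A rough sketch of that comparison, reusing the made-up shapes and the hypothetical stack of identical einsum layers from the question:

import torch

b, q, f, n = 4, 64, 8, 32
device = "cuda"

def forward(x, W, n_layers=6):
    # Same hypothetical stack of identical einsum layers as in the question.
    for _ in range(n_layers):
        x = torch.einsum('b q f n, b f n d -> b q f d', x, W)
    return x

x = torch.randn(b, q, f, n, device=device)
W = torch.randn(b, f, n, n, device=device, requires_grad=True)

# Standard forward: every intermediate result is saved for the backward pass,
# so peak memory grows with the number of layers.
torch.cuda.reset_peak_memory_stats()
out = forward(x, W)
print(f"with autograd: {torch.cuda.max_memory_allocated() / 2**20:.1f} MiB peak")

del out
torch.cuda.reset_peak_memory_stats()

# Under no_grad, each intermediate is freed as soon as the name is rebound,
# so peak memory stays roughly flat regardless of the number of layers.
with torch.no_grad():
    out = forward(x, W)
print(f"with no_grad:  {torch.cuda.max_memory_allocated() / 2**20:.1f} MiB peak")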
Answered By - Ivan