Issue
I have the following function:
from collections import OrderedDict

import torch


def flatten_dict(
    aligned_feat: OrderedDict[str, list[torch.Tensor]],
):
    num_batches = len(list(aligned_feat.values())[0])
    flat_features = [None] * num_batches
    for k in list(aligned_feat.keys()):
        for b in range(num_batches):
            if flat_features[b] is None:
                flat_features[b] = aligned_feat[k][b]
            else:
                flat_features[b] = torch.cat((flat_features[b], aligned_feat[k][b]))
        # free up mem
        del aligned_feat[k]
        print(f"{k}: {torch.cuda.max_memory_allocated()}")
    return flat_features


if __name__ == "__main__":
    keys = ['0', '1', '2', '3']
    d = OrderedDict()
    for k in keys:
        d[k] = [torch.rand((50, 256, 7, 7)).to("cuda") for _ in range(4)]
    print(str(torch.cuda.max_memory_allocated()))
    data = flatten_dict(d)
    print(str(torch.cuda.max_memory_allocated()))
I get the following output:
41943040
0: 41943040
1: 54487040
2: 56995840
3: 59504640
59504640
Can you help me understand the change in max memory allocation? Why does it not increase at key 0? Why does it keep increasing after the first increase? And is there another way of "flattening" the dict without increasing the max allocated memory (and without increasing it inside the function itself)? Be aware that I cannot stack the list[torch.Tensor] in the input, as the tensors can have different shape[0] (I only know for certain that shape[1:] is always [256, 7, 7], and I no longer need the dictionary format after flattening).
Thanks!
Solution
The memory increase comes from the concatenation: torch.cat has to allocate a new tensor to hold its output, so during the concatenation both the input tensors and the output tensor exist in memory at the same time.
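To make that concrete, here is a small standalone snippet (separate from your script, not part of the original code) that shows the peak jump caused by a single torch.cat:

import torch

a = torch.rand((50, 256, 7, 7), device="cuda")  # ~2.5 MB each in fp32
b = torch.rand((50, 256, 7, 7), device="cuda")
torch.cuda.reset_peak_memory_stats()             # start measuring the peak from here

before = torch.cuda.memory_allocated()
c = torch.cat((a, b))                            # allocates a brand-new output tensor
after = torch.cuda.memory_allocated()

print(after - before)                            # roughly the size of c (a and b combined)
print(torch.cuda.max_memory_allocated())         # peak includes a, b, and c held simultaneously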
To visualize this, I added more detailed reporting to your code:
def flatten_dict(
    aligned_feat: OrderedDict[str, list[torch.Tensor]],
):
    num_batches = len(list(aligned_feat.values())[0])
    flat_features = [None] * num_batches
    for k in list(aligned_feat.keys()):
        for b in range(num_batches):
            if flat_features[b] is None:
                report_val = 'no cat'
                flat_features[b] = aligned_feat[k][b]
            else:
                report_val = 'cat'
                flat_features[b] = torch.cat((flat_features[b], aligned_feat[k][b]))
            print(f"{k}: {torch.cuda.max_memory_allocated()}, {report_val}")
        del aligned_feat[k]
    return flat_features
This produces
41943040
0: 41943040, no cat
0: 41943040, no cat
0: 41943040, no cat
0: 41943040, no cat
1: 46960640, cat
1: 49469440, cat
1: 51978240, cat
1: 54487040, cat
2: 54487040, cat
2: 54487040, cat
2: 54487040, cat
2: 56995840, cat
3: 56995840, cat
3: 56995840, cat
3: 56995840, cat
3: 59504640, cat
59504640
As for your next question ("is there another way of flattening the dict, without increasing the max allocated memory?"): you would need to "flatten" the dict in a way that does not involve writing new tensors to memory. There's a lot of "it depends" based on your use case, but generally speaking, if you want to merge tensors into a new, larger tensor, that will require additional memory, because creating the new tensor means the input tensors and the output tensor must be held in memory simultaneously.
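If whatever consumes flat_features can work with a list of chunks per batch index instead of one contiguous tensor, you can "flatten" purely at the Python level without calling torch.cat at all, so no new CUDA memory is allocated. A minimal sketch (flatten_dict_no_copy is a hypothetical name, and it assumes downstream code accepts list[list[torch.Tensor]]):

from collections import OrderedDict

import torch


def flatten_dict_no_copy(
    aligned_feat: OrderedDict[str, list[torch.Tensor]],
) -> list[list[torch.Tensor]]:
    # Only reorganizes references: flat[b] holds the per-key chunks for batch
    # index b. No torch.cat means no new CUDA allocations and no peak increase.
    num_batches = len(next(iter(aligned_feat.values())))
    flat = [[] for _ in range(num_batches)]
    for k in list(aligned_feat.keys()):
        for b in range(num_batches):
            flat[b].append(aligned_feat[k][b])
        del aligned_feat[k]  # drops the dict's reference; the tensors live on in flat
    return flat

The trade-off is that each flat[b] is a list of tensors rather than a single contiguous tensor; if a later step truly needs one contiguous tensor, the copy (and its memory cost) is only deferred, not avoided.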
Also note that torch.cuda.max_memory_allocated() tracks the peak memory allocation since the start of the process; it does not go back down when memory is freed (you can reset the peak counter explicitly with torch.cuda.reset_peak_memory_stats()). To track ongoing memory use, look at torch.cuda.memory_allocated() instead.
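For example, in a fresh process (standalone snippet, not part of your script):

import torch

x = torch.rand((50, 256, 7, 7), device="cuda")
print(torch.cuda.memory_allocated())      # ~2.5 MB: x is live
print(torch.cuda.max_memory_allocated())  # same value: the peak so far

del x
print(torch.cuda.memory_allocated())      # drops back to 0: the allocation was freed
print(torch.cuda.max_memory_allocated())  # unchanged: the peak is never lowered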
Running your original code with torch.cuda.memory_allocated() instead shows the effect of deleting the original tensors:
41943040
0: 41943040
1: 41041920
2: 41943040
3: 41041920
41041920
Answered By - Karl