Issue
I have the following function:
from collections import OrderedDict

import torch


def flatten_dict(
    aligned_feat: OrderedDict[str, list[torch.Tensor]],
):
    num_batches = len(list(aligned_feat.values())[0])
    flat_features = [None] * num_batches
    for k in list(aligned_feat.keys()):
        for b in range(num_batches):
            if flat_features[b] is None:
                flat_features[b] = aligned_feat[k][b]
            else:
                flat_features[b] = torch.cat((flat_features[b], aligned_feat[k][b]))
        # free up mem
        del aligned_feat[k]
        print(f"{k}: {torch.cuda.max_memory_allocated()}")
    return flat_features


if __name__ == "__main__":
    keys = ['0', '1', '2', '3']
    d = OrderedDict()
    for k in keys:
        d[k] = [torch.rand((50, 256, 7, 7)).to("cuda") for _ in range(4)]
    print(str(torch.cuda.max_memory_allocated()))
    data = flatten_dict(d)
    print(str(torch.cuda.max_memory_allocated()))
I get the following output:
41943040
0: 41943040
1: 54487040
2: 56995840
3: 59504640
59504640
Can you help me understand the change in max memory allocation? Why does it not increase at key 0? Why does it keep increasing after the first increase? And is there another way of "flattening" the dict without increasing the max allocated memory (and without increasing it inside the function itself)? Be aware that I cannot stack the list[torch.Tensor] in the input, as the tensors can have different shape[0] (I only know for certain that shape[1:] is always [256, 7, 7], and I no longer need the dictionary format after flattening).
Thanks!
Solution
The memory increase comes from the concatenation: torch.cat has to allocate a new tensor to hold its output, so during the concatenation both the input tensors and the output tensor exist in memory at the same time.
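To make that concrete, here is a small standalone snippet (separate from your script, not part of the original code) that shows the peak jump caused by a single torch.cat:

import torch

a = torch.rand((50, 256, 7, 7), device="cuda")  # ~2.5 MB each in fp32
b = torch.rand((50, 256, 7, 7), device="cuda")
torch.cuda.reset_peak_memory_stats()             # start measuring the peak from here

before = torch.cuda.memory_allocated()
c = torch.cat((a, b))                            # allocates a brand-new output tensor
after = torch.cuda.memory_allocated()

print(after - before)                            # roughly the size of c (a and b combined)
print(torch.cuda.max_memory_allocated())         # peak includes a, b, and c held simultaneously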
To visualize this, I added more detailed reporting to your code:
def flatten_dict(
    aligned_feat: OrderedDict[str, list[torch.Tensor]],
):
    num_batches = len(list(aligned_feat.values())[0])
    flat_features = [None] * num_batches
    for k in list(aligned_feat.keys()):
        for b in range(num_batches):
            if flat_features[b] is None:
                report_val = 'no cat'
                flat_features[b] = aligned_feat[k][b]
            else:
                report_val = 'cat'
                flat_features[b] = torch.cat((flat_features[b], aligned_feat[k][b]))
            print(f"{k}: {torch.cuda.max_memory_allocated()}, {report_val}")
        del aligned_feat[k]
    return flat_features
This produces
41943040
0: 41943040, no cat
0: 41943040, no cat
0: 41943040, no cat
0: 41943040, no cat
1: 46960640, cat
1: 49469440, cat
1: 51978240, cat
1: 54487040, cat
2: 54487040, cat
2: 54487040, cat
2: 54487040, cat
2: 56995840, cat
3: 56995840, cat
3: 56995840, cat
3: 56995840, cat
3: 59504640, cat
59504640
As for your next question ("is there another way of flattening the dict, without increasing the max allocated memory?"): you would need to "flatten" the dict in a way that does not involve writing new tensors to memory. There's a lot of "it depends" based on your use case, but generally speaking, if you want to merge tensors into a new, larger tensor, that will require additional memory, because creating the new tensor means the input tensors and the output tensor must be held in memory simultaneously.
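If whatever consumes flat_features can work with a list of chunks per batch index instead of one contiguous tensor, you can "flatten" purely at the Python level without calling torch.cat at all, so no new CUDA memory is allocated. A minimal sketch (flatten_dict_no_copy is a hypothetical name, and it assumes downstream code accepts list[list[torch.Tensor]]):

from collections import OrderedDict

import torch


def flatten_dict_no_copy(
    aligned_feat: OrderedDict[str, list[torch.Tensor]],
) -> list[list[torch.Tensor]]:
    # Only reorganizes references: flat[b] holds the per-key chunks for batch
    # index b. No torch.cat means no new CUDA allocations and no peak increase.
    num_batches = len(next(iter(aligned_feat.values())))
    flat = [[] for _ in range(num_batches)]
    for k in list(aligned_feat.keys()):
        for b in range(num_batches):
            flat[b].append(aligned_feat[k][b])
        del aligned_feat[k]  # drops the dict's reference; the tensors live on in flat
    return flat

The trade-off is that each flat[b] is a list of tensors rather than a single contiguous tensor; if a later step truly needs one contiguous tensor, the copy (and its memory cost) is only deferred, not avoided.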
Also note that torch.cuda.max_memory_allocated() tracks the peak memory allocation since the start of the process; it does not go back down when memory is freed (you can reset the peak counter explicitly with torch.cuda.reset_peak_memory_stats()). To track ongoing memory use, look at torch.cuda.memory_allocated() instead.
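For example, in a fresh process (standalone snippet, not part of your script):

import torch

x = torch.rand((50, 256, 7, 7), device="cuda")
print(torch.cuda.memory_allocated())      # ~2.5 MB: x is live
print(torch.cuda.max_memory_allocated())  # same value: the peak so far

del x
print(torch.cuda.memory_allocated())      # drops back to 0: the allocation was freed
print(torch.cuda.max_memory_allocated())  # unchanged: the peak is never lowered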
Running your original code with torch.cuda.memory_allocated() instead shows the effect of deleting the original tensors:
41943040
0: 41943040
1: 41041920
2: 41943040
3: 41041920
41041920
Answered By - Karl