Wednesday, July 27, 2022

[FIXED] Is there a way to send location of pytorch tensor in gpu memory between docker containers and build them in different containers

cuda, docker, python-3.x, pytorch, sockets

Issue

To sum up the problem quickly: I need to transfer images of size (1920, 1200, 3) between PyTorch Docker containers and process them. The containers run on the same system. Speed is very important; a transfer should take no more than 2-3 ms one way. The two containers will share IPC, so I see no problem transferring NumPy arrays via shared memory using buffers (example: https://docs.python.org/3/library/multiprocessing.shared_memory.html). Is there a similar way to do this with PyTorch tensors allocated on the GPU?
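For reference, the shared-memory route mentioned above can be sketched with the standard library alone. This is only an illustration, not the asker's code: the helper names (`write_bytes`, `read_bytes`, `demo`) are made up for the example, both "sides" run in one process, and the frame is assumed to be uint8. With NumPy, each side would typically wrap the same buffer with `np.ndarray(shape, dtype=np.uint8, buffer=shm.buf)` rather than slicing bytes.

```python
from multiprocessing import shared_memory

# Assumed frame layout: one uint8 image of size (1920, 1200, 3).
FRAME_BYTES = 1920 * 1200 * 3


def write_bytes(name: str, payload: bytes) -> None:
    """Attach to an existing block by name and copy payload into its start."""
    shm = shared_memory.SharedMemory(name=name)
    try:
        shm.buf[:len(payload)] = payload
    finally:
        shm.close()


def read_bytes(name: str, count: int) -> bytes:
    """Attach to an existing block by name and copy out its first bytes."""
    shm = shared_memory.SharedMemory(name=name)
    try:
        return bytes(shm.buf[:count])
    finally:
        shm.close()


def demo() -> bytes:
    # The owner creates the block; only its name has to reach the other side.
    owner = shared_memory.SharedMemory(create=True, size=FRAME_BYTES)
    try:
        write_bytes(owner.name, b"\x01\x02\x03")
        return read_bytes(owner.name, 3)
    finally:
        owner.close()
        owner.unlink()  # free the block once nobody needs it


if __name__ == "__main__":
    print(demo())
```

The point of the pattern is that only the block's name is exchanged between processes; the pixel data itself is never sent.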

From what I've learned, CUDA tensors are already in shared memory. I tried sending them, and PyTorch tensor Storage objects, over a socket, but that takes around 50-60 ms one way, which is far too slow. For testing purposes, I just run the two programs in separate terminals.
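A rough back-of-the-envelope check helps put these timings in context. The sketch below assumes that the pickled message carries the whole float32 payload of the (1, 1920, 1200, 3) test tensor (torch.randn produces float32 by default); the function names are made up for the example.

```python
# Shape of the test tensor and the size of one float32 element.
SHAPE = (1, 1920, 1200, 3)
BYTES_PER_ELEMENT = 4


def payload_bytes(shape, itemsize):
    """Number of bytes needed to hold a dense tensor of this shape."""
    count = 1
    for dim in shape:
        count *= dim
    return count * itemsize


def bandwidth_gb_s(num_bytes, seconds):
    """Throughput, in GB/s, needed to move num_bytes in the given time."""
    return num_bytes / seconds / 1e9


size = payload_bytes(SHAPE, BYTES_PER_ELEMENT)
print(size)                                    # 27648000 bytes, about 27.6 MB
print(round(bandwidth_gb_s(size, 0.055), 2))   # 0.5 GB/s at the observed ~55 ms
print(round(bandwidth_gb_s(size, 0.0025), 2))  # 11.06 GB/s needed for 2.5 ms
```

Under that assumption, hitting the 2-3 ms target by copying the data would need roughly twenty times the observed throughput, which is why avoiding the copy altogether is attractive.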

Container 1 code:

import torch
import zmq

def main():
    ctx = zmq.Context()
    sock = ctx.socket(zmq.REQ)
    sock.connect('tcp://0.0.0.0:6000')

    x = torch.randn((1, 1920, 1200, 3), device='cuda')
    storage = x.storage()
    while True:
        sock.send_pyobj(storage)
        sock.recv()

if __name__ == "__main__":
    main()

Container 2 code:


import torch
import zmq
import time

def main():
    ctx = zmq.Context()
    sock = ctx.socket(zmq.REP)
    sock.bind('tcp://*:6000')

    for i in range(10):
        before = time.time()
        storage = sock.recv_pyobj()
        tensor = torch.tensor((), device=storage.device)
        tensor.set_(storage)
        after = time.time()
        print(after - before)
        sock.send_string('')

if __name__ == "__main__":
    main()

Edit:

I found a similar topic discussed 4 years ago. There, the poster extracts additional information from the storage using the _share_cuda_() function, which yields a cudaIpcMemHandle_t.

Is there a way to reconstruct a Storage/Tensor from a cudaIpcMemHandle_t, or from the information returned by _share_cuda_(), using PyTorch functionality? Or is there a better way to achieve the same result?

Solution

I found a function in torch.multiprocessing.reductions, rebuild_cuda_tensor, that rebuilds a tensor from the output of _share_cuda_(). My code now looks something like this:

Container 1 code:

import torch
import zmq

def main():
    ctx = zmq.Context()
    sock = ctx.socket(zmq.REQ)
    sock.connect('tcp://0.0.0.0:6000')

    image = torch.randn((1, 1920, 1200, 3), dtype=torch.float, device='cuda:0')
    storage = image.storage()
    
    (storage_device, storage_handle, storage_size_bytes, storage_offset_bytes,
    ref_counter_handle, ref_counter_offset, event_handle, event_sync_required) = storage._share_cuda_()

    while True:
        sock.send_pyobj({
            "dtype": image.dtype,
            "tensor_size": tuple(image.shape),  # same number of dims as tensor_stride
            "tensor_stride": image.stride(),
            "tensor_offset": image.storage_offset(),  # offset into the storage, in elements
            "storage_cls": type(storage),
            "storage_device": storage_device,
            "storage_handle": storage_handle,
            "storage_size_bytes": storage_size_bytes,
            "storage_offset_bytes":storage_offset_bytes,
            "requires_grad": False,
            "ref_counter_handle": ref_counter_handle,
            "ref_counter_offset": ref_counter_offset,
            "event_handle": event_handle,
            "event_sync_required": event_sync_required,
        })

        sock.recv_string()

if __name__ == "__main__":
    main()
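The size, stride and offset entries in the message describe how the tensor is laid out inside its storage, so they have to agree with each other: size and stride need one entry per dimension, and the last addressed element has to fall inside the storage. The pure-Python sketch below illustrates that relationship for a contiguous tensor; `contiguous_strides` and `check_layout` are hypothetical helpers written for this example, not PyTorch functions.

```python
def contiguous_strides(shape):
    """Element strides of a dense row-major tensor with the given shape."""
    strides = []
    step = 1
    for dim in reversed(shape):
        strides.append(step)
        step *= dim
    return tuple(reversed(strides))


def check_layout(size, stride, offset, storage_elements):
    """Raise ValueError if size/stride/offset cannot describe a valid view."""
    if len(size) != len(stride):
        raise ValueError("size and stride must have the same number of entries")
    if any(dim == 0 for dim in size):
        return  # an empty tensor addresses no elements
    last = offset + sum((dim - 1) * st for dim, st in zip(size, stride))
    if last >= storage_elements:
        raise ValueError("layout reaches past the end of the storage")


shape = (1, 1920, 1200, 3)
strides = contiguous_strides(shape)
print(strides)  # (6912000, 3600, 3, 1)

# A 4-D size with a 4-D stride over 6,912,000 elements is consistent.
check_layout(shape, strides, 0, 1 * 1920 * 1200 * 3)
```

Passing a three-entry size together with a four-entry stride would fail the first check, which is worth keeping in mind when filling in the message by hand.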

Container 2 code:

import torch
import zmq
import time
from torch.multiprocessing.reductions import rebuild_cuda_tensor


def main():
    ctx = zmq.Context()
    sock = ctx.socket(zmq.REP)
    sock.bind('tcp://*:6000')

    for i in range(10):
        before = time.time()

        cuda_tensor_info = sock.recv_pyobj()
        rebuilt_tensor = rebuild_cuda_tensor(torch.Tensor, **cuda_tensor_info)

        after = time.time()
        print(after - before)

        sock.send_string('')

if __name__ == "__main__":
    main()
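The reason this approach can be fast is that only a small description of the tensor crosses the socket, while the pixel data stays in GPU memory. The sketch below gives a feel for the message size using placeholder values only: the handles are zero-filled stand-ins (CUDA IPC handles are 64 bytes), and the dtype and storage class are replaced by strings, so the real pickled message will differ somewhat in size.

```python
import pickle

CUDA_IPC_HANDLE_BYTES = 64  # size of a CUDA IPC memory or event handle

# Placeholder message with the same keys as the one sent above.
# None of these values are real handles; they only stand in for the sizes.
placeholder_message = {
    "dtype": "torch.float32",
    "tensor_size": (1, 1920, 1200, 3),
    "tensor_stride": (6912000, 3600, 3, 1),
    "tensor_offset": 0,
    "storage_cls": "storage class placeholder",
    "storage_device": 0,
    "storage_handle": bytes(CUDA_IPC_HANDLE_BYTES),
    "storage_size_bytes": 27648000,
    "storage_offset_bytes": 0,
    "requires_grad": False,
    "ref_counter_handle": b"placeholder-ref-counter-name",
    "ref_counter_offset": 0,
    "event_handle": bytes(CUDA_IPC_HANDLE_BYTES),
    "event_sync_required": True,
}

message_bytes = len(pickle.dumps(placeholder_message))
tensor_bytes = 1 * 1920 * 1200 * 3 * 4  # float32 payload left on the GPU

print(message_bytes)                  # a few hundred bytes
print(tensor_bytes // message_bytes)  # how many times smaller the message is
```

Because the receiver maps the sender's memory rather than copying it, the sending process presumably has to keep the original tensor alive for as long as the receiver uses the rebuilt one, as the PyTorch multiprocessing notes describe for shared CUDA tensors.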

Answered By - Ross Tsenov

This answer was collected from Stack Overflow and tested by PythonFixing community admins; it is licensed under CC BY-SA 2.5, CC BY-SA 3.0 and CC BY-SA 4.0.
