Issue
I'm running PyTorch in Docker. Our security team requires the container to run in read-only mode.
I need to fork the main process with the models loaded, so I call module.share_memory() to move all models to shared memory, and I use torch.multiprocessing.set_sharing_strategy('file_system') because with the file_descriptor strategy 1024 open file descriptors are not enough for me. I can't raise that limit: I use gunicorn in sync mode, which uses Linux select() under the hood, and select() cannot handle file descriptors numbered 1024 or higher (the fixed FD_SETSIZE limit).
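To see how close a process is to the descriptor ceiling described above, a small standard-library check can help. This is only a sketch: the helper name fd_report is hypothetical, the 1024 constant is the usual glibc FD_SETSIZE value rather than something read from the system, and the /proc/self/fd listing assumes a Linux container.

```python
import os
import resource

# Usual glibc FD_SETSIZE: select() cannot watch descriptors numbered at or above it.
# Hard-coded here as an assumption; it is not queried from the system.
SELECT_FD_LIMIT = 1024


def fd_report():
    """Collect the process's descriptor limits and its current open-descriptor count."""
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    fd_dir = "/proc/self/fd"  # Linux-specific; absent on other platforms
    open_fds = len(os.listdir(fd_dir)) if os.path.isdir(fd_dir) else None
    return {
        "soft_limit": soft,
        "hard_limit": hard,
        "open_fds": open_fds,
        "select_limit": SELECT_FD_LIMIT,
    }


if __name__ == "__main__":
    for key, value in fd_report().items():
        print(f"{key}: {value}")
```

Raising the soft limit with ulimit does not help a select()-based worker, because the restriction applies to the descriptor number itself, not to how many descriptors the process may open.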
When I run Docker in read-only mode, I get this error:
File "/app/.venv/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1515, in share_memory
return self._apply(lambda t: t.share_memory_())
File "/app/.venv/lib/python3.9/site-packages/torch/nn/modules/module.py", line 387, in _apply
module._apply(fn)
File "/app/.venv/lib/python3.9/site-packages/torch/nn/modules/module.py", line 387, in _apply
module._apply(fn)
File "/app/.venv/lib/python3.9/site-packages/torch/nn/modules/module.py", line 387, in _apply
module._apply(fn)
[Previous line repeated 2 more times]
File "/app/.venv/lib/python3.9/site-packages/torch/nn/modules/module.py", line 409, in _apply
param_applied = fn(param)
File "/app/.venv/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1515, in <lambda>
return self._apply(lambda t: t.share_memory_())
File "/app/.venv/lib/python3.9/site-packages/torch/tensor.py", line 385, in share_memory_
self.storage().share_memory_()
File "/app/.venv/lib/python3.9/site-packages/torch/storage.py", line 143, in share_memory_
self._share_filename_()
RuntimeError: std::exception at /pytorch/torch/lib/libshm/core.cpp:99
I understand that I need to give additional RW access to some directories, but I can't figure out which ones. How can I find them? There is, of course, RW access to /dev/shm; I can even see that PyTorch creates files there, but then it crashes with the error above.
I'm using PyTorch 1.8.1.
Solution
I was able to fix it in two different ways:
1. Set ENV TEMP=/var/tmp in the Dockerfile (this changes the temp path PyTorch uses from the default /tmp to /var/tmp) and give RW access to /var/tmp by adding -v /var/tmp:/var/tmp to the docker run args.
2. Add -v /tmp:/tmp to the docker run args (PyTorch uses /tmp by default).
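To check from inside the container which of these directories are actually writable, a probe along the following lines may help. It is a minimal sketch, not PyTorch's own lookup logic: the helper names is_writable and probe are hypothetical, and including TMPDIR and TMP next to TEMP is an assumption based on common convention (only TEMP is confirmed by the fix above).

```python
import os
import tempfile


def is_writable(path):
    """Return True if a file can actually be created and removed in path."""
    try:
        fd, name = tempfile.mkstemp(dir=path)
    except OSError:
        # Covers a read-only filesystem, a missing directory, or denied permission.
        return False
    os.close(fd)
    os.unlink(name)
    return True


def probe(candidates=("/dev/shm", "/tmp", "/var/tmp")):
    """Check the candidate directories plus any set via common temp variables."""
    paths = list(candidates)
    for var in ("TMPDIR", "TEMP", "TMP"):  # assumption: conventional variable names
        value = os.environ.get(var)
        if value and value not in paths:
            paths.append(value)
    return {path: is_writable(path) for path in paths}


if __name__ == "__main__":
    for path, ok in probe().items():
        print(f"{path}: {'writable' if ok else 'NOT writable'}")
```

Running this inside the read-only container before calling share_memory() shows whether the mounted volume is in effect. Creating a real file is used instead of os.access() because it exercises the same kind of write that fails on a read-only filesystem.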
Answered By - Tipok