Issue
I was tasked with creating a CI workflow for building a PyTorch CUDA extension for this application. Until now, the application was deployed by creating the target AWS VM with a CUDA GPU, pushing all the sources there, and running setup.py on the VM. Instead, I want to do the build in our CI system and deploy pre-built binaries to the production environment.
When running setup.py in the CI system, I get the error "No CUDA GPUs are available" - which is true: there are no CUDA GPUs in the CI system. Is there a way to just build the CUDA extension without a CUDA GPU available?
This is the error message:
gcc -pthread -shared -B /usr/local/miniconda/envs/build/compiler_compat -L/usr/local/miniconda/envs/build/lib -Wl,-rpath=/usr/local/miniconda/envs/build/lib -Wl,--no-as-needed -Wl,--sysroot=/ /app/my-app/build/temp.linux-x86_64-3.6/my-extension/my-module.o -L/usr/local/miniconda/envs/build/lib/python3.6/site-packages/torch/lib -lc10 -ltorch -ltorch_cpu -ltorch_python -o build/lib.linux-x86_64-3.6/my-extension/my-module.cpython-36m-x86_64-linux-gnu.so
building 'my-extension.my-module._cuda_ext' extension
creating /app/my-app/build/temp.linux-x86_64-3.6/my-extension/src
Traceback (most recent call last):
  File "setup.py", line 128, in <module>
    'build_ext': BuildExtension
  File "/usr/local/miniconda/envs/build/lib/python3.6/site-packages/setuptools/__init__.py", line 153, in setup
    return distutils.core.setup(**attrs)
  File "/usr/local/miniconda/envs/build/lib/python3.6/distutils/core.py", line 148, in setup
    dist.run_commands()
  File "/usr/local/miniconda/envs/build/lib/python3.6/distutils/dist.py", line 955, in run_commands
    self.run_command(cmd)
  File "/usr/local/miniconda/envs/build/lib/python3.6/distutils/dist.py", line 974, in run_command
    cmd_obj.run()
  File "/usr/local/miniconda/envs/build/lib/python3.6/site-packages/setuptools/command/build_ext.py", line 79, in run
    _build_ext.run(self)
  File "/usr/local/miniconda/envs/build/lib/python3.6/distutils/command/build_ext.py", line 339, in run
    self.build_extensions()
  File "/usr/local/miniconda/envs/build/lib/python3.6/site-packages/torch/utils/cpp_extension.py", line 653, in build_extensions
    build_ext.build_extensions(self)
  File "/usr/local/miniconda/envs/build/lib/python3.6/distutils/command/build_ext.py", line 448, in build_extensions
    self._build_extensions_serial()
  File "/usr/local/miniconda/envs/build/lib/python3.6/distutils/command/build_ext.py", line 473, in _build_extensions_serial
    self.build_extension(ext)
  File "/usr/local/miniconda/envs/build/lib/python3.6/site-packages/setuptools/command/build_ext.py", line 196, in build_extension
    _build_ext.build_extension(self, ext)
  File "/usr/local/miniconda/envs/build/lib/python3.6/distutils/command/build_ext.py", line 533, in build_extension
    depends=ext.depends)
  File "/usr/local/miniconda/envs/build/lib/python3.6/site-packages/torch/utils/cpp_extension.py", line 468, in unix_wrap_ninja_compile
    cuda_post_cflags = unix_cuda_flags(cuda_post_cflags)
  File "/usr/local/miniconda/envs/build/lib/python3.6/site-packages/torch/utils/cpp_extension.py", line 377, in unix_cuda_flags
    cflags + _get_cuda_arch_flags(cflags) +
  File "/usr/local/miniconda/envs/build/lib/python3.6/site-packages/torch/utils/cpp_extension.py", line 1407, in _get_cuda_arch_flags
    capability = torch.cuda.get_device_capability()
  File "/usr/local/miniconda/envs/build/lib/python3.6/site-packages/torch/cuda/__init__.py", line 291, in get_device_capability
    prop = get_device_properties(device)
  File "/usr/local/miniconda/envs/build/lib/python3.6/site-packages/torch/cuda/__init__.py", line 296, in get_device_properties
    _lazy_init() # will define _get_device_properties
  File "/usr/local/miniconda/envs/build/lib/python3.6/site-packages/torch/cuda/__init__.py", line 172, in _lazy_init
    torch._C._cuda_init()
RuntimeError: No CUDA GPUs are available
I'm not very familiar with CUDA and only half proficient in Python (I'm here as the "ops" part of "devops").
Solution
This is not a complete solution, as I don't have enough details to work one out fully, but it should help you or your teammates finish the job.
First, based on the source code, the build does not need to reach torch._C._cuda_init() if the CUDA arch flags are already set. In other words, PyTorch only queries the GPU to figure out the CUDA architecture because the user has not specified one.
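To make this concrete, here is a rough paraphrase of the fallback inside torch.utils.cpp_extension._get_cuda_arch_flags - not the exact PyTorch source, just the shape of the logic:

# Rough paraphrase of _get_cuda_arch_flags' fallback (not the exact source):
# the GPU is only queried when no arch list has been supplied.
import os
import torch

arch_list = os.environ.get('TORCH_CUDA_ARCH_LIST')
if not arch_list:
    # No arch list given, so PyTorch asks the driver for the local GPU's
    # compute capability -- on a GPU-less CI machine this is the call that
    # raises "No CUDA GPUs are available".
    capability = torch.cuda.get_device_capability()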
Here is a related thread. As you can see, setting the TORCH_CUDA_ARCH_LIST environment variable to something that fits your target environment should work for you.
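For example, here is a minimal sketch of one way to apply this in setup.py; the arch values below are an assumption, so substitute the compute capabilities of the GPUs you actually deploy to (e.g. 7.0 for V100, 7.5 for T4, 8.0 for A100):

# At the top of setup.py, before setup() runs: pin the arch list so the
# build never needs to query a GPU. The values here are an assumption --
# use your production GPUs' compute capabilities instead.
import os
os.environ.setdefault('TORCH_CUDA_ARCH_LIST', '7.0;7.5;8.0')

Alternatively, export TORCH_CUDA_ARCH_LIST in the CI job's environment so setup.py stays untouched. Appending +PTX to the last entry (e.g. 8.0+PTX) also embeds PTX code for forward compatibility with newer GPUs.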
Answered By - Sraw