Issue
I am trying to install PyTorch with CUDA. I followed the instructions (installation using conda) mentioned in https://pytorch.org/get-started/locally/
conda install pytorch torchvision torchaudio cudatoolkit=11.3 -c pytorch
The conda install command runs without any error, and conda list displays the following:
# Name           Version    Build                          Channel
cudatoolkit      11.3.1     h2bc3f7f_2
pytorch          1.11.0     py3.9_cuda11.3_cudnn8.2.0_0    pytorch
pytorch-mutex    1.0        cuda                           pytorch
torch            1.10.2     pypi_0                         pypi
torchaudio       0.11.0     py39_cu113                     pytorch
torchvision      0.11.3     pypi_0                         pypi
But when I check whether the GPU driver and CUDA are enabled and accessible by PyTorch,
torch.cuda.is_available()
returns False.
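A quick way to gather the relevant facts in one place is to ask PyTorch itself. The following is a small diagnostic sketch (it only requires that torch is importable; no GPU is needed to run it):

```python
import torch

def cuda_diagnostics():
    """Collect the CUDA-related facts that PyTorch itself reports."""
    return {
        "torch_version": torch.__version__,
        "built_with_cuda": torch.version.cuda,      # None on a CPU-only build
        "cuda_available": torch.cuda.is_available(),
        "device_count": torch.cuda.device_count(),  # 0 when no usable GPU
    }

print(cuda_diagnostics())
```

If "built_with_cuda" comes back as None, the environment resolved to a CPU-only build of PyTorch; if it shows a CUDA version but "cuda_available" is False, the problem is on the driver side, as discussed below.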
Prior to the PyTorch installation, I checked and confirmed the prerequisites mentioned in:
https://docs.nvidia.com/cuda/cuda-installation-guide-linux/index.html#system-requirements
https://docs.nvidia.com/cuda/cuda-installation-guide-linux/index.html#pre-installation-actions
Here are my ubuntu server details:
Environment:
- OS/kernel:
Ubuntu 18.04.6 LTS (GNU/Linux 4.15.0-154-generic x86_64)
A footnote under Table 1 (Native Linux Distribution Support in CUDA 11.6) mentions that
for Ubuntu LTS on x86-64, the Server LTS kernel (e.g. 4.15.x for 18.04) is supported in CUDA 11.6.
- GCC
gcc (Ubuntu 7.5.0-3ubuntu1~18.04) 7.5.0
- GLIBC
ldd (Ubuntu GLIBC 2.27-3ubuntu1.5) 2.27
- GPU
GeForce GTX 1080 Ti
- Kernel headers and development packages
$ uname -r
4.15.0-176-generic
As per my understanding, a conda PyTorch installation with CUDA will install the CUDA driver too.
I am not sure where I went wrong. Thanks in advance.
EDIT:
$ nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2017 NVIDIA Corporation
Built on Fri_Nov__3_21:07:56_CDT_2017
Cuda compilation tools, release 9.1, V9.1.85
So nvcc shows CUDA version 9.1, whereas:
$ nvidia-smi
Wed May 11 06:44:31 2022
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 410.104 Driver Version: 410.104 CUDA Version: 10.0 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|===============================+======================+======================|
| 0 GeForce GTX 108... Off | 00000000:05:00.0 Off | N/A |
| 25% 40C P8 11W / 250W | 18MiB / 11177MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
| 1 GeForce GTX 108... Off | 00000000:06:00.0 Off | N/A |
| 25% 40C P8 11W / 250W | 2MiB / 11178MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
| 2 GeForce GTX 108... Off | 00000000:09:00.0 Off | N/A |
| 25% 35C P8 11W / 250W | 2MiB / 11178MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: GPU Memory |
| GPU PID Type Process name Usage |
|=============================================================================|
| 0 4119 G /usr/lib/xorg/Xorg 9MiB |
| 0 4238 G /usr/bin/gnome-shell 6MiB |
+-----------------------------------------------------------------------------+
So nvidia-smi shows CUDA version 10.0.
The article https://varhowto.com/check-cuda-version/ mentions that nvcc refers to the CUDA toolkit, whereas nvidia-smi refers to the NVIDIA driver.
Q1: Does this show that there are two different CUDA installations at the system-wide level?
Nvidia Cudatoolkit vs Conda Cudatoolkit
The CUDA toolkit (version 11.3.1) I am installing in my conda environment is different from the one installed at the system-wide level (shown by the outputs of nvcc and nvidia-smi).
Q2: As per the answer in the above Stack Overflow thread, they can be separate. Or is that the reason for my failure to use CUDA locally?
Solution
I have solved the issue.
Disclaimer: I am a newbie in CUDA. The following answer is based on (a) what I have read in other threads and (b) my experience from those discussions.
Core Logic: CUDA driver's version >= CUDA runtime version
Reference: Different CUDA versions shown by nvcc and NVIDIA-smi
In most cases, if nvidia-smi reports a CUDA version that is numerically equal to or higher than the one reported by nvcc -V, this is not a cause for concern. That is a defined compatibility path in CUDA (newer drivers/driver API support "older" CUDA toolkits/runtime API).
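The "driver >= runtime" check can be sketched as a small version comparison. This is a toy illustration only (the helper function and its name are mine, not part of any CUDA tooling); the version strings are the ones nvidia-smi and conda's cudatoolkit report in this post:

```python
def driver_supports_runtime(driver_cuda: str, runtime_cuda: str) -> bool:
    """Return True if the driver's reported CUDA version (from nvidia-smi)
    is >= the runtime/toolkit version (e.g. conda's cudatoolkit)."""
    def parse(version):
        return tuple(int(part) for part in version.split("."))
    # Compare only as many components as both strings provide,
    # so "11.4" vs "11.3.1" compares (11, 4) against (11, 3).
    d, r = parse(driver_cuda), parse(runtime_cuda)
    n = min(len(d), len(r))
    return d[:n] >= r[:n]

print(driver_supports_runtime("10.0", "11.3.1"))  # False: the failing setup
print(driver_supports_runtime("11.4", "11.3.1"))  # True: after the driver upgrade
```

This is the whole diagnosis in two lines: driver CUDA 10.0 cannot serve a 11.3.1 runtime, while 11.4 can.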
As I am using conda's cudatoolkit:
- Driver API: nvidia-smi
- Runtime API: conda's cudatoolkit
For cudatoolkit 11.3.1, nvidia-smi was reporting CUDA Version: 10.0, i.e. the driver's supported CUDA version was lower than the runtime version, violating the rule above.
Solution: Upgrade NVIDIA drivers.
I upgraded the NVIDIA drivers following the instructions at https://linuxconfig.org/how-to-install-the-nvidia-drivers-on-ubuntu-18-04-bionic-beaver-linux
After the upgrade, here is the output of nvidia-smi:
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 470.103.01 Driver Version: 470.103.01 CUDA Version: 11.4 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|===============================+======================+======================|
| 0 NVIDIA GeForce ... Off | 00000000:05:00.0 Off | N/A |
| 27% 46C P8 12W / 250W | 19MiB / 11177MiB | 0% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
| 1 NVIDIA GeForce ... Off | 00000000:06:00.0 Off | N/A |
| 25% 44C P8 11W / 250W | 2MiB / 11178MiB | 0% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
| 2 NVIDIA GeForce ... Off | 00000000:09:00.0 Off | N/A |
| 25% 39C P8 11W / 250W | 2MiB / 11178MiB | 0% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=============================================================================|
| 0 N/A N/A 3636 G /usr/lib/xorg/Xorg 9MiB |
| 0 N/A N/A 4263 G /usr/bin/gnome-shell 6MiB |
+-----------------------------------------------------------------------------+
Now driver version (11.4) >= runtime version (11.3.1), and PyTorch is able to use CUDA with the GPU:
In [1]: import torch
In [2]: torch.cuda.is_available()
Out[2]: True
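As a final sanity check, a minimal sketch that actually runs a computation on the GPU, falling back to CPU when none is available (the standard device-selection idiom, not specific to this setup):

```python
import torch

# Use the GPU when CUDA is available, otherwise fall back to CPU.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

x = torch.ones(3, device=device)  # tensor allocated on the chosen device
y = (x * 2).sum()                 # computed on that device
print(device, y.item())           # y.item() == 6.0 on either device
```

If this prints "cuda 6.0", the whole stack (driver, runtime, PyTorch build) is working end to end.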
Answered By - Kaushik Acharya