Wednesday, December 6, 2023

[FIXED] Tensorflow 2.12 - Could not load library libcudnn_cnn_infer.so.8 in WSL2

December 06, 2023 cudnn, miniconda, python-3.x, tensorflow

Issue

I have installed TensorFlow 2.12, CUDA Toolkit 11.8.0 and cuDNN 8.6.0.163 in a Miniconda environment (Python 3.9.16) on Windows 10 with WSL2 (Ubuntu 22.04 kernel), following the official tensorflow.org instructions. I should emphasize that I want to use TensorFlow 2.12 because, together with the corresponding CUDA Toolkit 11.8.0, it supports Ada Lovelace GPUs (an RTX 4080 in my case).

When I go to train my model, it gives me the following error:

"Loaded cuDNN version 8600 Could not load library libcudnn_cnn_infer.so.8. Error: libcuda.so : cannot open shared object file: No such file or directory".

Does anyone have an idea what is going wrong?*

The paths were configured as follows:

mkdir -p $CONDA_PREFIX/etc/conda/activate.d
echo 'CUDNN_PATH=$(dirname $(python -c "import nvidia.cudnn;print(nvidia.cudnn.__file__)"))' >> $CONDA_PREFIX/etc/conda/activate.d/env_vars.sh
echo 'export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$CONDA_PREFIX/lib/:$CUDNN_PATH/lib' >> $CONDA_PREFIX/etc/conda/activate.d/env_vars.sh
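Because the echo arguments are single-quoted, the $(dirname ...) and $CUDNN_PATH in them are written to env_vars.sh literally and only evaluated when conda sources that script on activation. A minimal sketch for confirming that the library named in the error actually exists where the LD_LIBRARY_PATH line expects it, assuming the pip nvidia-cudnn layout with the shared libraries in $CUDNN_PATH/lib (the function name check_cudnn_lib is hypothetical):

```shell
# Hypothetical helper: report whether libcudnn_cnn_infer.so.8 is present in
# "<cudnn dir>/lib", the directory the LD_LIBRARY_PATH line above adds.
check_cudnn_lib() (
  cudnn_dir=$1                          # e.g. "$CUDNN_PATH" after activation
  lib="$cudnn_dir/lib/libcudnn_cnn_infer.so.8"
  if [ -e "$lib" ]; then
    echo "found: $lib"
  else
    echo "missing: $lib" >&2
    exit 1                              # the function runs in a subshell, so this only ends the function
  fi
)

# Usage, inside the activated environment:
#   CUDNN_PATH=$(dirname "$(python -c 'import nvidia.cudnn; print(nvidia.cudnn.__file__)')")
#   check_cudnn_lib "$CUDNN_PATH"
```

If this reports the file as found, the cuDNN side is in place and the failure is more likely in resolving libcuda.so itself.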

I searched for the libraries named in the error with the following commands:

ldconfig -p | grep libcudnn_cnn
returned nothing, so the library is not in the linker cache, and
ldconfig -p | grep libcuda
returned only libcuda.so.1 (libc6,x86-64) => /usr/lib/wsl/lib/libcuda.so.1
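ldconfig -p prints only the linker cache built from /etc/ld.so.conf and the default directories; it ignores LD_LIBRARY_PATH, which is how the pip-installed cuDNN is made visible here. A rough sketch that searches the LD_LIBRARY_PATH entries directly instead (find_in_ld_path is a hypothetical name, not a standard tool):

```shell
# Hypothetical helper: print every file whose name starts with PREFIX in each
# directory of a colon-separated path (default: $LD_LIBRARY_PATH).
find_in_ld_path() (
  prefix=$1
  search_path=${2-$LD_LIBRARY_PATH}
  IFS=:                                 # split the path on ':' only
  for dir in $search_path; do
    [ -n "$dir" ] || continue           # skip empty entries, e.g. from a leading ':'
    for f in "$dir"/"$prefix"*; do
      [ -e "$f" ] && echo "$f"          # an unmatched glob stays literal and fails -e
    done
  done
  exit 0
)

# Usage, inside the activated environment:
#   find_in_ld_path libcudnn_cnn
#   find_in_ld_path libcuda.so
```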

I also tried putting /usr/lib/wsl/lib at the front of LD_LIBRARY_PATH by adding the following line to $CONDA_PREFIX/etc/conda/activate.d/env_vars.sh, but without any luck:

export LD_LIBRARY_PATH=/usr/lib/wsl/lib:$LD_LIBRARY_PATH
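A plain export like this adds another copy of the directory every time the line runs, and when LD_LIBRARY_PATH starts out empty it leaves a stray trailing ':' (the earlier $LD_LIBRARY_PATH:... line has the same issue with a leading ':'). A minimal sketch of a duplicate-free prepend, using a hypothetical helper path_prepend that would have to be defined in env_vars.sh as well:

```shell
# Hypothetical helper: print a colon-separated path with DIR placed first,
# dropping any existing copies of DIR and any empty entries.
path_prepend() (
  dir=$1
  old=$2
  result=$dir
  IFS=:                                 # split the old path on ':' only
  for entry in $old; do
    [ -n "$entry" ] && [ "$entry" != "$dir" ] && result="$result:$entry"
  done
  printf '%s\n' "$result"
)

# Usage in env_vars.sh, after the function definition:
#   export LD_LIBRARY_PATH="$(path_prepend /usr/lib/wsl/lib "$LD_LIBRARY_PATH")"
```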

*Note that when importing TensorFlow, I get the following warnings:

TF-TRT Warning: Could not find TensorRT

could not open file to read NUMA node:
    /sys/bus/pci/devices/0000:1c:00.0/numa_node Your kernel may have been built without NUMA support.

In addition, I tried following the NVIDIA documentation for WSL, specifically section 3, Option 1, but this did not solve the problem either.

Solution

Ran into this problem and found a working solution after a lot of digging around.

First, the missing libcuda.so error can be fixed with the method proposed here: https://github.com/microsoft/WSL/issues/5663#issuecomment-1068499676

Essentially, this rebuilds the symbolic links in the WSL CUDA library directory (C:\Windows\System32\lxss\lib, which appears inside WSL as /usr/lib/wsl/lib):

> cd \Windows\System32\lxss\lib
> del libcuda.so
> del libcuda.so.1
> mklink libcuda.so libcuda.so.1.1
> mklink libcuda.so.1 libcuda.so.1.1

(Run these in an elevated, i.e. administrator, Command Prompt.)
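C:\Windows\System32\lxss\lib is the directory that WSL mounts as /usr/lib/wsl/lib, the path ldconfig reported in the question. A small read-only sketch for seeing how libcuda.so, libcuda.so.1 and libcuda.so.1.1 appear from the Linux side (symbolic link, plain file, or missing); describe_lib is a hypothetical helper:

```shell
# Hypothetical helper: for each NAME in DIR, say whether it is a symlink
# (and where it points), a regular file, or missing. Changes nothing.
describe_lib() (
  dir=$1
  shift
  for name in "$@"; do
    path="$dir/$name"
    if [ -L "$path" ]; then
      echo "$name -> $(readlink "$path")"
    elif [ -e "$path" ]; then
      echo "$name: regular file"
    else
      echo "$name: missing"
    fi
  done
)

# Usage inside WSL:
#   describe_lib /usr/lib/wsl/lib libcuda.so libcuda.so.1 libcuda.so.1.1
```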

Then, when you run into the missing libdevice problem (which you undoubtedly will), solve it with: https://github.com/tensorflow/tensorflow/issues/58681#issuecomment-1406967453

Which boils down to:

$ mkdir -p $CONDA_PREFIX/lib/nvvm/libdevice/
$ cp -p $CONDA_PREFIX/lib/libdevice.10.bc $CONDA_PREFIX/lib/nvvm/libdevice/
$ export XLA_FLAGS=--xla_gpu_cuda_data_dir=$CONDA_PREFIX/lib

And install the CUDA compiler package, which provides ptxas:

$ conda install -c nvidia cuda-nvcc --yes

(verify with ptxas --version)
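Both fixes can be sanity-checked from the shell. A hedged sketch, assuming that XLA resolves libdevice as <data dir>/nvvm/libdevice/libdevice.10.bc (the layout the copy step creates) and that cuda-nvcc places ptxas in the environment's bin directory; the function names check_libdevice and check_ptxas are hypothetical:

```shell
# Hypothetical helpers for the two fixes above; they only inspect, never modify.

# Report whether libdevice.10.bc is under <data dir>/nvvm/libdevice, where the
# data dir is taken from --xla_gpu_cuda_data_dir in the given flags
# (default: $XLA_FLAGS, a space-separated list of flags).
check_libdevice() (
  flags=${1-$XLA_FLAGS}
  data_dir=
  for flag in $flags; do
    case $flag in
      --xla_gpu_cuda_data_dir=*) data_dir=${flag#--xla_gpu_cuda_data_dir=} ;;
    esac
  done
  if [ -z "$data_dir" ]; then
    echo "--xla_gpu_cuda_data_dir is not set" >&2
    exit 1
  fi
  f="$data_dir/nvvm/libdevice/libdevice.10.bc"
  if [ -f "$f" ]; then
    echo "ok: $f"
  else
    echo "not found: $f" >&2
    exit 1
  fi
)

# Print the ptxas that comes first on PATH and whether it lives under PREFIX
# (e.g. $CONDA_PREFIX); exit non-zero if there is none.
check_ptxas() (
  prefix=$1
  p=$(command -v ptxas) || { echo "ptxas not found on PATH" >&2; exit 1; }
  case $p in
    "$prefix"/*) echo "ptxas from environment: $p" ;;
    *)           echo "ptxas from elsewhere: $p" ;;
  esac
)

# Usage, with the environment active and XLA_FLAGS exported:
#   check_libdevice && check_ptxas "$CONDA_PREFIX"
```

Passing $CONDA_PREFIX to check_ptxas also shows whether another ptxas earlier on PATH is shadowing the one installed by cuda-nvcc.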

If you run notebooks in VS Code through Remote WSL, you also need to add export XLA_FLAGS=--xla_gpu_cuda_data_dir=$CONDA_PREFIX/lib to $CONDA_PREFIX/etc/conda/activate.d/env_vars.sh, because a variable exported in a terminal does not reach the notebook kernel (this is good practice anyway).
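A hedged sketch of that step, assuming conda's standard hook directories ($CONDA_PREFIX/etc/conda/activate.d is sourced on conda activate, deactivate.d on conda deactivate). The helper name persist_xla_flags and the matching unset in deactivate.d are additions for illustration, not part of the original answer:

```shell
# Hypothetical helper: write activation/deactivation hooks for XLA_FLAGS into a
# conda environment prefix, without duplicating lines on repeated runs.
persist_xla_flags() (
  prefix=$1
  mkdir -p "$prefix/etc/conda/activate.d" "$prefix/etc/conda/deactivate.d"
  on="$prefix/etc/conda/activate.d/env_vars.sh"
  off="$prefix/etc/conda/deactivate.d/env_vars.sh"
  # Single quotes: $CONDA_PREFIX is expanded at activation time, not now.
  line='export XLA_FLAGS=--xla_gpu_cuda_data_dir=$CONDA_PREFIX/lib'
  touch "$on" "$off"
  # grep -qxF: quiet, whole-line, fixed-string match; append only if absent.
  grep -qxF -- "$line" "$on" || printf '%s\n' "$line" >> "$on"
  grep -qxF -- 'unset XLA_FLAGS' "$off" || printf '%s\n' 'unset XLA_FLAGS' >> "$off"
)

# Usage, with the environment active:
#   persist_xla_flags "$CONDA_PREFIX"
#   # then re-activate the environment and check: echo "$XLA_FLAGS"
```

The deactivate hook simply unsets XLA_FLAGS, so a value that was set before activation is not restored.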

Answered By - Roy Shilkrot

This Answer collected from stackoverflow and tested by PythonFixing community admins, is licensed under cc by-sa 2.5 , cc by-sa 3.0 and cc by-sa 4.0
