Friday, January 26, 2024

[FIXED] How to solve Conv1D DNN library is not found?

Tags: anaconda, gpu, keras, tensorflow

Issue

How can I solve the error "Conv1D: DNN library is not found" in TensorFlow?

Also, are the compatibilities listed by TensorFlow at https://www.tensorflow.org/install/source valid backwards as well?

This question is possibly a duplicate of (and certainly related to) "Colab: (0) UNIMPLEMENTED: DNN library is not found" and "Unimplemented Error Node: 'sequential/conv1d/Conv1D' DNN library is not found running in Jupyter on Windows". However, I could not quite follow their solutions. Here is my problem:

I am training a convolutional neural network (CNN) with Keras/TensorFlow. On my PC it appears to run fine. However, I must get it to work on GPUs, and that is where I am running into all sorts of issues. The latest:

Node: 'XXX/1st_Conv1D/Conv1D'
DNN library is not found.
    [[{{node XXX/1st_Conv1D/Conv1D}}]] [Op:__inference_train_function_5577]

XXX is a name I chose in the code. The training starts fine and then aborts abruptly after some time with the above message.

I have installed conda (WITHOUT admin rights) on the GPU server (I will not get admin rights; it is a university system). conda list gives the following:

Package               Version
cudnn                 8.4.1.50
keras                 2.10.0
keras-preprocessing   1.1.2
tensorflow            2.10.0
tensorflow-base       2.10.0
tensorflow-estimator  2.10.0
cuda-toolkit          12.0.0
cuda-tools            12.0.0
cuda-cudart           12.0.107
cuda-python           11.8.1
cudatoolkit           11.8
python                3.9.15

There are various other packages as well (many named cuda-something). I have not installed bazel. The two listed packages cudatoolkit and cuda-toolkit are not a typo, and neither are python and cuda-python (the latter is from the nvidia channel of Anaconda). My understanding from https://www.tensorflow.org/install/source is that this should be fine. Some people mention library paths, but I don't understand what they mean. For this I need a more idiot-proof explanation :-( (Keep in mind, I have no admin rights.)
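On the library paths: TensorFlow loads cuDNN as a shared library (on Linux, files with names like libcudnn.so.8), and the dynamic loader searches the directories listed in LD_LIBRARY_PATH, among others. The following is a minimal sketch that only looks for such files. It assumes a Linux server and a conda environment whose libraries live in $CONDA_PREFIX/lib; the function name find_cudnn_libraries is hypothetical, and nothing in it needs admin rights.

```python
import os
from pathlib import Path


def find_cudnn_libraries(extra_dirs=()):
    """Return sorted paths of libcudnn shared libraries in candidate directories.

    Candidates are: any directories passed in `extra_dirs`, the `lib` folder of
    the active conda environment (assumed layout), and every LD_LIBRARY_PATH entry.
    """
    candidates = [Path(d) for d in extra_dirs]

    # Assumption: conda places shared libraries under $CONDA_PREFIX/lib.
    conda_prefix = os.environ.get("CONDA_PREFIX")
    if conda_prefix:
        candidates.append(Path(conda_prefix) / "lib")

    # Directories the dynamic loader is explicitly told to search.
    for entry in os.environ.get("LD_LIBRARY_PATH", "").split(os.pathsep):
        if entry:
            candidates.append(Path(entry))

    found = set()
    for directory in candidates:
        if directory.is_dir():
            found.update(str(p) for p in directory.glob("libcudnn*.so*"))
    return sorted(found)


if __name__ == "__main__":
    libs = find_cudnn_libraries()
    if libs:
        print("cuDNN libraries found:")
        for lib in libs:
            print("  ", lib)
    else:
        print("No libcudnn*.so* found in $CONDA_PREFIX/lib or LD_LIBRARY_PATH.")
```

If the file exists in $CONDA_PREFIX/lib but that directory is not on LD_LIBRARY_PATH, one option that works without admin rights is to run export LD_LIBRARY_PATH=$CONDA_PREFIX/lib:$LD_LIBRARY_PATH in the shell before starting Python. Whether that alone resolves the error depends on the installed versions.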

In my Python code I import the following:
from keras import backend as K
import tensorflow as tf
from tensorflow import keras
from keras import Input
from keras.models import Model
from keras.layers import BatchNormalization
from keras.layers import LeakyReLU
from keras.layers import Activation
from keras.layers import Dense
from keras.layers import Reshape
from keras.layers import Conv1D
from keras import initializers
from keras.callbacks import EarlyStopping
from keras.callbacks import ModelCheckpoint
from keras.callbacks import ReduceLROnPlateau
from keras.models import load_model
from keras.optimizers import Adam
from keras.utils import Sequence
from keras.metrics import MeanSquaredError
from keras.metrics import MeanAbsoluteError
from keras.metrics import RootMeanSquaredError

# for plotting
import matplotlib
# import matplotlib.pyplot as plt
matplotlib.use('Agg')
from matplotlib import pyplot as plt
import matplotlib.colors as mcolors
import matplotlib as mpl
import matplotlib.patches as mpatches
from matplotlib.colors import to_rgb
from matplotlib.collections import PolyCollection
from matplotlib.legend_handler import HandlerTuple
import seaborn as sns  # for violin plots

I also import pandas, numpy, wave, time, sys, json, argparse, pathlib, datetime and random. They have not caused any problems so far. The model is then trained with:

    history = model.fit(x=train_data_gen,
                        validation_data=vali_data_gen,
                        batch_size=BATCH_SIZE,
                        epochs=EPOCHS,
                        verbose=VERBOSITY,
                        shuffle=SHUFFLE,
                        max_queue_size=QUEUE_SIZE,
                        use_multiprocessing=MULTI_PROCESSING,
                        workers=WORKERS,
                        callbacks=[early_stopping,
                                   model_checking,
                                   reduce_lr])

Here MULTI_PROCESSING is set to True (see: https://stanford.edu/~shervine/blog/keras-how-to-generate-data-on-the-fly), QUEUE_SIZE is 1000 and WORKERS is 48, which I can lower. This works fine on a PC but is obviously slow, so I MUST get it to work on GPUs. Some people have suggested memory problems, though I cannot imagine that.
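On the memory suggestion: by default TensorFlow reserves nearly all GPU memory at start-up, and on a shared server cuDNN may then fail to initialise. That this is the cause here is an assumption, not something the error message confirms. A minimal sketch that asks TensorFlow to allocate GPU memory on demand, using tf.config.list_physical_devices and tf.config.experimental.set_memory_growth; enable_memory_growth is a hypothetical helper name, and it would have to be called before any model is built.

```python
def enable_memory_growth():
    """Ask TensorFlow to grow GPU memory on demand.

    Returns the number of GPUs for which memory growth was enabled
    (0 if TensorFlow is not installed or no GPU is visible).
    """
    try:
        import tensorflow as tf
    except ImportError:
        return 0

    configured = 0
    for gpu in tf.config.list_physical_devices("GPU"):
        try:
            tf.config.experimental.set_memory_growth(gpu, True)
            configured += 1
        except RuntimeError as err:
            # Raised when the GPU has already been initialised,
            # e.g. because a model or tensor was created earlier.
            print(f"Could not enable memory growth for {gpu.name}: {err}")
    return configured


if __name__ == "__main__":
    print("GPUs with memory growth enabled:", enable_memory_growth())
```

If this prints 0 on the GPU server, TensorFlow does not see a GPU at all, which points to the installation rather than to memory.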

Solution

The CUDA and cuDNN versions installed on your system do not match the tested build configuration listed for TensorFlow 2.10 with GPU support (see the table of tested build configurations at https://www.tensorflow.org/install/source).

To enable GPU support, install the versions tested with TensorFlow 2.10: cuDNN 8.1 and CUDA 11.2.

Please check the link for instructions on installing this software. Thank you.
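The comparison described in this answer can be sketched in a few lines. This is only an illustration: the tested versions (CUDA 11.2, cuDNN 8.1) are the ones quoted in the answer, the installed versions are taken from the conda list in the question, and the names TESTED_CONFIG, major_minor and find_mismatches are hypothetical. It compares major and minor version numbers only, following the answer's reading that they should match.

```python
# Tested build configuration for TensorFlow 2.10, as quoted in the answer.
TESTED_CONFIG = {"cuda": "11.2", "cudnn": "8.1"}


def major_minor(version):
    """Reduce a version string such as '8.4.1.50' to the tuple (8, 4)."""
    parts = version.split(".")
    major = int(parts[0])
    minor = int(parts[1]) if len(parts) > 1 else 0
    return (major, minor)


def find_mismatches(installed, tested=TESTED_CONFIG):
    """Return {name: (installed_version, tested_version)} for every mismatch.

    A component counts as mismatched when it is missing from `installed`
    or when its major.minor version differs from the tested one.
    """
    mismatches = {}
    for name, wanted in tested.items():
        have = installed.get(name)
        if have is None or major_minor(have) != major_minor(wanted):
            mismatches[name] = (have, wanted)
    return mismatches


if __name__ == "__main__":
    # Versions of cudatoolkit and cudnn reported by conda list in the question.
    installed = {"cuda": "11.8", "cudnn": "8.4.1.50"}
    for name, (have, wanted) in find_mismatches(installed).items():
        print(f"{name}: installed {have}, tested configuration expects {wanted}")
```

With the versions from the question, both CUDA and cuDNN are reported as mismatches. If TensorFlow imports successfully, tf.sysconfig.get_build_info() reports the CUDA and cuDNN versions the installed binary was built against, which can replace the hard-coded TESTED_CONFIG.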

Answered By - user11530462

This answer, collected from Stack Overflow and tested by PythonFixing community admins, is licensed under CC BY-SA 2.5, CC BY-SA 3.0 and CC BY-SA 4.0.
