Issue
I'm training a Keras model, and I needed to switch devices to have more power(from Windows i3 core to Ubuntu i7). The problem is, my code works fine on my Windows, but shows the following error that stops the computation before even running the first epoch. Here is the full output:
/home/willylutz/PycharmProjects/hiv_image_analysis/venv/bin/python /home/willylutz/PycharmProjects/hiv_image_analysis/main.py
2022-09-19 09:36:52.801711: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2022-09-19 09:36:52.956260: E tensorflow/stream_executor/cuda/cuda_blas.cc:2981] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
2022-09-19 09:36:53.502748: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /home/willylutz/PycharmProjects/hiv_image_analysis/venv/lib/python3.8/site-packages/cv2/../../lib64:
2022-09-19 09:36:53.502794: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /home/willylutz/PycharmProjects/hiv_image_analysis/venv/lib/python3.8/site-packages/cv2/../../lib64:
2022-09-19 09:36:53.502800: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly.
0
Found 480 files belonging to 2 classes.
Using 384 files for training.
2022-09-19 09:37:00.058171: E tensorflow/stream_executor/cuda/cuda_driver.cc:265] failed call to cuInit: CUDA_ERROR_NO_DEVICE: no CUDA-capable device is detected
2022-09-19 09:37:00.058202: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:156] kernel driver does not appear to be running on this host (zhang): /proc/driver/nvidia/version does not exist
2022-09-19 09:37:00.067388: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
Found 480 files belonging to 2 classes.
Using 96 files for validation.
['INF', 'NI']
2022-09-19 09:37:10.149236: I tensorflow/core/kernels/data/shuffle_dataset_op.cc:390] Filling up shuffle buffer (this may take a while): 373 of 512
2022-09-19 09:37:10.197351: I tensorflow/core/kernels/data/shuffle_dataset_op.cc:415] Shuffle buffer filled.
(64, 1024, 1024, 3)
(64,)
WARNING:tensorflow:Using a while_loop for converting RngReadAndSkip cause there is no registered converter for this op.
WARNING:tensorflow:Using a while_loop for converting Bitcast cause there is no registered converter for this op.
WARNING:tensorflow:Using a while_loop for converting Bitcast cause there is no registered converter for this op.
WARNING:tensorflow:Using a while_loop for converting StatelessRandomUniformV2 cause there is no registered converter for this op.
WARNING:tensorflow:Using a while_loop for converting ImageProjectiveTransformV3 cause there is no registered converter for this op.
WARNING:tensorflow:Using a while_loop for converting RngReadAndSkip cause there is no registered converter for this op.
WARNING:tensorflow:Using a while_loop for converting Bitcast cause there is no registered converter for this op.
WARNING:tensorflow:Using a while_loop for converting Bitcast cause there is no registered converter for this op.
WARNING:tensorflow:Using a while_loop for converting StatelessRandomUniformV2 cause there is no registered converter for this op.
WARNING:tensorflow:Using a while_loop for converting ImageProjectiveTransformV3 cause there is no registered converter for this op.
WARNING:tensorflow:Using a while_loop for converting RngReadAndSkip cause there is no registered converter for this op.
WARNING:tensorflow:Using a while_loop for converting Bitcast cause there is no registered converter for this op.
WARNING:tensorflow:Using a while_loop for converting Bitcast cause there is no registered converter for this op.
WARNING:tensorflow:Using a while_loop for converting StatelessRandomUniformV2 cause there is no registered converter for this op.
WARNING:tensorflow:Using a while_loop for converting ImageProjectiveTransformV3 cause there is no registered converter for this op.
WARNING:tensorflow:Using a while_loop for converting RngReadAndSkip cause there is no registered converter for this op.
WARNING:tensorflow:Using a while_loop for converting Bitcast cause there is no registered converter for this op.
WARNING:tensorflow:Using a while_loop for converting Bitcast cause there is no registered converter for this op.
WARNING:tensorflow:Using a while_loop for converting StatelessRandomUniformV2 cause there is no registered converter for this op.
WARNING:tensorflow:Using a while_loop for converting ImageProjectiveTransformV3 cause there is no registered converter for this op.
Model: "sequential_1"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
sequential (Sequential) (None, 1024, 1024, 3) 0
rescaling (Rescaling) (None, 1024, 1024, 3) 0
conv2d (Conv2D) (None, 1024, 1024, 16) 448
max_pooling2d (MaxPooling2D (None, 512, 512, 16) 0
)
conv2d_1 (Conv2D) (None, 512, 512, 32) 4640
max_pooling2d_1 (MaxPooling (None, 256, 256, 32) 0
2D)
conv2d_2 (Conv2D) (None, 256, 256, 64) 18496
max_pooling2d_2 (MaxPooling (None, 128, 128, 64) 0
2D)
dropout (Dropout) (None, 128, 128, 64) 0
flatten (Flatten) (None, 1048576) 0
dense (Dense) (None, 128) 134217856
outputs (Dense) (None, 2) 258
=================================================================
Total params: 134,241,698
Trainable params: 134,241,698
Non-trainable params: 0
_________________________________________________________________
Epoch 1/5
WARNING:tensorflow:Using a while_loop for converting RngReadAndSkip cause there is no registered converter for this op.
WARNING:tensorflow:Using a while_loop for converting Bitcast cause there is no registered converter for this op.
WARNING:tensorflow:Using a while_loop for converting Bitcast cause there is no registered converter for this op.
WARNING:tensorflow:Using a while_loop for converting StatelessRandomUniformV2 cause there is no registered converter for this op.
WARNING:tensorflow:Using a while_loop for converting ImageProjectiveTransformV3 cause there is no registered converter for this op.
WARNING:tensorflow:Using a while_loop for converting RngReadAndSkip cause there is no registered converter for this op.
WARNING:tensorflow:Using a while_loop for converting Bitcast cause there is no registered converter for this op.
WARNING:tensorflow:Using a while_loop for converting Bitcast cause there is no registered converter for this op.
WARNING:tensorflow:Using a while_loop for converting StatelessRandomUniformV2 cause there is no registered converter for this op.
WARNING:tensorflow:Using a while_loop for converting ImageProjectiveTransformV3 cause there is no registered converter for this op.
WARNING:tensorflow:Using a while_loop for converting RngReadAndSkip cause there is no registered converter for this op.
WARNING:tensorflow:Using a while_loop for converting Bitcast cause there is no registered converter for this op.
WARNING:tensorflow:Using a while_loop for converting Bitcast cause there is no registered converter for this op.
WARNING:tensorflow:Using a while_loop for converting StatelessRandomUniformV2 cause there is no registered converter for this op.
WARNING:tensorflow:Using a while_loop for converting ImageProjectiveTransformV3 cause there is no registered converter for this op.
WARNING:tensorflow:Using a while_loop for converting RngReadAndSkip cause there is no registered converter for this op.
WARNING:tensorflow:Using a while_loop for converting Bitcast cause there is no registered converter for this op.
WARNING:tensorflow:Using a while_loop for converting Bitcast cause there is no registered converter for this op.
WARNING:tensorflow:Using a while_loop for converting StatelessRandomUniformV2 cause there is no registered converter for this op.
WARNING:tensorflow:Using a while_loop for converting ImageProjectiveTransformV3 cause there is no registered converter for this op.
2022-09-19 09:37:16.377367: W tensorflow/core/framework/cpu_allocator_impl.cc:82] Allocation of 4294967296 exceeds 10% of free system memory.
Process finished with exit code 137 (interrupted by signal 9: SIGKILL)
If needed I can put my code too, but I feel it is not the problem. Thanks for the help.
Solution
There is a bug in keras/tensorflow 2.9 and 2.10, which causes preprocess layers like rescaling to be extremly slow: https://github.com/tensorflow/tensorflow/issues/56242
Try the model without the rescaling layer. If you want to use this or similar preprocessing layers, you should use TF version 2.8.3 or older.
I am not sure about the following, but
Could not load dynamic library 'libnvinfer.so.7'
seems to suggest there is also some other problem, maybe wrong version of tensorflow / keras / cuda.
Answered By - AndrzejO
0 comments:
Post a Comment
Note: Only a member of this blog may post a comment.