Issue
I use JupyterLab and Jupyter Notebook for my deep learning programs, which involve long training runs. For the past few weeks, the kernel has repeatedly restarted after hours of training, which is very annoying. In addition, the server console and the browser log give very little information:
JupyterLab server log:
[I 2021-02-26 00:40:03.756 ServerApp] AsyncIOLoopKernelRestarter: restarting kernel (1/5), keep random ports
kernel 1330ee40-a826-44e2-9be9-f123deeaa1b2 restarted
[I 2021-02-26 00:40:04.070 ServerApp] Starting buffering for 1330ee40-a826-44e2-9be9-f123deeaa1b2:1b7fa111-f2d2-4804-bd90-c81e26562254
[I 2021-02-26 00:40:04.112 ServerApp] Restoring connection for 1330ee40-a826-44e2-9be9-f123deeaa1b2:1b7fa111-f2d2-4804-bd90-c81e26562254
I have the same problem when I use Jupyter-notebook instead of Jupyter-lab.
Additional notes:
- The server and the client are not on the same machine, so I connect to the server over ssh as described here.
- I work behind a corporate proxy
- I use Tensorflow 2 for Deep Learning
Solution
OK, I think I found the cause of the error: it was almost certainly a small memory leak in the code I was running, which made the program crash after hundreds of epochs.
Answered By - vincent59