Issue
I'm using a Jupyter notebook with PySpark, which runs Spark as the kernel.
The problem is that I'm not sure how to close it properly, and I have the impression that something keeps hanging: the memory on the driver that hosts the notebook fills up and the job crashes with a GC overhead limit exceeded exception.
I'm closing the whole thing by simply killing the notebook process, using the process id that I save to a .pid file. But I have a feeling that the state this leaves behind is not good.
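For illustration, a minimal sketch of the kill-by-PID approach described above (the .pid file name and the choice of SIGKILL are assumptions):

    import os
    import signal

    # Read the process id that was written out when the notebook was started.
    with open("notebook.pid") as f:
        pid = int(f.read().strip())

    # Forcefully terminate the notebook process. Note that this never calls
    # sc.stop(), so the Spark application is not shut down cleanly.
    os.kill(pid, signal.SIGKILL)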
What is the actual problem and how do I solve it, i.e. how do I shut the whole thing down properly (on the driver and on YARN)?
Solution
You should use "File" -> "Close and Halt" inside Jupyter. This closes the Spark context and kills the YARN containers belonging to the session.
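If you want to release the resources explicitly before closing the notebook, you can also stop the context yourself. A minimal sketch, assuming the PySpark kernel predefines spark (SparkSession, Spark 2.x+) or sc (SparkContext):

    # Stop the Spark session/context explicitly; this ends the YARN
    # application and releases the containers held by its executors.
    # 'spark' and 'sc' are assumed to be predefined by the PySpark kernel.
    spark.stop()    # Spark 2.x+: stops the session and its underlying context
    # sc.stop()     # older setups that only expose a SparkContext

Once the context is stopped, the YARN application should finish and its containers are released; "Close and Halt" achieves the same result by shutting down the kernel.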
Answered By - Mariusz