Issue
Desired behaviour
We have an existing workflow in vanilla Jupyter Notebook/Lab where we use relative paths to store outputs of some notebooks. Example:
/home/user/notebooks/notebook1.ipynb
/home/user/notebooks/notebook1_output.log
/home/user/notebooks/project1/project.ipynb
/home/user/notebooks/project1/project_output.log
In both notebooks, we produce the output by simply writing to a relative path such as ./output.log.
Problem
However, we are now trying Google Dataproc with the Jupyter optional component, and the current directory is always / regardless of which notebook it's run from. This applies to both the Notebook and Lab interfaces.
What I've tried
Disabling c.FileContentsManager.root_dir='/' in /etc/jupyter/jupyter_notebook_config.py causes the current directory to be set to wherever I started jupyter notebook from, but it is always that initial starting folder instead of following the .ipynb notebook files.
Any idea on how to restore the "dynamic" current directory behaviour?
Even if it's not possible, I'd like to understand how Dataproc even makes Jupyter behave differently.
Details
- Dataproc Image: 2.0-debian10
- Notebook Server: 6.2.0
- Jupyterlab: 3.0.18
Solution
No, it is not possible to always get the directory of your .ipynb file as the current directory. Jupyter runs on the local filesystem of the master node of your cluster, and its kernel always uses the default system path.
Outside Dataproc as well, it is not possible to consistently get the path of a Jupyter notebook. You can check out the thread linked at the end of this answer regarding this topic.
You have to specify the directory path explicitly so that your log file is saved in the desired location.
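As a minimal sketch of specifying the directory explicitly, the helper below writes a log line to a directory you pass in, so the result does not depend on the kernel's current directory. The name write_log and its parameters are hypothetical, chosen only for illustration.

```python
from pathlib import Path


def write_log(output_dir, filename, message):
    """Append a line to filename inside an explicitly chosen directory.

    Hypothetical helper: the caller supplies the directory, so the
    kernel's current working directory is never consulted.
    """
    target_dir = Path(output_dir).expanduser().resolve()
    # Create the directory if it does not exist yet.
    target_dir.mkdir(parents=True, exist_ok=True)
    log_path = target_dir / filename
    with log_path.open("a", encoding="utf-8") as handle:
        handle.write(message + "\n")
    return log_path
```

With the layout from the question, a call might look like write_log("/home/user/notebooks/project1", "project_output.log", "run finished").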
Note that the GCS folder in your Lab refers to the Google Cloud Storage bucket of your cluster. You can create an .ipynb file in GCS, but when you execute it, it runs on the local system. Thus you will not be able to save log files in GCS directly.
EDIT:
It's not only Dataproc that makes Jupyter behave differently. If you use Google Colab notebooks, you will see the same behaviour there.
The reason is that you're always executing code in the kernel, no matter where the file is. In theory, multiple notebooks could connect to that kernel, so you can't have multiple working directories for the same kernel.
By default, when you start a notebook in vanilla Jupyter, the kernel's working directory is set to the directory containing the notebook.
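Since the working directory belongs to the kernel process as a whole, one possible workaround is to switch it temporarily around the code that relies on relative paths. The sketch below assumes you know the notebook's directory and pass it in yourself; working_directory is a hypothetical helper name, not part of Jupyter or Dataproc.

```python
import os
from contextlib import contextmanager


@contextmanager
def working_directory(path):
    """Temporarily change the process-wide working directory.

    Hypothetical helper: the change affects everything running in this
    kernel until the block exits, then the previous directory is restored.
    """
    previous = os.getcwd()
    os.chdir(path)
    try:
        yield
    finally:
        # Restore the original directory even if the block raised.
        os.chdir(previous)
```

Inside a block such as "with working_directory('/home/user/notebooks/project1'):", a write to ./output.log would land in that folder. Any other notebook attached to the same kernel sees the changed directory for as long as the block runs.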
Link to the main thread -> https://github.com/ipython/ipython/issues/10123
Answered By - Sayan Bhattacharya