Issue
I'm using jupyter and pandas read_sql, this works fine but looks ugly.
for instance I have a query:
SELECT * FROM table_a AS a LIMIT 10;
I could show it nicely in a markdown cell as so:
``` mysql
SELECT *
FROM table_a AS a
LIMIT 10;
```
and I could execute it in a code cell as so:
pd.read_sql('SELECT * FROM table_a AS a LIMIT 10;', conn)
this involves copy/paste and displaying the query twice (not too good if I want to simply export my notebook to a pdf report)
is there a way to avoid the duplication by reading the markdown text into a string python variable, or any other way?
Solution
The cellmagic answer cited by @Micah Kornfield in the question comments may be a good fit for many situations. In the question however it is said that it is desirable to avoid duplicates. Let's imagine that the SQL is huge and we don't want see the same query more than once.
Unfortunatelly right now in 2021 there's no easy solution for this. In a jupyter notebook there are two worlds, the backend which is the kernel and in our case runs python, and the frontend which runs javascript. Only javascript sees the markdown cells. It is possible to make the backend and frontend communicate with each other, those methods are usually a little hacky, but anyway we will rely on some of them.
I have written a script that does our job in two different ways, which will probably bring similar results. I will call those methods the file read method and the javascript method.
First, please save the following file markdown.py
in the same folder as the ipython (we are using a separate file because you specified that your notebook willl eventually go to a report and it is undesirable to have this script together with the notebook):
from IPython.display import Javascript
from urllib.parse import unquote
from json import loads as jsonloads
def markread(cellnumber,notebookname=None,callbackvar=None):
try:
if type(cellnumber) is int:# maybe check if (varname in globals()):
if callbackvar is not None and type(callbackvar) is str:
return Javascript("const mdtjs = Jupyter.notebook.get_cells().filter(c=>c.cell_type==\"markdown\")["+str(cellnumber)+"].get_text(); IPython.notebook.kernel.execute(\"mdtp = unquote('\"+encodeURI(mdtjs)+\"');mdtp=mdtp[mdtp.find('\\\\n',mdtp.find('```'))+1:min(mdtp.rfind('\\\\n'),mdtp.rfind('```'))].strip();"+callbackvar+"=mdtp;del mdtp\");")
if notebookname is not None and type (notebookname) is str:
if not notebookname.endswith('.ipynb'):
notebookname += '.ipynb'
with open(notebookname) as f:
j = jsonloads(f.read())
mdts = [''.join(c['source'][1:]).strip().strip('`').strip() for c in j['cells'] if c['cell_type']=='markdown']
return mdts[cellnumber]
except:
return None
return None
Now back to the notebook, to load the script, you have to import it:
from markdown import markread, unquote
The unquote is needed to use the javascript method, otherwhise you can skip it.
1. File read method:
Usage:
marktext = markread(2, notebookname='mynotebookname')
Here marktext
will get the value from the third markdown cell in the mynotebookname
(third because we live in a zero-indexed world, so 2 means third; if you skip '.ipynb' extension in the notebookname as in this case it will be automatically appended). Important - this method reads the notebook file writen on disk and not the hot state of things. If you changed anything since last save, things may go wrong.
2. Javascript method:
Usage:
markread(1, callbackvar='marktext')
Here we write the value of our second markdown cell to a variable called marktext
. Javascript method is trickier - it is async, so we have to send the name of the variable that we want to write to (must be a string representing its name, not the variable itself). Is is important to know also that markread
must be the last command in the cell due to a limitation in javascript invoking.
How it works
Internally, the file read method just reads the notebook file which is json, picks the value from 'cells' and filters out the ones which are markdown.
The javascript method however is more complex. It invokes JS because JS has access to the cells including markdown, so JS reads cells values (from the Jupyter.notebook.get_cells), filters the markdown ones, invoke python back and send back those markdown cells - url enconded. Those encoded cells are decoded back and assigned to the callbackvar
. In both methods I made some assumptions that may not be correct about trimming the start and the end of the cell value (the ``` and whitespaces).
There are ways to improve the code, for example making it auto detect the notebook name for the file read method, but it involves even more hacks, relying again on javascript to get the notebook name or making an call to the api on port 8888, but having to deal with session password. I believe the most important is covered already by our script. If one method does not work, you will probably still have the other.
Answered By - brunoff
0 comments:
Post a Comment
Note: Only a member of this blog may post a comment.