Issue
I'm trying to run a Jupyter notebook file once for each input in a Python list, from another notebook. I've used Jupyter Notebook's magic command %run to accomplish the task:
input_list = [1, 131, 312, 327, 348, 485, 469, 1218, 1329, 11212]
for i in input_list:
    try:
        input = i
        %run ./notebook.ipynb
    except:
        pass
The code works, but the execution time is very high, so I decided to use the multiprocessing library to run it faster.
Function used inside multiprocessing:
def function(i):
    try:
        input = i
        print(input)  # print the current element passed
        %run ./notebook.ipynb
    except:
        pass
Multiprocessing code:
from multiprocessing import Pool, cpu_count
from tqdm import tqdm
p = Pool(8)
tqdm(p.imap(function, input_list))
p.close()
p.join()
But the problem here is that the argument passed to the function is not passed on to the notebook used in the %run magic command.
I got an error like "input is not defined".
What would be a possible solution for this problem?
Solution
It works when you follow the guide here on how to use arguments.
Illustrating with a minimal working example.
Make a notebook called add3.ipynb with the following contents as the only cell in it:
o = i + 3
print (f"where the input is {i}; the output is {o}\n")
Then, in the notebook that controls the runs with the various values, use the following in a code cell:
# based on https://pymotw.com/3/multiprocessing/basics.html
import multiprocessing

def worker(i):
    try:
        print(f"input is {i}\n")  # print the current element passed
        %run ./add3.ipynb
    except:
        pass

input_list = [1, 131, 312, 327, 348, 485, 469, 1218, 1329, 11212]
if __name__ == '__main__':
    jobs = []
    for i in input_list:
        p = multiprocessing.Process(target=worker, args=(i,))
        jobs.append(p)
        p.start()
I'll paste a typical run of that at the bottom of this post.
I still suggest you use papermill to do this, so you can parameterize the notebook and then save each new version as its own file, like a report.
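As a rough sketch of the papermill route, assuming papermill is installed and that add3.ipynb is the notebook from the minimal example: papermill's execute_notebook takes an input path, an output path, and a parameters dict. The helper names build_jobs and run_all and the runs output folder are made up for this illustration. If the notebook has no cell tagged "parameters", papermill injects the values in a new cell at the top.

```python
from pathlib import Path


def build_jobs(input_list, out_dir="runs"):
    """Pair each input value with an output notebook path and a parameters dict.

    Hypothetical helper: the file naming scheme is only an illustration.
    """
    return [(str(Path(out_dir) / f"add3_{i}.ipynb"), {"i": i}) for i in input_list]


def run_all(input_list, notebook="add3.ipynb", out_dir="runs"):
    """Execute the notebook once per input value, saving one output notebook each."""
    import papermill as pm  # third-party package, imported only when actually running

    Path(out_dir).mkdir(parents=True, exist_ok=True)
    for output_path, parameters in build_jobs(input_list, out_dir):
        # The value of i is injected into the notebook before it is executed.
        pm.execute_notebook(notebook, output_path, parameters=parameters)
```

Calling run_all(input_list) from the controlling notebook would then leave one executed copy of the notebook per input value in the runs folder.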
Alternatively, you can use other means to inject code or construct the notebook to run with the input value. I often use a template string inside a script, with a placeholder for the value. Then I run the script to generate the notebooks with the value in them using the string.replace() method, save the resulting strings as notebook files, and then run those notebooks using jupytext or jupyter nbconvert. nbformat can be useful for building such a notebook file too. That way you can generate reports in notebook form with the results from each run.
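A minimal sketch of the template idea, using only the standard library. The placeholder name __VALUE__, the helper names, and the output file names are assumptions made for this illustration; the cell body mirrors the add3.ipynb example. It builds a bare-bones nbformat 4 notebook as JSON rather than using nbformat itself.

```python
import json
from pathlib import Path

# Source of the single code cell, with a placeholder for the input value.
CELL_TEMPLATE = (
    "i = __VALUE__\n"
    "o = i + 3\n"
    'print(f"where the input is {i}; the output is {o}")\n'
)


def make_notebook(value):
    """Return the JSON text of a one-cell notebook with the value filled in."""
    source = CELL_TEMPLATE.replace("__VALUE__", repr(value))
    notebook = {
        "cells": [
            {
                "cell_type": "code",
                "execution_count": None,
                "metadata": {},
                "outputs": [],
                "source": source.splitlines(keepends=True),
            }
        ],
        "metadata": {},
        "nbformat": 4,
        "nbformat_minor": 4,
    }
    return json.dumps(notebook, indent=1)


def write_notebooks(values, out_dir="."):
    """Write one generated notebook per value and return the file paths."""
    paths = []
    for value in values:
        path = Path(out_dir) / f"add3_{value}.ipynb"
        path.write_text(make_notebook(value), encoding="utf-8")
        paths.append(path)
    return paths
```

Each generated file can then be executed separately, for example with jupyter nbconvert --to notebook --execute, so every run ends up saved as its own notebook.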
Also, if you don't need the code you're calling to be in a notebook, it is often more convenient to save it as a Python script (ending in .py) or an IPython script (ending in .ipy). (The latter allows you to use IPython magics in a script and is often an easier way to develop when you are used to Jupyter. However, the resulting script runs much slower than pure Python, so I usually end up converting to pure Python and only use the .ipy form early in development.) For example, the contents of the one cell in my example add3.ipynb could simply have been saved as a script, add3.py. Then, from within a notebook, I can run it like the following (leaving out multiprocessing for the sake of simplicity):
input_list = [1, 131, 312, 327, 348, 485, 469, 1218, 1329, 11212]
for i in input_list:
    %run -i add3.py
Note the use of the -i option with %run to "run the file in IPython’s namespace instead of an empty one." Note that this option isn't necessary when using %run to run another notebook because, by default, it's as if you are running the other notebook inside the calling notebook. I like the greater flexibility of using %run in conjunction with a script because often I don't want the script running in the same namespace. The alternatives I mentioned (papermill, jupytext, and jupyter nbconvert) execute an external notebook separately from the current namespace.
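For comparison, outside IPython a similar effect can be sketched with the standard library's runpy module. This is only an analogue, not how %run -i works internally: the script runs in a fresh namespace that is seeded with just the value of i, rather than sharing the caller's whole namespace. The helper names and the one-line stand-in for add3.py are assumptions for illustration.

```python
import runpy
import tempfile
from pathlib import Path


def run_script_with_input(script_path, i):
    """Run a script in a fresh namespace seeded with i; return its final globals."""
    return runpy.run_path(str(script_path), init_globals={"i": i})


def demo(values):
    """Write a stand-in add3.py to a temporary folder and run it once per value."""
    with tempfile.TemporaryDirectory() as tmp:
        script = Path(tmp) / "add3.py"
        script.write_text("o = i + 3\n", encoding="utf-8")
        # Collect the variable o that the script defined on each run.
        return [run_script_with_input(script, i)["o"] for i in values]
```

Because run_path returns the script's globals as a dict, results can be read back explicitly instead of leaking into the calling namespace.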
Result seen when running the minimal working example:
input is 1
input is 131
input is 312
input is 327
input is 348
input is 485
input is 469
input is 1218
input is 11212
input is 1329
where the input is 131; the output is 134
where the input is 1; the output is 4
where the input is 312; the output is 315
where the input is 327; the output is 330
where the input is 485; the output is 488
where the input is 1218; the output is 1221
where the input is 469; the output is 472
where the input is 348; the output is 351
where the input is 1329; the output is 1332
where the input is 11212; the output is 11215
Answered By - Wayne