Issue
This is a generalization of this question: Way to extract pickles coming in and out of ipython / jupyter notebook
At the highest level, I'm looking for a way to automatically summarize what goes on in an IPython notebook. One way I see of simplifying the problem is to treat all the data manipulations that go on inside the notebook as a black box, and focus only on its inputs and outputs. So, given the filepath to an IPython notebook, is there an easy way to determine all the files/websites it reads into memory, and also all the files it later writes/dumps? I'm thinking there could be a function that scans the file, parses it for inputs and outputs, and saves them in a dictionary for easy access:
summary_dict = summerize_file_io(ipynb_filepath)
print(summary_dict["inputs"])
> ["../Resources/Data/company_orders.csv", "http://special_company.com/company_financials.csv"]
print(summary_dict["outputs"])
> ["orders_histogram.jpg", "data_consolidated.pickle"]
I'm wondering how to do this easily for more than just pickle objects, covering other formats such as txt, csv, jpg, png, etc., including cases where data is read directly from the web into the notebook itself.
Solution
You can check which files you have opened or modified by patching the builtin open, as JRG suggested. If you also want to track web access, extend the same approach by patching whatever functions you use to connect to websites.
import builtins

# maps each file name to the mode it was last opened with
modified = {}
old_open = builtins.open

def new_open(name, mode='r', *args, **kwargs):
    modified[name] = mode
    # pass mode positionally so extra positional args (e.g. buffering) still line up
    return old_open(name, mode, *args, **kwargs)

# patch builtin open
builtins.open = new_open

# check modified
def whats_modified():
    print('Session has opened/modified the following files:')
    for name in sorted(modified):
        mode = modified[name]
        print(mode.ljust(8) + name)
If we execute this in the interpreter (or use it as a module), we can see what we've modified and how we opened it.
In [4]: with open('ex.txt') as file:
   ...:     print('ex.txt:', file.read())
   ...:
ex.txt: some text.

In [5]: with open('other.txt', 'w') as file:
   ...:     file.write('Other text.\n')
   ...:

In [6]: whats_modified()
Session has opened/modified the following files:
r       ex.txt
w       other.txt
This is somewhat limited, though: the recorded mode is overwritten whenever a file is reopened. That can be fixed with some extra checks in new_open.
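As a rough illustration of those extra checks, here is a minimal sketch that keeps every mode a file was opened with instead of only the last one, and also records URLs for the web side mentioned earlier. The names track_io, summarize and WRITE_FLAGS are hypothetical, not part of the original answer, and the sketch assumes web reads go through urllib.request.urlopen; other libraries such as requests would need their own patch. Code that did `from urllib.request import urlopen` before the patch is applied keeps its own reference to the original function and will not be tracked.

```python
import builtins
import urllib.request
from collections import defaultdict
from contextlib import contextmanager

# Characters in a mode string that imply the file may be written (assumption
# of this sketch: any of these marks the file as an output).
WRITE_FLAGS = set("wax+")


@contextmanager
def track_io():
    """Record open() and urlopen() calls made inside the `with` block."""
    modes = defaultdict(set)  # file name -> every mode it was opened with
    urls = []                 # URLs passed to urllib.request.urlopen
    old_open = builtins.open
    old_urlopen = urllib.request.urlopen

    def new_open(file, mode="r", *args, **kwargs):
        modes[str(file)].add(mode)  # accumulate instead of overwriting
        return old_open(file, mode, *args, **kwargs)

    def new_urlopen(url, *args, **kwargs):
        # url may be a plain string or a urllib.request.Request object
        urls.append(getattr(url, "full_url", url))
        return old_urlopen(url, *args, **kwargs)

    builtins.open = new_open
    urllib.request.urlopen = new_urlopen
    try:
        yield {"modes": modes, "urls": urls}
    finally:
        # always restore the originals, even if the tracked code raises
        builtins.open = old_open
        urllib.request.urlopen = old_urlopen


def summarize(record):
    """Split a record into the inputs/outputs shape asked for in the question."""
    inputs, outputs = set(record["urls"]), set()
    for name, seen in record["modes"].items():
        for mode in seen:
            if WRITE_FLAGS & set(mode):
                outputs.add(name)
            if "r" in mode:  # "r+" counts as both an input and an output
                inputs.add(name)
    return {"inputs": sorted(inputs), "outputs": sorted(outputs)}
```

Wrapping the code you want to inspect in `with track_io() as record:` and then calling `summarize(record)` returns a dictionary with "inputs" and "outputs" lists. Using a context manager rather than a permanent patch is a design choice of this sketch, so the originals are restored afterwards. Like the original patch, it only sees calls that go through builtins.open; libraries that open files in compiled code bypass it.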
Answered By - Tankobot