Issue
I'm trying to create a metadata scraper to enrich my e-book collection, but am experiencing some problems. I want to create a dict
(or whatever gets the job done) to store the index (only while testing), the path and the series name. This is the code I've written so far:
from bs4 import BeautifulSoup

def get_opf_path():
    opffile=variables.items
    pathdict={'index':[],'path':[],'series':[]}
    safe=[]
    x=0
    for f in opffile:
        x+=1
        pathdict['path']=f
        pathdict['index']=x
        with open(f, 'r') as fi:
            soup=BeautifulSoup(fi, 'lxml')
            for meta in soup.find_all('meta'):
                if meta.get('name')=='calibre:series':
                    pathdict['series']=meta.get('content')
                    safe.append(pathdict)
        print(pathdict)
    print(safe)
This code is able to go through all the .opf files and get the series, index and path. I'm sure of this because the console output (image 1) shows them.
However, when I try to store pathdict in safe, no matter where I put safe.append(pathdict), the output is never what I expect.
What do I have to do so that safe ends up holding the data shown in image 1?
I have tried everything I could think of, but nothing worked.
Any help is appreciated.
Solution
I believe this is the correct way:
from bs4 import BeautifulSoup

def get_opf_path():
    # variables is the asker's own module; variables.items is assumed
    # to be an iterable of .opf file paths
    opffile = variables.items
    pathdict = {'index': [], 'path': [], 'series': []}
    safe = []
    x = 0
    for f in opffile:
        x += 1
        pathdict['path'] = f
        pathdict['index'] = x
        with open(f, 'r') as fi:
            soup = BeautifulSoup(fi, 'lxml')
            for meta in soup.find_all('meta'):
                if meta.get('name') == 'calibre:series':
                    pathdict['series'] = meta.get('content')
                    print(pathdict)
                    # Append a copy so later iterations cannot change this entry
                    safe.append(pathdict.copy())
    print(safe)
For two main reasons:
When you do pathdict['series'] = meta.get('content') you are overwriting the previous value of pathdict['series'], so I believe this is the point where you should save the dict to the list.
You also need to append a copy of it. If you don't, the entries already in the list change too, because appending the dict really stores a reference to it (in this case, a reference to the same object that the variable pathdict refers to), not a snapshot of its contents.
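The difference between appending the dict and appending a copy can be seen without any files. This is only a minimal sketch: the series names and the variable names shared, aliased, record and copied are made up for illustration and are not part of the original code.

```python
# Hypothetical sample data standing in for values scraped from .opf files.
series_names = ["Discworld", "Foundation"]

# Appending the same dict object every time: the list holds two references
# to one dict, so both entries show whatever was written last.
shared = {}
aliased = []
for index, name in enumerate(series_names, start=1):
    shared["index"] = index
    shared["series"] = name
    aliased.append(shared)

# Appending a shallow copy: each entry is an independent snapshot.
record = {}
copied = []
for index, name in enumerate(series_names, start=1):
    record["index"] = index
    record["series"] = name
    copied.append(record.copy())

print(aliased)                   # both entries are identical
print(copied)                    # each entry keeps its own values
print(aliased[0] is aliased[1])  # the same object is stored twice
```

A shallow copy is enough here because the stored values are plain strings and integers. If the dict held nested lists or dicts, copy.deepcopy would be the safer choice.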
Note
If you want to print the elements of the list on separate lines, you can do something like this:
print(*safe, sep="\n")
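To show what that kind of call does, here is a small sketch with made-up entries (the paths and series names are invented for illustration). It also builds the same text with str.join, which may be handy if the result should go somewhere other than the console.

```python
# Made-up entries standing in for the scraped results.
safe = [
    {"index": 1, "path": "a.opf", "series": "Discworld"},
    {"index": 2, "path": "b.opf", "series": "Foundation"},
]

# Unpacking passes each dict as a separate argument to print,
# and sep="\n" places one entry per line.
print(*safe, sep="\n")

# The same text built explicitly, e.g. for writing to a log file.
text = "\n".join(str(entry) for entry in safe)
print(text)
```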
Answered By - Jorge Morgado