Issue
Sometimes I get the following message:
  in process_item
    item['external_link_rel'] = dict_["rel"]
KeyError: 'rel'
This must be because the rel attribute doesn't exist on the element. I tried to handle the error but failed.
from lxml import etree

class CleanItem():
    def process_item(self, item, spider):
        try:
            root = etree.fromstring(str(item['external_link_body']).split("'")[1])
            dict_ = {}
            dict_.update(root.attrib)
            dict_.update({'text': root.text})
            item['external_link_rel'] = dict_["rel"]
            return item
        except KeyError as EmptyVar:
            if str(EmptyVar) == 'rel':
                dict_["rel"] = "null"
                item['external_link_rel'] = dict_["rel"]
                return item
Most likely, all the problems are due to this line: if str(EmptyVar) == 'rel'
I would appreciate guidance on how to perform an operation only when this error occurs.
Before asking the question, I did a lot of research but did not reach a conclusion.
For context, the code above is in the pipelines.py file of a Scrapy project.
Solution
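A better way to do it is to use the dictionary method get, which returns a default value instead of raising KeyError when the key is missing. You can read about it in the Python documentation for dict.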
from lxml import etree

class CleanItem():
    def process_item(self, item, spider):
        root = etree.fromstring(str(item['external_link_body']).split("'")[1])
        dict_ = {}
        dict_.update(root.attrib)
        dict_.update({'text': root.text})
        item['external_link_rel'] = dict_.get("rel", "null")
        return item
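
As for why the original check never matched: str() on a KeyError returns the repr of the missing key, quotes included, so it compares unequal to the bare string 'rel'. The sketch below illustrates this with a plain dict instead of a real Scrapy item; rel_or_default is a hypothetical helper name used only for illustration, not part of Scrapy or lxml.

```python
def rel_or_default(attrs, default="null"):
    """Return attrs['rel'], or default when that key is missing.

    Hypothetical helper for illustration; attrs is any plain dict.
    """
    try:
        return attrs["rel"]
    except KeyError as exc:
        # exc.args[0] holds the raw missing key.
        # str(exc) would be "'rel'" (with quotes), which is why
        # comparing it to 'rel' in the question never succeeded.
        if exc.args[0] == "rel":
            return default
        raise


missing = KeyError("rel")
print(str(missing) == "rel")     # False: the string includes the quotes
print(str(missing) == "'rel'")   # True
print(missing.args[0] == "rel")  # True

print(rel_or_default({"href": "https://example.com"}))  # null
print(rel_or_default({"rel": "nofollow"}))              # nofollow
```

If you do want to keep a try/except, comparing exc.args[0] rather than str(exc) is one way to make the check work, though dict_.get("rel", "null") stays shorter and avoids the exception entirely.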
Answered By - Alvin Shaita