Issue
When I use a Scrapy ItemLoader to fill an item and a selector extracts nothing (None), the corresponding field is missing from the loaded item. When I then save the item to MySQL, I get a KeyError because the item instance has no such key. I searched but could not find a solution. Could you help me? Thank you very much!
real_time_hot_loader = WeiBoRealTimeHotLoader(item=WeiBORealTimeHotItem(), selector=real_time_hot_node)
real_time_hot_loader.add_css('search_rank', 'tr[action-type*="hover"] td.td_01 span em::text')
real_time_hot_loader.add_css('star_name', 'td.td_02 p.star_name a::text')
real_time_hot_loader.add_css('star_url', 'td.td_02 p.star_name a::attr(href)')
real_time_hot_loader.add_css('star_num', 'td.td_03 p.star_num span::text')
real_time_hot_loader.add_css('hot_txt', 'td.td_02 p.star_name i.icon_txt::text')
real_time_hot_loader.add_value('update_time', time.strftime('%Y-%m-%d %H:%M:%S', time.localtime()))
real_time_hot_loader.add_value('id', real_time_hot_date_time_id)
real_time_hot_item = real_time_hot_loader.load_item()
Solution
By default, Scrapy's ItemLoader discards any field whose value is None, so the key never appears on the loaded item. That is why the later lookup raises a KeyError.
To fix this, make sure the loader falls back to some other value, such as an empty string: ""
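from scrapy.loader import ItemLoader
# Newer Scrapy releases also expose the processors as itemloaders.processors.
from scrapy.loader.processors import Compose

def or_empty_string(value):
    # Fall back to '' when the collected value is falsy (None, '', []).
    return value or ''

class MyLoader(ItemLoader):
    # Applied to every field that has no output processor of its own.
    default_output_processor = Compose(or_empty_string)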
Now you can see this in action:
>>> l = MyLoader()
>>> l.add_value('foo', None)
>>> l.load_item()
{}
>>> l.add_value('foo', '')
>>> l.load_item()
{'foo': ['']}
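As the session above shows, a field that was only ever given None still does not appear in the item. If a selector matches nothing, the loader may never register the field at all, so an output processor is not guaranteed to run for it. A complementary option is to fill in the missing keys after load_item() and before the MySQL insert. The sketch below is only an illustration: fill_missing and FIELDS are hypothetical names (not part of Scrapy), the field list is copied from the question, and a plain dict stands in for the loaded item on the assumption that the real item supports the same `in` and item[key] = value operations.

```python
# Hypothetical helper, not part of Scrapy: give skipped fields a default value.
FIELDS = ('search_rank', 'star_name', 'star_url', 'star_num',
          'hot_txt', 'update_time', 'id')


def fill_missing(item, field_names, default=''):
    """Set `default` on every field in `field_names` that `item` lacks."""
    for name in field_names:
        if name not in item:
            item[name] = default
    return item


# A plain dict stands in for the loaded item; 'hot_txt' is absent because
# its selector matched nothing. The values here are made-up sample data.
loaded = {'search_rank': ['1'], 'star_name': ['example']}
filled = fill_missing(loaded, FIELDS)

# Every expected key can now be read without a KeyError.
row = tuple(filled[name] for name in FIELDS)
```

In the question's code this would be called as fill_missing(real_time_hot_item, FIELDS) right after load_item(). Reading values with item.get('hot_txt', '') in the pipeline is a lighter alternative when only one or two fields are optional.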
Answered By - Granitosaurus