Issue
I'm running into a problem. I wrote a Scrapy project a long time ago and wanted to relaunch it. I have several spiders, and I grouped them in one main script that launches them all. It worked at the time, but now I get an AttributeError (AttributeError: 'CrawlerRunner' object has no attribute 'spiders'). Here is my code:
from scrapy.crawler import CrawlerRunner
from scrapy.utils.log import configure_logging
from scrapy.utils.project import get_project_settings
from twisted.internet import reactor
from LesInfosDuJour.pipelines import LesinfosdujourPipeline
configure_logging()
settings = get_project_settings()
runner = CrawlerRunner(settings)
for spider_name in runner.spiders.list():
    runner.crawl(spider_name)
d = runner.join()
d.addBoth(lambda _: reactor.stop())
reactor.run()
obj = LesinfosdujourPipeline()
obj.empty_procedure()
obj.duplicate_procedure()
I launch it with Python: python news-scrap.py
settings.py
SPIDER_MODULES = ['LesInfosDuJour.spiders']
NEWSPIDER_MODULE = 'LesInfosDuJour.spiders'
Project directory : Scrap/LesInfosDuJour/spiders
Solution
After some research, I found that recent versions of Scrapy removed the CrawlerRunner.spiders attribute and replaced it with CrawlerRunner.spider_loader.
Here are the changes I made:
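from scrapy.crawler import CrawlerRunner
from scrapy.utils.log import configure_logging
from scrapy.utils.project import get_project_settings
from twisted.internet import reactor

configure_logging()
settings = get_project_settings()
runner = CrawlerRunner(settings)
# spider_loader.list() returns the names of all spiders found in SPIDER_MODULES
spiders_list = runner.spider_loader.list()
for spider_name in spiders_list:
    # Schedule each spider; nothing runs until the reactor starts
    runner.crawl(spider_name)
# join() returns a Deferred that fires once every scheduled crawl has finished
d = runner.join()
d.addBoth(lambda _: reactor.stop())
reactor.run()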
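If the same script has to run against both an old and a recent Scrapy install, one option is a small helper that looks for spider_loader first and falls back to the legacy spiders attribute. This is only a sketch: get_spider_names is a hypothetical name, not part of Scrapy, and it assumes the runner exposes one of those two attributes with a list() method.

```python
def get_spider_names(runner):
    """Return the spider names known to a CrawlerRunner-like object.

    Hypothetical helper (not part of Scrapy): prefers the newer
    `spider_loader` attribute and falls back to the legacy `spiders`
    attribute found on old Scrapy versions.
    """
    loader = getattr(runner, "spider_loader", None)
    if loader is None:
        # Old Scrapy versions only exposed the loader as `spiders`
        loader = getattr(runner, "spiders", None)
    if loader is None:
        raise AttributeError(
            "runner exposes neither 'spider_loader' nor 'spiders'"
        )
    # Copy into a plain list so the caller can iterate safely
    return list(loader.list())
```

With this helper, the loop in the script above would read "for spider_name in get_spider_names(runner):" and the rest of the script would stay unchanged.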
Answered By - Ayoub