Issue
I have two Scrapy spiders in two different scripts:
Spiders
Spider1.py
Spider2.py
An example of the code in the spiders is as follows:
import scrapy
from scrapy.crawler import CrawlerRunner
from twisted.internet import reactor

class Spider(scrapy.Spider):
    # some code
    pass

runner = CrawlerRunner(
    settings={'FEEDS': {'../input/next.csv': {'format': 'csv'}}})
runner.crawl(Spider)
d = runner.join()
d.addBoth(lambda _: reactor.stop())
reactor.run()
I am running both spiders from a separate script with the following code:
import runpy as r

def run_webscraper():
    r.run_path(path_name='Spider1.py')
    r.run_path(path_name='Spider2.py')

if __name__ == '__main__':
    run_webscraper()
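Each script calls reactor.run(), and runpy.run_path executes both scripts inside the same Python process, so the second script tries to start a Twisted reactor that has already been started and stopped; Twisted does not allow that. A minimal sketch of one workaround (an editorial addition, not from the original thread), assuming Spider1.py and Spider2.py stay unchanged and are found relative to the current working directory, is to launch each script in its own interpreter so each one gets a fresh reactor:

```python
import subprocess
import sys


def run_webscraper(scripts=("Spider1.py", "Spider2.py")):
    """Run each spider script in its own Python interpreter.

    Every child process starts with a fresh Twisted reactor, so each
    script's reactor.run() call is the first one in that process.
    """
    for script in scripts:
        # sys.executable reuses the current interpreter/virtualenv;
        # check=True raises CalledProcessError if a script exits non-zero.
        subprocess.run([sys.executable, script], check=True)


if __name__ == "__main__":
    run_webscraper()
```

The cost is one extra Python process per spider; the scripts run one after another, and each keeps whatever FEEDS setting it already defines.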
When I run the script, Spider1 runs and saves its results to the corresponding CSV file, but when Spider2 starts I get the following error:
twisted.internet.error.ReactorNotRestartable
Any ideas on how to fix the code so that the two spiders run and save their results in separate files (spider1.csv, spider2.csv)?
Is this actually possible?
Solution
I believe you can do this by defining custom_settings within each spider, so each one writes to its own feed file, like this:
spider1:
import scrapy

class Spider1(scrapy.Spider):
    name = 'spider1'
    custom_settings = {
        'FEEDS': {
            'spider1.csv': {
                'format': 'csv',
            },
        },
    }
spider2:
import scrapy

class Spider2(scrapy.Spider):
    name = 'spider2'
    custom_settings = {
        'FEEDS': {
            'spider2.csv': {
                'format': 'csv',
            },
        },
    }
Answered By - kunalmehta14