Issue
I use Scrapy and Selenium for scraping.
When I run my spider and watch it with htop, I can see the instance of my webdriver.
When should I close the webdriver in my code?
def parse(self, response):
    # I have all links in my array_links
    for link in self.array_links:
        self.driver.get(link)
        # Here I parse the products
        item = MyTestItem()
        item['test1'] = "test"
        yield item
I added this to my code:
def __del__(self):
    self.driver.quit()
to clean up at the end of the script, but I don't know whether I should also close the webdriver after fetching each link.
Thanks,
Solution
The question is: where do you open it?
If your webdriver lives in the context of the spider, then ideally you'd want to open it when the spider opens and close it when the spider closes.
You can do that by connecting the spider_opened and spider_closed signals:
from scrapy import signals
from scrapy import Spider
from selenium import webdriver


class MySpider(Spider):
    name = "spideroo"

    @classmethod
    def from_crawler(cls, crawler, *args, **kwargs):
        spider = super().from_crawler(crawler, *args, **kwargs)
        crawler.signals.connect(spider.spider_closed, signal=signals.spider_closed)
        crawler.signals.connect(spider.spider_opened, signal=signals.spider_opened)
        return spider

    def spider_opened(self, spider):
        # One browser for the whole crawl. Firefox is only an example; use your own driver class.
        self.driver = webdriver.Firefox()

    def spider_closed(self, spider):
        # quit() ends the whole browser session; close() would only close the current window.
        self.driver.quit()
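To make the lifecycle concrete outside of Scrapy, here is a minimal sketch of the same idea: create one driver, reuse it for every link, and quit it exactly once at the end, even if a page fails. FakeDriver, managed_driver and scrape are hypothetical names made up for this illustration. FakeDriver only stands in for a real Selenium driver so the example runs without a browser; it is not part of Selenium or Scrapy.

```python
from contextlib import contextmanager


class FakeDriver:
    """Hypothetical stand-in for a Selenium driver (not a real Selenium class)."""

    def __init__(self):
        self.visited = []
        self.quit_calls = 0

    def get(self, url):
        # A real driver would load the page here; the stub only records the URL.
        self.visited.append(url)

    def quit(self):
        # A real driver would shut down the browser session here.
        self.quit_calls += 1


@contextmanager
def managed_driver(factory):
    """Create a driver once and guarantee quit() runs when the block exits."""
    driver = factory()
    try:
        yield driver
    finally:
        driver.quit()


def scrape(links, factory=FakeDriver):
    """Visit every link with one shared driver and return one item per link."""
    items = []
    with managed_driver(factory) as driver:
        for link in links:
            driver.get(link)  # reuse the same driver; no quit between links
            items.append({"url": link})
    return items
```

In this sketch the driver is never closed between links, which mirrors the answer above: the browser stays open for the whole run and is shut down in one place. With a real driver you would pass its class or a small factory function instead of FakeDriver.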
Answered By - Granitosaurus