Issue
I am running multiple scrapers from the command line as part of an automated process.
Python : 2.7.12
Scrapy : 1.4.0
OS : Ubuntu 16.04.4 LTS
I want to know how Scrapy handles the case when
- There is not enough memory/CPU bandwidth to start a scraper.
- There is not enough memory/CPU bandwidth during a scraper run.
I have gone through the documentation but couldn't find anything.
To anyone answering this: you don't have to know the exact answer. If you can point me in the general direction of any resource that might be helpful, that would also be appreciated.
Solution
The operating system kills any process that tries to use more memory than the available limit. This applies to Python programs too, and Scrapy is no different.
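For what it's worth, Scrapy also ships with a built-in MemoryUsage extension that can shut a spider down gracefully before the OS steps in. A minimal settings.py sketch, where the limit values and the notification address are illustrative, not recommendations:

```python
# settings.py -- memory monitoring via Scrapy's built-in MemoryUsage
# extension (POSIX-only: it relies on the stdlib `resource` module,
# so it works on Ubuntu).

MEMUSAGE_ENABLED = True           # turn the extension on
MEMUSAGE_LIMIT_MB = 512           # close the spider above 512 MB (illustrative)
MEMUSAGE_WARNING_MB = 400         # log a warning (and optionally email) at 400 MB
MEMUSAGE_NOTIFY_MAIL = ['admin@example.com']  # hypothetical address
```

When the limit is exceeded, Scrapy closes the spider with the reason `memusage_exceeded` instead of letting it run until the OS kills it.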
More often than not, network bandwidth is the bottleneck in scraping/crawling applications.
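If bandwidth is the constraint, Scrapy's concurrency and AutoThrottle settings let each crawler back off instead of saturating the link. Another settings.py sketch with illustrative values:

```python
# settings.py -- throttle network usage per crawler
CONCURRENT_REQUESTS = 8             # global cap on in-flight requests (default 16)
CONCURRENT_REQUESTS_PER_DOMAIN = 4  # be gentler with each individual site
DOWNLOAD_DELAY = 0.5                # seconds between requests to the same domain

# AutoThrottle adjusts the delay dynamically from observed latency
AUTOTHROTTLE_ENABLED = True
AUTOTHROTTLE_TARGET_CONCURRENCY = 2.0  # average parallel requests per remote site
```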
Memory would only be a bottleneck if there is a serious memory leak in your application.
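If you do suspect a leak, Scrapy tracks live Request/Response/Spider objects via weak references, and you can dump the counts to see what is accumulating. A sketch using the trackref utility (normally inspected through Scrapy's telnet console on localhost:6023):

```python
# Scrapy keeps weak references to Requests, Responses, and Spiders in
# scrapy.utils.trackref; counts that keep growing between calls suggest a leak.
from scrapy.utils.trackref import print_live_refs

print_live_refs()  # prints each tracked class, live count, and oldest object's age
```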
Your application would just run very slowly if the CPU is being shared by many processes on the same machine.
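As for the "not enough resources to start" case from the question: Scrapy performs no such check itself, so if it matters you would gate the launch from your automation script. A hypothetical pre-flight sketch, where the thresholds, the /proc/meminfo parsing, and the spider name are all assumptions (Linux-specific, Python 2.7 compatible):

```python
#!/usr/bin/env python
# Hypothetical pre-flight check before launching a scraper (Linux-only:
# os.getloadavg() and /proc/meminfo are platform specifics).
import os
import sys

MIN_FREE_MB = 256  # illustrative threshold; tune for your scrapers
MAX_LOAD = os.sysconf('SC_NPROCESSORS_ONLN')  # 1-min load <= number of cores

def available_memory_mb():
    """Read MemAvailable from /proc/meminfo (present on Ubuntu 16.04)."""
    with open('/proc/meminfo') as f:
        for line in f:
            if line.startswith('MemAvailable:'):
                return int(line.split()[1]) // 1024  # kB -> MB
    return 0

if __name__ == '__main__':
    load1, _, _ = os.getloadavg()
    if available_memory_mb() < MIN_FREE_MB or load1 > MAX_LOAD:
        sys.stderr.write('Not enough headroom, skipping this run.\n')
        sys.exit(1)
    # Otherwise launch the crawl, e.g.:
    # os.execvp('scrapy', ['scrapy', 'crawl', 'myspider'])  # hypothetical spider
```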
Answered By - Anuvrat Parashar