Issue
I've been searching the Scrapy documentation for a way to limit the number of requests my spiders are allowed to make. During development I don't want to sit here and wait for my spiders to finish an entire crawl; even though the crawls are fairly focused, they can still take quite a while.
I want the ability to say, "After x requests to the site I'm scraping stop generating new requests."
I was wondering whether there is a setting for this that I may have missed, or some other way to do it within the framework, before I try to come up with my own solution.
I was considering implementing a downloader middleware that would keep track of the number of requests being processed and stop passing them to the downloader once a limit has been reached. But, as I said, I'd rather use a mechanism already in the framework if possible.
Any thoughts? Thank you.
Solution
You are looking for the CLOSESPIDER_PAGECOUNT setting of the CloseSpider extension:
"An integer which specifies the maximum number of responses to crawl. If the spider crawls more than that, the spider will be closed with the reason closespider_pagecount. If zero (or not set), spiders won’t be closed by number of crawled responses."
Answered By - alecxe