Issue
I recently ran a spider in my project, but it feels like Scrapy waits for one page to finish before moving on to the next one. If I understand Scrapy's nature correctly, it should move on to another page without waiting for the previous one's response. On the page I was reading, after scrolling down, I saw async def used, which means that method was explicitly made asynchronous by adding it. If I don't put async/await in my spiders, won't they be asynchronous? Do they wait until a response is received? Please let me know if I have any misconceptions. Thank you in advance.
Solution
Scrapy is asynchronous by default.
Coroutine syntax, introduced in Scrapy 2.0, simply provides a more convenient way to work with Twisted Deferreds, which are not needed in most use cases, since Scrapy makes their use transparent whenever possible.
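For example, here is a minimal sketch (the spider name and URL are placeholders, not from the original post) of a Scrapy 2.0+ callback written with coroutine syntax; the async keyword only changes how the callback is written, not how Scrapy schedules requests:

    import scrapy

    class AsyncCallbackSpider(scrapy.Spider):
        # Hypothetical spider used only to illustrate coroutine syntax.
        name = "async_callback_example"
        start_urls = ["https://example.com"]

        async def parse(self, response):
            # An async generator callback works the same as a regular one:
            # yielded requests are scheduled concurrently either way.
            for href in response.css("a::attr(href)").getall():
                yield response.follow(href, callback=self.parse)

A plain def parse(self, response) callback behaves the same way with respect to concurrency; the coroutine form is only useful when you need to await other asynchronous code.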
The only reason your spiders may seem synchronous is that you only yield a new Request object from the callback of a previous request. If you yield multiple requests from start_requests, or have multiple URLs in start_urls, those are handled asynchronously, according to your concurrency settings (Scrapy's default is 8 concurrent requests per domain, 16 total).
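As a sketch of that behavior (URLs, spider name, and page count are invented for illustration), yielding several requests from start_requests lets the downloader fetch them in parallel, up to the limits set by CONCURRENT_REQUESTS and CONCURRENT_REQUESTS_PER_DOMAIN:

    import scrapy

    class ConcurrentSpider(scrapy.Spider):
        # Hypothetical spider demonstrating concurrent downloads.
        name = "concurrent_example"

        custom_settings = {
            "CONCURRENT_REQUESTS": 16,            # Scrapy default
            "CONCURRENT_REQUESTS_PER_DOMAIN": 8,  # Scrapy default
        }

        def start_requests(self):
            # All requests are scheduled up front; the downloader sends them
            # in parallel, without waiting for one response before issuing
            # the next request, subject to the concurrency limits above.
            for n in range(1, 21):
                yield scrapy.Request(
                    f"https://example.com/page/{n}", callback=self.parse
                )

        def parse(self, response):
            yield {"url": response.url, "title": response.css("title::text").get()}

The same applies if you list those URLs in start_urls instead; either way, no single response blocks the others.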
Answered By - Gallaecio