Issue
I'm running Scrapy 0.24.4 and have encountered quite a few sites that shut down the crawl very quickly, typically within 5 requests. The sites return 403 or 503 for every request, and Scrapy gives up. I'm running through a pool of 100 proxies, with the RotateUserAgentMiddleware enabled.
Does anybody know how a site could identify Scrapy that quickly, even with the proxies and user agents rotating? Scrapy doesn't add anything to the request headers that gives it away, does it?
Solution
It appears the primary problem was that cookies were not enabled. After enabling cookies, I'm having much more success. Thanks.
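For reference, a minimal settings sketch along these lines. `COOKIES_ENABLED`, `COOKIES_DEBUG`, `DOWNLOAD_DELAY`, `CONCURRENT_REQUESTS_PER_DOMAIN`, and `RETRY_HTTP_CODES` are all real Scrapy settings; the specific values below are illustrative assumptions, not recommendations:

```python
# settings.py (sketch) -- make sure cookies are on so the target site
# can maintain a session; COOKIES_ENABLED defaults to True, so check
# that nothing in the project or a middleware has disabled it.
COOKIES_ENABLED = True

# Log cookies sent and received, useful for confirming the session
# actually persists across requests while debugging a ban.
COOKIES_DEBUG = True

# Slowing down and limiting per-domain concurrency also reduces the
# chance of tripping rate-limit bans (values here are just examples).
DOWNLOAD_DELAY = 2
CONCURRENT_REQUESTS_PER_DOMAIN = 4

# Retry throttling responses (503); a hard 403 ban usually won't
# succeed on retry, so it is left out here.
RETRY_HTTP_CODES = [503]
```

Note that anti-bot systems often flag clients that accept no cookies at all, which would explain bans this fast even with rotating proxies and user agents.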
Answered By - LandonC