Issue
I am using Scrapy for web scraping, but I am getting a data loss warning for a few requests. Each time I run the same spider, the data loss errors appear on different URLs, so I believe those requests just need to be retried. Does anyone know how I can do that? I am getting the following warning a few times:
[scrapy.core.downloader.handlers.http11] WARNING: Got data loss in <failed link> If you want to process broken responses set the setting DOWNLOAD_FAIL_ON_DATALOSS = False -- This message won't be shown in further requests
Solution
Just as the warning message says, you will need to configure how Scrapy handles broken downloads. The Scrapy settings reference is a great resource for this, depending on how you run and configure your project:
https://docs.scrapy.org/en/latest/topics/settings.html
As long as the servers are not misconfigured and these are only transient issues, leaving RETRY_ENABLED set to True (its default) makes Scrapy retry the failed requests; note that data-loss failures are only retried while DOWNLOAD_FAIL_ON_DATALOSS is also True (its default). If you would rather process the broken responses instead of retrying them, set DOWNLOAD_FAIL_ON_DATALOSS to False, as the warning itself suggests; such responses are then passed through with 'dataloss' added to response.flags.
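A minimal settings.py sketch of the options above (the setting names are from the Scrapy settings reference linked earlier; the RETRY_TIMES value is just an illustrative choice):

# settings.py

# Retry failed requests. Both of these default to True; they are shown
# explicitly here for clarity.
RETRY_ENABLED = True

# Data-loss failures are raised as errors and retried only while this
# stays True (the default). Set it to False instead if you want to keep
# the partial responses rather than retry them.
DOWNLOAD_FAIL_ON_DATALOSS = True

# Optional: raise the retry count from the default of 2 if the errors
# are flaky.
RETRY_TIMES = 5

If you only want to accept partial responses for specific requests, Scrapy also supports this on a per-request basis via the download_fail_on_dataloss key in Request.meta, e.g. Request(url, meta={"download_fail_on_dataloss": False}).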
Answered By - Poiuy