Issue
I have created a Scrapy spider that follows pagination. When I run the same script with a different link from the same website, pagination stops with "Filtered offsite request". Setting dont_filter=True on the Scrapy Request makes the spider loop over the page indefinitely. How can the same script produce different results without any changes?
Solution
Please share your code so we can be more helpful.
Make sure the allowed_domains attribute of your spider contains only the domain, not a full URL. For example:
import scrapy

class MySpider(scrapy.Spider):
    name = 'example'
    allowed_domains = ['example.com']  # Don't use 'https://example.com/some/path/here'
    start_urls = ['https://example.com/some/path/here']
Obviously, the domain in allowed_domains must match the domains you are creating requests to.
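To illustrate why a full URL in allowed_domains causes every follow-up request to be filtered, here is a minimal sketch of host-based matching using only the standard library. It approximates the idea behind Scrapy's offsite filter and is not Scrapy's actual code; the helper names build_host_regex and is_offsite are hypothetical.

```python
import re
from urllib.parse import urlparse


def build_host_regex(allowed_domains):
    """Compile a pattern matching each allowed domain and any of its subdomains."""
    if not allowed_domains:
        # No restriction configured: the empty pattern matches every host.
        return re.compile("")
    escaped = "|".join(re.escape(d) for d in allowed_domains)
    return re.compile(rf"^(.*\.)?({escaped})$")


def is_offsite(url, allowed_domains):
    """Return True if the URL's host falls outside allowed_domains."""
    host = urlparse(url).hostname or ""
    return build_host_regex(allowed_domains).search(host) is None


next_page = "https://example.com/some/path/here?page=2"

# A bare domain matches the request's host, so the request is kept.
print(is_offsite(next_page, ["example.com"]))  # False

# A full URL is compared against the host "example.com" and never matches,
# so even same-site pagination links are treated as offsite.
print(is_offsite(next_page, ["https://example.com/some/path/here"]))  # True
```

This probably also explains the loop you saw: dont_filter=True bypasses the duplicate-request filter too, so a pagination link that points back to an already visited page gets requested again. Fixing allowed_domains and leaving dont_filter off should avoid both problems.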
You can also remove this attribute entirely; without it, Scrapy does not filter any request as offsite. More details on allowed_domains are in the Scrapy documentation.
Answered By - renatodvc