Issue
I see there are several types of responses, but how do I signal Scrapy to return an HtmlResponse?
I think the goal would be to implement def parse(self, response: HtmlResponse): or is this supposed to be used some other way? Is there a usage example?
This is the example from the Scrapy tutorial. How would I use HtmlResponse here instead of the default?
import scrapy


class QuotesSpider(scrapy.Spider):
    name = "quotes"

    def start_requests(self):
        urls = [
            'https://quotes.toscrape.com/page/1/',
            'https://quotes.toscrape.com/page/2/',
        ]
        for url in urls:
            yield scrapy.Request(url=url, callback=self.parse)

    def parse(self, response):
        page = response.url.split("/")[-2]
        filename = f'quotes-{page}.html'
        with open(filename, 'wb') as f:
            f.write(response.body)
        self.log(f'Saved file {filename}')
Solution
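Scrapy tries to identify the type of response it gets and calls parse with an instance of the matching class. For HTML pages such as the ones in the tutorial, that class is HtmlResponse, so parse already receives one without any extra signalling. As far as I can tell, the base type Response is only used when none of the checks match. The identification is done in `scrapy/responsetypes.py` using several inputs: MIME type, body, headers, etc.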
Here's the mimetype identification map:
CLASSES = {
    'text/html': 'scrapy.http.HtmlResponse',
    'application/atom+xml': 'scrapy.http.XmlResponse',
    'application/rdf+xml': 'scrapy.http.XmlResponse',
    'application/rss+xml': 'scrapy.http.XmlResponse',
    'application/xhtml+xml': 'scrapy.http.HtmlResponse',
    'application/vnd.wap.xhtml+xml': 'scrapy.http.HtmlResponse',
    'application/xml': 'scrapy.http.XmlResponse',
    'application/json': 'scrapy.http.TextResponse',
    'application/x-json': 'scrapy.http.TextResponse',
    'application/json-amazonui-streaming': 'scrapy.http.TextResponse',
    'application/javascript': 'scrapy.http.TextResponse',
    'application/x-javascript': 'scrapy.http.TextResponse',
    'text/xml': 'scrapy.http.XmlResponse',
    'text/*': 'scrapy.http.TextResponse',
}
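To make the lookup idea concrete, here is a minimal plain-Python sketch of how a Content-Type value could be resolved against a map like this one. It is a simplified stand-in, not Scrapy's own code: the helper name `pick_response_class`, the trimmed map, and the fallback to the base class when nothing matches are assumptions made for illustration.

```python
# Trimmed copy of the map above; values are dotted class paths.
CLASSES = {
    "text/html": "scrapy.http.HtmlResponse",
    "application/xml": "scrapy.http.XmlResponse",
    "application/json": "scrapy.http.TextResponse",
    "text/*": "scrapy.http.TextResponse",
}

# Assumed fallback when neither an exact nor a wildcard entry matches.
BASE = "scrapy.http.Response"


def pick_response_class(content_type: str) -> str:
    """Hypothetical helper: exact match first, then the 'type/*' wildcard."""
    # Drop parameters such as "; charset=utf-8" and normalise case.
    mimetype = content_type.split(";")[0].strip().lower()
    if mimetype in CLASSES:
        return CLASSES[mimetype]
    wildcard = mimetype.split("/")[0] + "/*"
    return CLASSES.get(wildcard, BASE)


html_class = pick_response_class("text/html; charset=utf-8")
plain_class = pick_response_class("text/plain")
image_class = pick_response_class("image/png")
```

In this sketch an HTML page resolves to the HtmlResponse path, any other text type falls through to the `text/*` entry, and a type with no entry at all gets the base class.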
Since parse is called with one of the subclasses, devs have access to it directly in the response parameter. One way to use it is like this:
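from scrapy.http import HtmlResponse  # needed for the isinstance check

def parse(self, response):
    # Only handle responses that Scrapy identified as HTML.
    if isinstance(response, HtmlResponse):
        ...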
Answered By - Fernando César