Issue
I am scraping Amazon reviews. Amazon gives each review a unique identifier, which I would like to scrape. However, the identifier is never displayed as text; it only exists as the id attribute of the review's div, in the following form:
<div id="R2XLFP626GRWEM" data-hook="review" class="a-section review aok-relative">
I want "R2XLFP626GRWEM" to be returned.
When using
response.xpath('.//div[@data-hook="review"]').extract()
I get the whole content of the div tag, which is quite a lot, considering that the whole review is embedded in it.
Solution
You can get the id values by using a CSS selector instead of XPath, as shown below.
response.css('.a-section.review::attr(id)').extract()
or by using XPath with the full class value:
response.xpath('//*[@class="a-section review aok-relative"]/@id').extract()
or by appending /@id to your original XPath query:
response.xpath('.//div[@data-hook="review"]/@id').extract()
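The idea behind all three queries is the same: match the review div, then return only its id attribute instead of serializing the whole element. As a rough way to check that logic offline, without Scrapy, here is a minimal sketch using Python's standard-library ElementTree. It is not the Scrapy API: ElementTree supports only a subset of XPath and cannot select attributes with /@id, so the sketch matches the elements first and reads the attribute from each. The sample markup, the wrapper div, the second review id, and the review_ids helper are all made up for illustration, and real Amazon pages are not well-formed XML, so this would not parse a live page.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified markup modeled on the div from the question.
# The wrapper div and the second review id are invented for this sketch.
SAMPLE = """
<div id="reviews">
  <div id="R2XLFP626GRWEM" data-hook="review" class="a-section review aok-relative">
    <span>Review body one</span>
  </div>
  <div id="R1EXAMPLE00002" data-hook="review" class="a-section review aok-relative">
    <span>Review body two</span>
  </div>
</div>
"""


def review_ids(markup):
    """Return the id attribute of every div whose data-hook is "review"."""
    root = ET.fromstring(markup)
    # ElementTree's XPath subset has no /@id step, so match the
    # elements first and read the attribute from each match.
    matches = root.iterfind(".//div[@data-hook='review']")
    return [div.get("id") for div in matches]


print(review_ids(SAMPLE))
```

In Scrapy itself, the /@id step (or ::attr(id) in CSS) does this attribute lookup inside the query, which is why those selectors return just the identifiers.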
Answered By - caki