Issue
I am having trouble extracting the rating text inside a span element.
I already tried the following XPath expressions:
response.xpath("//i/span[@class='a-icon-alt']/text()").getall()
response.xpath('//span[@data-hook="rating-out-of-text"]/text()').getall()
I have the following HTML:
<div class="a-fixed-left-grid AverageCustomerReviews a-spacing-small">
<div class="a-fixed-left-grid-inner" style="padding-left:105px">
<div class="a-fixed-left-grid-col a-col-left" style="width:105px;margin-left:-105px;float:left;">
<i data-hook="average-star-rating" class="a-icon a-icon-star-medium a-star-medium-4 averageStarRating">
<span class="a-icon-alt">3,8 de 5 estrelas</span>
</i>
</div>
<div class="a-fixed-left-grid-col aok-align-center a-col-right" style="padding-left:0%;float:left;">
<div class="a-row">
<span class="a-size-base a-nowrap">
<span data-hook="rating-out-of-text" class="a-size-medium a-color-base">3,8 de 5</span>
</span>
</div>
</div>
</div>
</div>
If it helps, the HTML was extracted from this page:
Solution
I was able to grab it with the expression below. /text() only returns text nodes that are direct children of the span, and the span isn't always the immediate parent of the text, so using //text() instead pulls the text from any descendant of the element:
response.xpath('//span[@data-hook="rating-out-of-text"]//text()').getall()
Update
If you are using Scrapy, a good way to find out whether the response Scrapy receives differs from what you see in your web browser is the open_in_browser function. It opens the downloaded response in your local browser, so you can see what the page looks like from your spider's point of view.
For example:
import scrapy
from scrapy.utils.response import open_in_browser

class MySpider(scrapy.Spider):
    ...
    ...
    start_urls = [...]

    def parse(self, response):
        # Open the response exactly as the spider received it in your local browser
        open_in_browser(response)
        ...
Answered By - Alexander