Issue
While this looks like a very common question at first, I have tried many different approaches to scrape all the text recursively from the following HTML code, but for some reason none of them worked:
<span class="coupon__logo coupon__logo--for-shops">
<span class="amount"><b>20</b>%</span>
<span class="type">Cupom</span>
</span>
What I tried:
p.css('span.coupon__logo coupon__logo--for-shops *::text').get()
p.css('span.amount ::text').get()
p.css('span.amount *::text').get()
And even XPath ones:
p.xpath('//span[@class="coupon__logo coupon__logo--for-shops"]//text()').get()
p.xpath('//span[@class="amount"]//text()').get()
The best result I got was with p.css('span.amount *::text').getall(), but it extracts the text from every occurrence on the page, which forces me to write extra code to sort the pieces back into individual coupons. It would be much better if I could get only the text of the current instance, especially because I'm looping through many of them, and because that extra sorting code would break with any change on the website.
Solution
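Instead of getting all the text of all the children of <span class="coupon__logo coupon__logo--for-shops">, you can get the text of its specific children. Two things to keep in mind: in CSS, several classes on the same element are chained with dots (span.coupon__logo.coupon__logo--for-shops), because a space means "descendant"; and .get() returns only the first match, so use .getall() and join the pieces.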
CSS:
scrapy shell file:///path/to/file.html
In [1]: ' '.join(response.css('span.coupon__logo.coupon__logo--for-shops span *::text').getall())
Out[1]: '20 % Cupom'
XPath:
scrapy shell file:///path/to/file.html
In [1]: ' '.join(response.xpath('//span[@class="coupon__logo coupon__logo--for-shops"]/span//text()').getall())
Out[1]: '20 % Cupom'
If there are more span tags and you only want amount and type, you can use this:
CSS:
scrapy shell file:///path/to/file.html
In [1]: ' '.join(response.css('span.coupon__logo.coupon__logo--for-shops span.amount *::text, span.coupon__logo.coupon__logo--for-shops span.type::text').getall())
Out[1]: '20 % Cupom'
XPath:
scrapy shell file:///path/to/file.html
In [1]: ' '.join(response.xpath('//span[@class="coupon__logo coupon__logo--for-shops"]/span[@class="amount" or @class="type"]//text()').getall())
Out[1]: '20 % Cupom'
Answered By - SuperUser