Thursday, September 30, 2021

[FIXED] Scrapy & xpath: Search for specific text in tree and extract text in next node


Issue

I'm trying to scrape the weight of smartwatches from www.currys.co.uk. The website doesn't use the same structure for every product, so to get each product's weight I'm trying a keyword search with XPath:

//text()[contains(.,'Weight')]
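For illustration, here is a minimal sketch of what this keyword search actually returns. It assumes lxml (the library used in the answer) and a well-formed copy of the table shown in this question. The result is a list of text nodes, not elements. lxml returns them as "smart strings" that still know which element they came from.

```python
from lxml import etree

# Well-formed copy of the table from the question (assumed sample input)
html = '''<tbody>
 <tr>
   <th scope="row">Weight</th>
   <td> 26.7 g</td>
 </tr>
</tbody>'''

root = etree.fromstring(html)

# The keyword search selects text nodes, not elements
matches = root.xpath("//text()[contains(., 'Weight')]")
print(matches)  # ['Weight']

# Each result is a smart string that remembers its source element
first = matches[0]
print(first.getparent().tag)  # th
print(first.is_text)          # True: it is the element's .text, not its .tail
```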

The problem is that this expression only gives me the text "Weight". What I want is the following node, which contains the actual weight value:

<tbody>
 <tr>
   <th scope = "row">Weight</th>
   <td> 26.7 g</td>
 </tr>
</tbody>

I want to get the text 26.7 g. I tried the expression below, but it doesn't work:

//text()[contains(.,'Weight')]//td
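Here is a minimal sketch with lxml of why this returns nothing. It uses the same sample table, assumed well-formed. It also shows one way to keep the text-based search: climb from the text node to its parent `<th>` with `..`, then step sideways to the next cell.

```python
from lxml import etree

# Well-formed copy of the table from the question (assumed sample input)
html = '''<tbody>
 <tr>
   <th scope="row">Weight</th>
   <td> 26.7 g</td>
 </tr>
</tbody>'''

root = etree.fromstring(html)

# A text node has no children, so searching for <td> *below* it finds nothing
failed = root.xpath("//text()[contains(., 'Weight')]//td")
print(failed)  # []

# Keep the keyword search, go up to the text node's parent (<th>),
# then across to the sibling cell in the same row
cells = root.xpath("//text()[contains(., 'Weight')]/../following-sibling::td")
print([td.text for td in cells])  # [' 26.7 g']
```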

Any suggestions? Thanks in advance.

Solution

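Your attempt returns nothing because a text node has no children, so `//td` beneath it can never match. Instead, select the th element that contains the keyword and step to its following-sibling::td: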

from lxml import etree


txt = '''<tbody>
 <tr>
   <th scope = "row">Weight</th>
   <td> 26.7 g</td>
 </tr>
</tbody>'''

root = etree.fromstring(txt)

# Find the <th> whose text contains "Weight", then take the <td> cell(s) after it in the same row
for td in root.xpath('//th[contains(., "Weight")]/following-sibling::td'):
    print(td.text)

Prints:

 26.7 g
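On a real spec page, a label such as "Weight" can also appear inside longer labels, and the value carries stray whitespace. The sketch below uses a hypothetical two-row table to compare `contains()` with an exact match on `normalize-space(.)`. The first row is invented for illustration and is not taken from currys.co.uk. `[1]` keeps only the cell immediately after the label, and an outer `normalize-space()` returns the trimmed value as a plain string.

```python
from lxml import etree

# Hypothetical spec table: the "Weight (without strap)" row is invented
# to show how a loose contains() match can pick up the wrong row
html = '''<table><tbody>
 <tr><th scope="row">Weight (without strap)</th><td> 20.1 g</td></tr>
 <tr><th scope="row">Weight</th><td> 26.7 g</td></tr>
</tbody></table>'''

root = etree.fromstring(html)

# contains() matches both labels, so both values come back (untrimmed)
loose = root.xpath('//th[contains(., "Weight")]/following-sibling::td/text()')
print(loose)  # [' 20.1 g', ' 26.7 g']

# Exact label match, first following cell only, whitespace trimmed
exact = root.xpath(
    'normalize-space(//th[normalize-space(.) = "Weight"]/following-sibling::td[1])'
)
print(exact)  # 26.7 g
```

Scrapy's selectors are also built on lxml, so the same XPath string should work unchanged in a spider's `response.xpath(...)` call.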

Answered By - Andrej Kesely

This answer was collected from Stack Overflow and tested by the PythonFixing community admins. It is licensed under CC BY-SA 2.5, CC BY-SA 3.0 and CC BY-SA 4.0.
