Issue
I am trying to scrape the weight of smartwatches from www.currys.co.uk. The website does not follow the same structure for all products, so to get the weight of each product I am trying a keyword search using XPath:
//text()[contains(.,'Weight')]
The problem is that with this expression I only get the text "Weight", whereas what I want is the following node, which contains the actual value of the weight:
<tbody>
<tr>
<th scope = "row">Weight</th>
<td> 26.7 g</td>
</tr>
</tbody>
What I am looking for is the text 26.7 g. I tried the expression below, but it doesn't seem to work:
//text()[contains(.,'Weight')]//td
Any suggestions? Thanks in advance.
Solution
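A text node has no children, so appending //td to //text()[...] matches nothing. Select the th element that contains "Weight" instead, then step to its sibling cell with following-sibling::td: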
from lxml import etree
txt = '''<tbody>
<tr>
<th scope = "row">Weight</th>
<td> 26.7 g</td>
</tr>
</tbody>'''
root = etree.fromstring(txt)
# Find the <th> whose text contains "Weight", then take the <td> in the same row
for td in root.xpath('//th[contains(., "Weight")]/following-sibling::td'):
    print(td.text)
Prints:
26.7 g
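Since the question mentions that product pages are not structured consistently, the same idea can be wrapped in a small helper that returns None when a label is missing. This is only a sketch under some assumptions: the function name find_spec_value and the sample markup are made up for illustration, lxml.html is swapped in for etree because real pages are rarely well-formed XML, and the label is assumed to sit in a th immediately followed by its td.

```python
from lxml import html


def find_spec_value(page_source, label):
    """Return the stripped text of the <td> next to the <th> matching label, or None.

    Hypothetical helper, not part of lxml.
    """
    root = html.fromstring(page_source)
    # normalize-space() ignores stray whitespace around the label text;
    # $label is an XPath variable filled in by the keyword argument.
    cells = root.xpath(
        './/th[normalize-space(.) = $label]/following-sibling::td[1]',
        label=label,
    )
    if not cells:
        return None
    # text_content() also collects text from nested tags inside the cell
    return cells[0].text_content().strip()


# Made-up sample markup, modelled on the snippet in the question
sample = '''<table><tbody>
<tr><th scope="row">Colour</th><td>Black</td></tr>
<tr><th scope="row">Weight</th><td> 26.7 g</td></tr>
</tbody></table>'''

print(find_spec_value(sample, "Weight"))  # 26.7 g
print(find_spec_value(sample, "Height"))  # None
```

If a page puts the label somewhere other than a th, the exact-match predicate can be loosened back to contains(., $label), at the cost of also matching labels such as "Weight with strap".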
Answered By - Andrej Kesely