Issue
I am trying to get the text from the h2, h3 and p tags on a page, in the order they appear in the HTML. Example: All highlighted text should be extracted in this order.
When using the following XPath:
response.xpath('//*[name()=("h2", "h3","p")]/text()').extract()
I'm getting the following error:
ValueError: XPath error: Invalid expression in //*[name()=("h2", "h3","p")]/text()
Where am I wrong? Is there another way to reach my goal?
Solution
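The parenthesized list ("h2", "h3","p") is an XPath 2.0 sequence, but Scrapy's selectors are built on lxml, which only supports XPath 1.0, so the expression is rejected as invalid. You can achieve what you want by combining a few conditions using or: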
response.xpath('//*[name()="h2" or name()="h3" or name()="p"]/text()')
You could also select the same thing by combining a few paths and chaining multiple .xpath() calls:
response.xpath('//h2|//h3|//p').xpath('./text()')
I'm not sure if there are any performance differences, but I'd just go with the one you find easier to read.
If performance is a big concern, I recommend profiling both ways.
Answered By - stranac