Issue
I am trying to select all headings (h1 to h6) that are not inside the footer.
For example, the heading <h3 class="ibm-bold">Discover</h3> in the markup below should be excluded from the scrape.
<footer role="contentinfo" aria-label="IBM">
<div class="region region-footer">
<div id="ibm-footer-module">
<section role="region" aria-label="Resources">
<h3 class="ibm-bold">Discover</h3>
I have tried using this expression to select the headers which should be excluded, but it doesn't return the right nodes.
//*[self::h1 or self::h2 or self::h3 or self::h4 or self::h5 or self::h6]/ancestor::footer/text()
The page I am scraping is this: https://www.ibm.com/products/informix/embedded-for-iot?mhq=iot&mhsrc=ibmsearch_a
Please help
Solution
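You almost had it. Your expression steps from each heading up to its ancestor footer and then selects the footer's own text nodes, so it never returns heading text. Keep the heading as the selected node and test for the footer ancestor inside the predicate instead: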
//*[
  (self::h1 or self::h2 or self::h3 or self::h4 or self::h5 or self::h6)
  and not(ancestor::footer)
]/text()
Answered By - Tomalak
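To check the expression outside the browser, here is a minimal sketch in Python. It assumes the third-party lxml library is installed, and the sample markup is a made-up stand-in for the real page (only the footer fragment comes from the question; the h1 and h2 are hypothetical).

```python
from lxml import html

# Hypothetical sample page: two headings in the body, one inside the footer.
# Only the footer fragment mirrors the markup quoted in the question.
SAMPLE = """
<html><body>
  <main>
    <h1>Informix Embedded</h1>
    <h2>Features</h2>
  </main>
  <footer role="contentinfo" aria-label="IBM">
    <div class="region region-footer">
      <div id="ibm-footer-module">
        <section role="region" aria-label="Resources">
          <h3 class="ibm-bold">Discover</h3>
        </section>
      </div>
    </div>
  </footer>
</body></html>
"""

# The expression from the answer: any h1..h6 that has no <footer> ancestor.
HEADINGS_OUTSIDE_FOOTER = """
//*[
  (self::h1 or self::h2 or self::h3 or self::h4 or self::h5 or self::h6)
  and not(ancestor::footer)
]/text()
"""

tree = html.fromstring(SAMPLE)
# text() yields the direct text nodes of each matching heading.
headings = [t.strip() for t in tree.xpath(HEADINGS_OUTSIDE_FOOTER)]
print(headings)
```

With this sample, the footer heading "Discover" is left out and the two body headings are kept. On the real page the result depends on the markup actually served, which may differ from what the browser shows if content is rendered by JavaScript.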