Issue
I want to get the links inside every <h1> that contains a <span class="puzzle-type">.
In the HTML below, link1 and link2 should be scraped, but not link3.
At the moment I select all <h1> tags, then check whether each one contains such a span and extract the link. This takes more time when a page has many <h1> tags. Is there a simpler way to do this? Thanks.
<h1>
<span class="puzzle-type" >A</span>
<a href="link1.com">link1</a>
</h1>
<h1>
<span class="puzzle-type" >B</span>
<a href="link2.com">link2</a>
</h1>
<h1>
<a href="link3.com">link3</a>
</h1>
Python:
def parse(self, response):
    for h1 in response.xpath('//h1'):
        # Keep only the headings that contain a puzzle-type span.
        if h1.xpath('.//span[@class="puzzle-type"]').extract_first():
            url = h1.xpath('.//@href').extract_first()
Solution
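Use XPath axes. The following-sibling axis selects the <a> elements that come after a matching span inside the same <h1>, so no Python loop is needed. This is my solution: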
response.xpath('//h1/span[@class="puzzle-type"]/following-sibling::a')
Answered By - Alk