Issue
I have this html code:
<ul id="main-menu">
<li>
<a href="/ru/products/pro_1" title="text_1">...</a>
<ul class="sm-nowrap">...</ul>
</li>
<li>
<a href="/ru/products/pro_2" title="text_2">...</a>
<ul class="sm-nowrap">...</ul>
</li>
<li>
<a href="/ru/products/pro_3" title="text_3">...</a>
<ul class="sm-nowrap">...</ul>
</li>
<li>
<a href="/ru/products/pro_4" title="text_4">...</a>
<ul class="sm-nowrap">...</ul>
</li>
</ul>
I need to collect all the links that sit directly above each <ul class="sm-nowrap"> tag. I'm trying to do this with the following loop:
for i in response.css('ul#main-menu li'):
    link = i.xpath('//ul[class="sm-nowrap"]/preceding::a[1]/@href').get()
But I only get None, None, None, None...
Where is my mistake? What is wrong?
Solution
I don't have your whole HTML, so this works from the part you posted:
from io import StringIO
from lxml import etree
f = StringIO('''\
<ul id="main-menu">
<li>
<a href="/ru/products/pro_1" title="text_1">...</a>
<ul class="sm-nowrap">...</ul>
</li>
<li>
<a href="/ru/products/pro_2" title="text_2">...</a>
<ul class="sm-nowrap">...</ul>
</li>
<li>
<a href="/ru/products/pro_3" title="text_3">...</a>
<ul class="sm-nowrap">...</ul>
</li>
<li>
<a href="/ru/products/pro_4" title="text_4">...</a>
<ul class="sm-nowrap">...</ul>
</li>
</ul>
''')
tree = etree.parse(f)
The simplest way:
links = tree.xpath('//a/@href')
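Explanation:
//a selects every <a> element anywhere in the document (// searches every level below the document root), and /@href then returns the href attribute of each one. Note that any links inside the sm-nowrap submenus (elided as ... here) would be collected as well.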
Result:
['/ru/products/pro_1', '/ru/products/pro_2', '/ru/products/pro_3', '/ru/products/pro_4']
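The shortcut above does not say why the loop in the question returns None. Two things go wrong in '//ul[class="sm-nowrap"]/preceding::a[1]/@href': the leading // restarts the search at the document root instead of inside the current <li>, and [class="sm-nowrap"] tests for a child element named class rather than the class attribute (@class), so nothing ever matches and Scrapy's .get() falls back to None. The sketch below reproduces the fix with lxml on a trimmed copy of the sample menu. It assumes Scrapy selectors apply the same XPath rules; it is not Scrapy code itself. It shows the per-<li> loop corrected, plus a single-expression alternative.

```python
from lxml import etree

# Trimmed copy of the menu from the question (two items are enough here).
html = '''\
<ul id="main-menu">
<li>
<a href="/ru/products/pro_1" title="text_1">...</a>
<ul class="sm-nowrap">...</ul>
</li>
<li>
<a href="/ru/products/pro_2" title="text_2">...</a>
<ul class="sm-nowrap">...</ul>
</li>
</ul>
'''
root = etree.fromstring(html)

# 1) The question's loop, fixed:
#    "./" keeps the search inside the current <li> (a leading "//" would
#    restart at the document root), and "@class" tests the attribute
#    (a bare "class" looks for a child *element* named class).
loop_links = []
for li in root.xpath('//ul[@id="main-menu"]/li'):
    hrefs = li.xpath('./ul[@class="sm-nowrap"]/preceding-sibling::a[1]/@href')
    if hrefs:  # xpath() returns a list; it is empty if this <li> has no submenu
        loop_links.append(hrefs[0])

# 2) Same result in one expression: only the <a> that is a direct child of a
#    top-level <li> and has a <ul class="sm-nowrap"> sibling after it.
one_shot = root.xpath(
    '//ul[@id="main-menu"]/li/a[following-sibling::ul[@class="sm-nowrap"]]/@href'
)

print(loop_links)
print(one_shot)
```

In Scrapy, the fixed loop body would be i.xpath('./ul[@class="sm-nowrap"]/preceding-sibling::a[1]/@href').get(). The preceding-sibling axis keeps the lookup inside the <li>; the original preceding axis also works on this markup but can reach back into an earlier item when the current one has no <a>. If the elided submenus contain their own <li> elements, 'ul#main-menu > li' restricts the loop to the top-level items.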
Answered By - Drakax