Issue
I am struggling with the usage of next_sibling (and similarly with next_element). If I use them as attributes I don't get anything back, but if I use find_next_sibling (or find_next) it works.
From the docs:
find_next_sibling: "Iterate over the rest of an element’s siblings in the tree. [...] returns the first one (of the match)"
find_next: "These methods use .next_elements to iterate over [...] and returns the first one"
So find_next_sibling depends on next_siblings. What does next_sibling depend on, and why does it return nothing?
from bs4 import BeautifulSoup
html = """
<div class="......>
<div class="one-ad-desc">
<div class="one-ad-title">
<a class="one-ad-link" href="www this is the URL!">
<h5>
Text needed
</h5>
</a>
</div>
<div class="one-ad-desc">
...and some more needed text here!
</div>
</div>
</div>
"""
soup = BeautifulSoup(html, 'lxml')
for div in soup.find_all('div', class_="one-ad-title"):
    print('-> ', div.next_element)
    print('-> ', div.next_sibling)
    print('-> ', div.find_next_sibling())
    break
Output
->
->
-> <div class="one-ad-desc">
...and some more needed text here!
</div>
Solution
The main point is that .find_next_sibling() only looks at tags, while .next_element and .next_sibling return the very next node in the parse tree, whatever its type. In your HTML that next node is the whitespace (the line break) between the tags, which Beautiful Soup keeps as a text node. So you do get something back, but it prints as an empty line.
Print the name of each result and you will see that the first two are not tags (text nodes have no tag name):
for div in soup.find_all('div', class_="one-ad-title"):
    print('-> ', div.next_element.name)
    print('-> ', div.next_sibling.name)
    print('-> ', div.find_next_sibling().name)
Output:
-> None
-> None
-> div
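To make the whitespace visible, here is a small sketch that prints the type and repr() of each node. The markup is a simplified, hypothetical stand-in for the HTML in the question, and html.parser is assumed only so that it runs without lxml installed:

```python
from bs4 import BeautifulSoup, NavigableString

# Hypothetical, simplified markup (not the asker's full HTML).
# html.parser is assumed here only so the sketch runs without lxml.
html = """<div class="outer">
<div class="one-ad-title">
<a href="#"><h5>Text needed</h5></a>
</div>
<div class="one-ad-desc">more text</div>
</div>"""

soup = BeautifulSoup(html, "html.parser")
title = soup.find("div", class_="one-ad-title")

for label, node in (("next_element", title.next_element),
                    ("next_sibling", title.next_sibling)):
    # repr() makes the otherwise invisible whitespace visible
    print(label, type(node).__name__, repr(node))
```

Both lines should report a NavigableString whose value is '\n', which is why the plain print() calls above look empty.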
If you put the HTML on one line, with no whitespace between the tags, you get the following result:
from bs4 import BeautifulSoup
html = """
<div class="......><div class="one-ad-desc"><div class="one-ad-title"><a class="one-ad-link" href="www this is the URL!"><h5>Text needed</h5></a></div><div class="one-ad-desc">...and some more needed text here!</div></div></div>"""
soup = BeautifulSoup(html, 'lxml')
for div in soup.find_all('div', class_="one-ad-title"):
    print('-> ', div.next_element)
    print('-> ', div.next_sibling)
    print('-> ', div.find_next_sibling())
Output:
-> <a class="one-ad-link" href="www this is the URL!"><h5>Text needed</h5></a>
-> <div class="one-ad-desc">...and some more needed text here!</div>
-> <div class="one-ad-desc">...and some more needed text here!</div>
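As for what find_next_sibling builds on: conceptually it walks .next_siblings and skips anything that is not a matching tag. The helper below (first_tag_sibling, a hypothetical name) is only a rough sketch of that idea on simplified markup, not Beautiful Soup's actual implementation:

```python
from bs4 import BeautifulSoup, Tag

def first_tag_sibling(node):
    """Hypothetical helper: return the first following sibling that is a Tag."""
    for sibling in node.next_siblings:
        if isinstance(sibling, Tag):  # skip text nodes such as "\n"
            return sibling
    return None  # no tag sibling left

# Hypothetical, simplified markup; html.parser is assumed to avoid needing lxml.
html = """<div class="outer">
<div class="one-ad-title">
<a href="#"><h5>Text needed</h5></a>
</div>
<div class="one-ad-desc">more text</div>
</div>"""

soup = BeautifulSoup(html, "html.parser")
title = soup.find("div", class_="one-ad-title")

found = first_tag_sibling(title)
print(found)
```

The plain .next_sibling attribute has no such filter: it just returns whatever node comes next, text or tag.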
Note that "Text needed" is not in a sibling of your selected tag; it is inside one of its children. To select it, use print('-> ', div.find_next().text)
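With the multi-line HTML from the question, .text also returns the line breaks around the string. If you only want the words, one option is get_text(strip=True). A minimal sketch, assuming simplified markup and passing the tag name "h5" to find_next (both are choices made for this example):

```python
from bs4 import BeautifulSoup

# Hypothetical, simplified markup; html.parser is assumed to avoid needing lxml.
html = """<div class="one-ad-title">
<a href="#">
<h5>
Text needed
</h5>
</a>
</div>"""

soup = BeautifulSoup(html, "html.parser")
title = soup.find("div", class_="one-ad-title")

heading = title.find_next("h5")      # first <h5> tag after the div's start
text = heading.get_text(strip=True)  # drop the surrounding line breaks
print(repr(text))
```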
Answered By - HedgeHog