Issue
I have the following HTML:
<li class="print text">
<span><em class="time">
<div class="time">1.29 s</div>
</em><em class="status">passed</em>This is the text I want to get</span>
I need to get only the text that sits directly inside the span, outside all of the nested tags (in this case: This is the text I want to get).
I was trying to use this piece of code:
for el in doc.find_all('li', attrs={'class': 'print text'}):
    print(el.get_text())
But unfortunately it prints everything, including the text inside the em tags.
Is there any way to do this?
Thank you!!
Solution
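You could call find(text=True, recursive=False) on the span to reach your goal. text=True matches text nodes only, and recursive=False restricts the search to the direct children of the tag, so text inside the nested em and div tags is skipped. In Beautiful Soup 4.4.0 and later the same argument is also available under the name string.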
Example
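from bs4 import BeautifulSoup
html = '''<li class="print text">
<span><em class="time">
<div class="time">1.29 s</div>
</em><em class="status">passed</em>This is the text I want to get</span>'''
# name the parser explicitly so the result does not depend on what is installed
soup = BeautifulSoup(html, 'html.parser')
# first text node that is a direct child of the span
print(soup.find('li', class_='print text').span.find(text=True, recursive=False))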
Output
This is the text I want to get
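To see what recursive=False changes, it may help to compare it with get_text() on the same span. The following is a minimal sketch, assuming Beautiful Soup 4.4.0 or later (it uses the string argument instead of text) and the built-in html.parser; the variable names all_text and direct_text are only illustrative.

```python
from bs4 import BeautifulSoup

html = '''<li class="print text">
<span><em class="time">
<div class="time">1.29 s</div>
</em><em class="status">passed</em>This is the text I want to get</span></li>'''

soup = BeautifulSoup(html, 'html.parser')
span = soup.find('li', class_='print text').span

# get_text() walks every descendant, so the text inside <em> and <div> is included
all_text = span.get_text(strip=True)

# recursive=False only looks at the direct children of <span>,
# so the first match is the text node that follows the two <em> tags
direct_text = span.find(string=True, recursive=False)

print(all_text)
print(direct_text)
```

The first print still contains "1.29 s" and "passed", while the second contains only the trailing text node.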
If there are multiple span elements in your li, you could go with:
from bs4 import BeautifulSoup
html = '''<li class="print text">
<span><em class="time">
<div class="time">1.29 s</div>
</em><em class="status">passed</em>This is the text I want to get</span>
<span><em class="time">
<div class="time">1.50 s</div>
</em><em class="status">passed</em>This is the text I want to get too</span>'''
# name the parser explicitly so the result does not depend on what is installed
soup = BeautifulSoup(html, 'html.parser')
# select every span below li.print.text and print its first direct text node
for e in soup.select('li.print.text span'):
    print(e.find(text=True, recursive=False))
Output
This is the text I want to get
This is the text I want to get too
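One caveat with find(): it returns only the first direct text node, which can be a bare line break if the markup has whitespace right after the opening span, and it misses text that is split around an inline tag. A possible variation, assuming Beautiful Soup 4.4.0 or later, is to collect all direct text nodes with find_all() and join them. The helper name own_text and the sample markup below are hypothetical and not taken from the question.

```python
from bs4 import BeautifulSoup


def own_text(tag):
    """Join the non-empty text nodes that are direct children of tag."""
    # find_all(string=True, recursive=False) yields every direct text node
    parts = (s.strip() for s in tag.find_all(string=True, recursive=False))
    return ' '.join(p for p in parts if p)


# hypothetical markup: a line break after <span> and text split around <b>
html = '''<li class="print text">
<span>
<em class="status">passed</em>first part <b>bold</b> second part</span></li>'''

soup = BeautifulSoup(html, 'html.parser')
for span in soup.select('li.print.text span'):
    print(own_text(span))
```

With this markup, a plain find(string=True, recursive=False) would return the line break that follows the opening span tag, whereas the helper skips whitespace-only nodes and keeps both text fragments.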
Answered By - HedgeHog