Issue
A sample of HTML content:
<div class="content"> Hello World It is good to see you.<span>hi<img/></span></div>
I want to print only "Hello World It is good to see you." (without "hi" or the img).
But when I use something like .text in BeautifulSoup, it also returns the text from the inner tags. Can someone help me out?
Solution
Consider:
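from bs4 import BeautifulSoup

my_html = '<div class="content"> Hello World It is good to see you.<span>hi<img/></span></div>'
soup = BeautifulSoup(my_html, "html.parser")  # name the parser explicitly to avoid bs4's "no parser specified" warning
div_tag = soup.find("div")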
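The following line will achieve it. find_all with recursive=False returns a list of only the text nodes that are direct children of the div, so join and strip them to get a single string (string= is the current name for the older text= argument):
text_content = "".join(div_tag.find_all(string=True, recursive=False)).strip()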
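Put together, a minimal self-contained sketch of the same approach might look like this. The helper name own_text is hypothetical (it is not part of BeautifulSoup), and the sketch assumes the built-in "html.parser" backend is acceptable for your input:

```python
from bs4 import BeautifulSoup


def own_text(tag):
    """Return only the text that sits directly inside `tag`.

    Hypothetical helper for illustration, not a BeautifulSoup API.
    """
    # string=True matches text nodes; recursive=False restricts the
    # search to direct children, so text inside <span> is skipped.
    direct_strings = tag.find_all(string=True, recursive=False)
    return "".join(direct_strings).strip()


my_html = '<div class="content"> Hello World It is good to see you.<span>hi<img/></span></div>'
soup = BeautifulSoup(my_html, "html.parser")
div_tag = soup.find("div", class_="content")

result = own_text(div_tag)
print(result)  # Hello World It is good to see you.
```

By contrast, div_tag.text walks every descendant, which is why it also picks up "hi" from the span.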
Hope it helps.
Answered By - Jonatan Kruszewski