Issue
I have the following HTML document and I want to get the content inside the section:
<body>
<section class="post-content">
<h1>title</h1>
<div>balabala</div>
</section>
</body>
When I use the following code
soup.find_all("section", {"class": "post-content"})
I get
<section class="post-content">
<h1>title</h1>
<div>balabala</div>
</section>
But I only want what is inside the section. What should I do?
Solution
You can use the .findChildren() method and a list comprehension:
import bs4
soup = bs4.BeautifulSoup("""
<body>
<section class="post-content">
<h1>title</h1>
<div>part one</div>
</section>
<section class="post-content">
<h1>title2</h1>
<div>part two</div>
</section>
</body>
""", 'html.parser')
els = soup.find_all("section", {"class": "post-content"})
els = [list(el.findChildren()) for el in els]
print(els) # => [[<h1>title</h1>, <div>part one</div>], [<h1>title2</h1>, <div>part two</div>]]
The soup.find_all() call returns a list of matching elements, and the list comprehension loops over each one and collects its children into a list. el.findChildren() is an older alias of find_all(), so it returns every descendant tag (not only the direct children) as a list-like ResultSet; wrapping it in list() simply converts that result to a plain list.
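If the section contains nested tags, it may help to see how the different ways of reading "what is inside" behave. The following is a minimal sketch, not part of the original answer: the nested <b> tag and the variable names are added purely for illustration, and it assumes the html.parser backend used above.

```python
import bs4

# Sample markup; the nested <b> tag is an illustrative addition.
html = """
<body>
<section class="post-content">
<h1>title</h1>
<div>part <b>one</b></div>
</section>
</body>
"""
soup = bs4.BeautifulSoup(html, "html.parser")
section = soup.find("section", {"class": "post-content"})

# Direct child tags only; the nested <b> is not listed separately.
direct_children = section.find_all(recursive=False)

# Every descendant tag, which matches the default behaviour of findChildren().
all_descendants = section.find_all()

# The inner markup of the section as a single string, without the <section> wrapper.
inner_html = section.decode_contents()

print([tag.name for tag in direct_children])   # ['h1', 'div']
print([tag.name for tag in all_descendants])   # ['h1', 'div', 'b']
print(inner_html)
```

Which form fits depends on the goal: recursive=False is probably the closest match when only the top-level elements are wanted, while decode_contents() is convenient when the inner HTML is needed as text.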
Answered By - Michael M.