Issue
Is there a way to use BeautifulSoup to wrap an element tag around a grouping of tags?
I have a document like this:
<h1>Heading for Sec 1</h1>
<p>some text sec 1</p>
<p>some text sec 1</p>
<p>some text sec 1</p>
<h1>Heading for Sec 2</h1>
<p>some text sec 2</p>
<p>some text sec 2</p>
<p>some text sec 2</p>
<h1>Heading for Sec 3</h1>
<p>some text sec 3</p>
<p>some text sec 3</p>
I need to wrap each grouping with an tag. Each grouping begins with an tag. So the output would be:
<div>
<h1>Heading for Sec 1</h1>
<p>some text sec 1</p>
<p>some text sec 1</p>
<p>some text sec 1</p>
</div>
<div>
<h1>Heading for Sec 2</h1>
<p>some text sec 2</p>
<p>some text sec 2</p>
<p>some text sec 2</p>
</div>
<div>
<h1>Heading for Sec 3</h1>
<p>some text sec 3</p>
<p>some text sec 3</p>
</div>
Solution
You can try:
from bs4 import BeautifulSoup
html_doc = """\
<h1>Heading for Sec 1</h1>
<p>some text sec 1</p>
<p>some text sec 1</p>
<p>some text sec 1</p>
<h1>Heading for Sec 2</h1>
<p>some text sec 2</p>
<p>some text sec 2</p>
<p>some text sec 2</p>
<h1>Heading for Sec 3</h1>
<p>some text sec 3</p>
<p>some text sec 3</p>"""
soup = BeautifulSoup(html_doc, "html.parser")
last_div = None
for tag in soup.select("h1, p"): # or soup.find_all(["h1", "p"], recursive=False) if there are inner tags
if tag.name == "h1":
last_div = tag.wrap(soup.new_tag("div"))
last_div.insert(0, "\n")
last_div.append("\n")
continue
last_div.append(tag)
last_div.append("\n")
# remove unnecessary spaces:
soup.smooth()
for t in soup.find_all(text=True, recursive=False):
t.replace_with("\n")
print(soup)
Prints:
<div>
<h1>Heading for Sec 1</h1>
<p>some text sec 1</p>
<p>some text sec 1</p>
<p>some text sec 1</p>
</div>
<div>
<h1>Heading for Sec 2</h1>
<p>some text sec 2</p>
<p>some text sec 2</p>
<p>some text sec 2</p>
</div>
<div>
<h1>Heading for Sec 3</h1>
<p>some text sec 3</p>
<p>some text sec 3</p>
</div>
Answered By - Andrej Kesely
0 comments:
Post a Comment
Note: Only a member of this blog may post a comment.