Issue
I'm working with BeautifulSoup in Python to parse an HTML page and update its content as required. A simplified version of my page structure is as follows:
<div class="main">
<div class="class_1">
<p><br/></p>
<div class="panel">Some content here </div>
<div class="panel">Another content here </div>
</div>
</div>
I would like to update the content of <div class="class_1">.
I'm able to use the BeautifulSoup parser to get the contents of <div class="class_1">. I'm also able to save the new data that I would like to have in my HTML page as a list, as shown below:
['<div class="panel">Some content here </div>',
'<div class="panel">Updated new content here </div>',
'<div class="panel">Hello new div here! </div>']
How can I get the following? I tried replace_with, but it replaces < with &lt;, which isn't desirable. I'm not too familiar with Beautiful Soup, so I'm not sure what other options would help me achieve this:
<div class="main">
<div class="class_1">
<p><br/></p>
<div class="panel">Some content here </div>
<div class="panel">Updated new content here </div>
<div class="panel">Hello new div here! </div>
</div>
</div>
Solution
Try:
from bs4 import BeautifulSoup
html_doc = """
<div class="main">
<div class="class_1">
<p><br/></p>
<div class="panel">Some content here </div>
<div class="panel">Another content here </div>
</div>
</div>
"""
new_content = [
'<div class="panel">Some content here </div>',
'<div class="panel">Updated new content here </div>',
'<div class="panel">Hello new div here! </div>',
]
soup = BeautifulSoup(html_doc, "html.parser")
# locate the <p> element that the new content should follow:
p = soup.select_one(".class_1 p")
# delete old content after the <p>:
# tags:
for t in p.find_next_siblings():
    t.extract()
# text nodes, if any (string=True replaces the older text=True argument):
for t in p.find_next_siblings(string=True):
    t.extract()
# parse the new content into tags and place it after the <p>:
p.insert_after(BeautifulSoup("\n" + "\n".join(new_content) + "\n", "html.parser"))
print(soup)
Prints:
<div class="main">
<div class="class_1">
<p><br/></p>
<div class="panel">Some content here </div>
<div class="panel">Updated new content here </div>
<div class="panel">Hello new div here! </div>
</div>
</div>
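As for why replace_with produced &lt; in the question: when it is given a plain Python string, Beautiful Soup inserts it as text, so the markup characters are escaped on output. Parsing the string first, as the solution above does, inserts real tags instead. Here is a minimal sketch of the difference; the one-panel document and the variable names are made up for illustration and are not from the original page:

```python
from bs4 import BeautifulSoup

# Hypothetical one-panel document, used only to show the difference.
html_doc = '<div class="class_1"><div class="panel">Old</div></div>'
new_html = '<div class="panel">New</div>'

# Case 1: pass a plain str. It becomes a text node, so "<" and ">" are escaped.
soup_a = BeautifulSoup(html_doc, "html.parser")
soup_a.select_one(".panel").replace_with(new_html)
escaped = str(soup_a)

# Case 2: parse the str first, then insert the resulting tags.
soup_b = BeautifulSoup(html_doc, "html.parser")
fragment = BeautifulSoup(new_html, "html.parser")
soup_b.select_one(".panel").replace_with(fragment)
parsed = str(soup_b)

print(escaped)
print(parsed)
```

The first print should show the escaped &lt;div ...&gt; text, while the second should show a real <div class="panel"> element.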
Answered By - Andrej Kesely