Issue
I have some HTML that contains a pre tag:
<p>Hi!</p><pre><p>Hi!</p></pre>
I'd like to change it to:
<p>Hi!</p><pre><p>Bye!</p></pre>
The naïve thing to do seems to be:
from bs4 import BeautifulSoup
markup = """<p>Hi!</p><pre><p>Hi!</p></pre>"""
soup = BeautifulSoup(markup, "html.parser")
pre_tag = soup.pre
pre_tag.string = "<p>bye!</p>"
print(str(soup))
but that escapes the angle brackets and gives <p>Hi!</p><pre>&lt;p&gt;bye!&lt;/p&gt;</pre>
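A quick check of what that assignment does, as a minimal sketch using the same markup (the variable names are just for illustration): the value assigned to .string appears to be stored as a single text node rather than parsed as markup, which is why it is escaped on output.

```python
from bs4 import BeautifulSoup
from bs4.element import NavigableString

soup = BeautifulSoup("<p>Hi!</p><pre><p>Hi!</p></pre>", "html.parser")
soup.pre.string = "<p>bye!</p>"

# The pre now holds one text node, not a p element.
is_text_node = isinstance(soup.pre.string, NavigableString)
has_p_child = soup.pre.find("p") is not None

# Text nodes are entity-escaped when the tree is serialized.
escaped_output = str(soup)

print(is_text_node, has_p_child)  # True False
print(escaped_output)  # <p>Hi!</p><pre>&lt;p&gt;bye!&lt;/p&gt;</pre>
```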
In the BS4 docs there's a section on output formatters that gives an example of using CData:
from bs4.element import CData
soup = BeautifulSoup("<a></a>", 'html.parser')
soup.a.string = CData("one < three")
print(soup.a.prettify(formatter="html"))
# <a>
# <![CDATA[one < three]]>
# </a>
Which looks like what's needed, except that it also wraps the unescaped characters in a CDATA section; not good inside a pre.
This question: Beautiful Soup replaces < with &lt; looks like it's going in this vague direction, but isn't about the insides of a pre tag.
This question: customize BeautifulSoup's prettify by tag seems like overkill, and is also from the BS3 era.
P.S. Before anyone asks: the example above is indicative of wanting to do all kinds of things to the contents of a pre, not just change Hi to Bye.
Solution
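Assigning to .string creates a text node, so any markup in it is escaped on output. To get a real p tag inside the pre, you can either use the API to construct the new contents: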
from bs4 import BeautifulSoup
markup = """<p>Hi!</p><pre><p>Hi!</p></pre>"""
soup = BeautifulSoup(markup, "html.parser")
pre_tag = soup.pre
new_tag = soup.new_tag("p")
new_tag.append("bye!")
pre_tag.clear()
pre_tag.append(new_tag)
print(str(soup))
Or you can provide the HTML to another BeautifulSoup instance and use that:
from bs4 import BeautifulSoup
markup = """<p>Hi!</p><pre><p>Hi!</p></pre>"""
soup = BeautifulSoup(markup, "html.parser")
pre_tag = soup.pre
soup2 = BeautifulSoup("<p>bye!</p>", "html.parser")
pre_tag.clear()
pre_tag.append(soup2)
print(str(soup))
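Since the goal is to do all kinds of things to the contents of a pre, the second approach can be wrapped in a small helper. This is only a sketch: set_inner_html is a hypothetical name, not part of the Beautiful Soup API, and it assumes html.parser is an acceptable parser for the fragment. It moves the parsed fragment's children into the tag instead of appending the second BeautifulSoup object itself.

```python
from bs4 import BeautifulSoup


def set_inner_html(tag, html, parser="html.parser"):
    """Replace the children of `tag` with the parsed `html` fragment.

    Hypothetical helper, not a Beautiful Soup API.
    """
    fragment = BeautifulSoup(html, parser)
    tag.clear()
    # Copy the child list first: append() moves each node out of `fragment`.
    for child in list(fragment.contents):
        tag.append(child)
    return tag


soup = BeautifulSoup("<p>Hi!</p><pre><p>Hi!</p></pre>", "html.parser")
set_inner_html(soup.pre, "<p>Bye!</p>")

result = str(soup)
print(result)  # <p>Hi!</p><pre><p>Bye!</p></pre>
```

Because the fragment is parsed before it is inserted, the p ends up as a real tag in the tree and can be found with soup.pre.find("p") afterwards.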
Answered By - snwflk