Issue
I am trying to get a list of all HTML tags with Beautiful Soup.
I see find_all(), but I have to know the name of the tag before I search.
Given text like this:
html = """<div>something</div>
<div>something else</div>
<div class='magical'>hi there</div>
<p>ok</p>"""
How would I get a list like this?
list_of_tags = ["<div>", "<div>", "<div class='magical'>", "<p>"]
I know how to do this with regex, but I am trying to learn BS4.
Solution
You don't have to pass any arguments to find_all(). When called with no arguments, BeautifulSoup finds every tag in the tree, recursively.
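To illustrate what "recursively" means, here is a small sketch using a made-up nested snippet (not from the question). Nested tags are returned too, in document order, while passing recursive=False limits the search to the direct children of the object you call it on:

```python
from bs4 import BeautifulSoup

# Hypothetical nested markup, used only to show the recursive behaviour.
soup = BeautifulSoup("<div><p>a <b>b</b></p></div>", "html.parser")

# No arguments: every tag at any depth, in document order.
all_names = [tag.name for tag in soup.find_all()]

# recursive=False: only tags that are direct children of the soup itself.
top_level_names = [tag.name for tag in soup.find_all(recursive=False)]

print(all_names)        # ['div', 'p', 'b']
print(top_level_names)  # ['div']
```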
Sample:
from bs4 import BeautifulSoup
html = """<div>something</div>
<div>something else</div>
<div class='magical'>hi there</div>
<p>ok</p>
"""
soup = BeautifulSoup(html, "html.parser")
print([tag.name for tag in soup.find_all()])
# ['div', 'div', 'div', 'p']
print([str(tag) for tag in soup.find_all()])
# ['<div>something</div>', '<div>something else</div>', '<div class="magical">hi there</div>', '<p>ok</p>']
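The sample gives either the tag names or the whole elements, while the question asks for just the opening tags. One way to get that exact shape is to rebuild each opening tag from tag.name and tag.attrs. This is a minimal sketch; the helper name opening_tag is hypothetical, and it assumes single-quoted attribute values to match the list in the question. Beautiful Soup returns multi-valued attributes such as class as lists, so those are joined with spaces:

```python
from bs4 import BeautifulSoup

html = """<div>something</div>
<div>something else</div>
<div class='magical'>hi there</div>
<p>ok</p>
"""

def opening_tag(tag):
    """Hypothetical helper: rebuild only the opening tag, e.g. "<div class='magical'>"."""
    parts = [tag.name]
    for key, value in tag.attrs.items():
        # Multi-valued attributes (like class) come back as lists of strings.
        if isinstance(value, list):
            value = " ".join(value)
        parts.append(f"{key}='{value}'")
    return "<" + " ".join(parts) + ">"

soup = BeautifulSoup(html, "html.parser")
list_of_tags = [opening_tag(tag) for tag in soup.find_all()]
print(list_of_tags)
# ['<div>', '<div>', "<div class='magical'>", '<p>']
```

Note that this does not round-trip every detail of the original markup (attribute quoting and whitespace are normalised), which is usually fine when you only need a readable list of tags.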
Answered By - alecxe