Issue
Given some HTML, say:
<div class="class1">
<span class="class2">some text</span>
<span class="class3">some text</span>
<span class="class4">some text</span>
</div>
How can I retrieve all the class names, i.e. ['class1', 'class2', 'class3', 'class4']?
I tried:
soup.find_all(class_=True)
But it returns the whole tags, so I would then need to run a regex on each tag's string.
Solution
You can treat each Tag instance found as a dictionary when it comes to retrieving attributes. Note that the value of the class attribute is a list, since class is a special "multi-valued" attribute:
classes = []
for element in soup.find_all(class_=True):
    # element["class"] is a list of class names, so extend rather than append
    classes.extend(element["class"])
Or:
classes = [value
           for element in soup.find_all(class_=True)
           for value in element["class"]]
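To make the "multi-valued" point concrete, here is a minimal sketch using made-up markup (the snippet and its class names are illustrative, not taken from the question). An element with several space-separated classes yields one list entry per class, and Tag.get with a default is one way to read the attribute from a tag that may not have it:

```python
from bs4 import BeautifulSoup

# Hypothetical markup: the first <p> carries two space-separated classes,
# the second has no class attribute at all.
html = '<p class="intro lead">text</p><p id="x">no class</p>'
soup = BeautifulSoup(html, "html.parser")

first = soup.find("p")
multi = first["class"]             # a list, one entry per class name

second = soup.find("p", id="x")
missing = second.get("class", [])  # .get avoids a KeyError when the attribute is absent

print(multi)    # ['intro', 'lead']
print(missing)  # []
```

With find_all(class_=True) every returned tag already has a class attribute, so plain element["class"] is safe there; the .get form only matters when iterating over arbitrary tags.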
Demo:
from bs4 import BeautifulSoup
data = """
<div class="class1">
<span class="class2">some text</span>
<span class="class3">some text</span>
<span class="class4">some text</span>
</div>
"""
soup = BeautifulSoup(data, "html.parser")
classes = [value
           for element in soup.find_all(class_=True)
           for value in element["class"]]
print(classes)
# Prints:
# ['class1', 'class2', 'class3', 'class4']
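If the same class name appears on several elements, the list above will contain repeats. The following is a sketch of one possible way to keep each name once while preserving first-seen order; the markup is invented for illustration, and dict.fromkeys is just one option (a set would also work if order does not matter):

```python
from bs4 import BeautifulSoup

# Hypothetical markup in which "label" and "wide" each appear twice.
html = """
<div class="box wide">
<span class="label">a</span>
<span class="label wide">b</span>
</div>
"""
soup = BeautifulSoup(html, "html.parser")

all_classes = [value
               for element in soup.find_all(class_=True)
               for value in element["class"]]

# dict.fromkeys keeps first-seen order (Python 3.7+) while dropping repeats.
unique_classes = list(dict.fromkeys(all_classes))

print(all_classes)     # ['box', 'wide', 'label', 'label', 'wide']
print(unique_classes)  # ['box', 'wide', 'label']
```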
Answered By - alecxe