Issue
I want to ignore one class when using find_all. I've followed this solution: "Select all divs except ones with certain classes in BeautifulSoup".
My divs are a bit different: I want to ignore description-0.
<div class="abc">...</div>
<div class="parent">
<div class="description-0"></div>
<div class="description-1"></div>
<div class="description-2"></div>
</div>
<div class="xyz">...</div>
Here is my code:
classToIgnore = ["description-0"]
all = soup.find_all('div', class_=lambda x: x not in classToIgnore)
It returns all the divs on the page instead of just the description-n ones. How do I fix it?
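A minimal reproduction of the behaviour described above shows which divs the lambda lets through. This is a sketch that assumes the question's HTML and Python's built-in "html.parser"; the variable names are illustrative. Beautiful Soup checks each CSS class value of a tag against the function, so every class other than description-0 counts as a match.

```python
from bs4 import BeautifulSoup

# The HTML from the question, including the wrapping "parent" div.
sample_html = """<div class="abc">...</div>
<div class="parent">
<div class="description-0"></div>
<div class="description-1"></div>
<div class="description-2"></div>
</div>
<div class="xyz">...</div>"""

class_to_ignore = ["description-0"]
soup = BeautifulSoup(sample_html, "html.parser")

# The lambda is tested against each CSS class value, so "abc", "parent"
# and "xyz" all return True and their divs are included.
matched = soup.find_all("div", class_=lambda x: x not in class_to_ignore)
matched_classes = [" ".join(div["class"]) for div in matched]
print(matched_classes)
# ['abc', 'parent', 'description-1', 'description-2', 'xyz']
```

Only the description-0 div is filtered out, which is why the result looks like "all divs on the page".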
Solution
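The lambda is tested against each CSS class of every div, so any class other than description-0 (abc, parent, xyz) makes it return True. Instead, use a regular expression that only matches the description classes you want, like this, for example: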
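import re
from bs4 import BeautifulSoup
sample_html = """<div class="abc">...</div>
<div class="description-0"></div>
<div class="description-1"></div>
<div class="description-2"></div>
<div class="xyz">...</div>"""
# The pattern is tested against each CSS class value: "description-0" fails
# (its digit is 0), while "description-1" and "description-2" match.
# "lxml" requires the lxml package; "html.parser" gives the same result here.
classes_regex = (
    BeautifulSoup(sample_html, "lxml")
    .find_all("div", {"class": re.compile(r"description-[1-9]")})
)
print(classes_regex)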
Output:
[<div class="description-1"></div>, <div class="description-2"></div>]
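If you would rather keep the ignore-list idea from the question, one option is to make the filter function also require the description- prefix. The sketch below assumes Python's built-in "html.parser", and is_wanted_description is a hypothetical helper name. It also shows a CSS selector alternative using select(), which relies on the soupsieve package that Beautiful Soup 4.7 and later use for CSS selectors.

```python
from bs4 import BeautifulSoup

sample_html = """<div class="abc">...</div>
<div class="parent">
<div class="description-0"></div>
<div class="description-1"></div>
<div class="description-2"></div>
</div>
<div class="xyz">...</div>"""

classes_to_ignore = {"description-0"}

def is_wanted_description(css_class):
    # Hypothetical helper: tags without a class attribute are passed as None,
    # so guard against that before calling startswith().
    return (
        css_class is not None
        and css_class.startswith("description-")
        and css_class not in classes_to_ignore
    )

soup = BeautifulSoup(sample_html, "html.parser")

# Approach 1: the question's lambda idea, restricted to description-* classes.
by_function = soup.find_all("div", class_=is_wanted_description)

# Approach 2: a CSS selector; ^= matches class attributes that start
# with "description-", and :not() drops the ignored class.
by_selector = soup.select('div[class^="description-"]:not(.description-0)')

print(by_function)
print(by_selector)
```

Both lists contain only the description-1 and description-2 divs. Note that `[class^="description-"]` only matches when the description-n class is the first value in the class attribute, so the function filter is the more robust of the two if your divs carry several classes.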
Answered By - baduker