Issue
I know that Beautiful Soup can match classes against a regex that contains certain strings, based on a post here. Below is a code example from that post:
import re

regex = re.compile('.*listing-col-.*')
for EachPart in soup.find_all("div", {"class": regex}):  # soup: an existing BeautifulSoup object
    print(EachPart.get_text())
Now, is it possible to do the opposite, i.e. find classes that do not match a certain regex? In SQL terms, it would be something like:
where class not like '%test%'
Thanks in advance!
Solution
This can actually be done with a negative lookahead.
A negative lookahead has the syntax (?!«pattern») and succeeds only if «pattern» does not match the text that immediately follows the current position in the input string. It is zero-width: it checks ahead without consuming any characters.
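As a quick illustration with Python's standard re module (the strings foo, bar and baz are arbitrary examples, not from the question), a negative lookahead rejects a match based on what follows, without making that text part of the match:

```python
import re

# "foo" only when it is NOT immediately followed by "bar".
pattern = re.compile(r"foo(?!bar)")

print(bool(pattern.search("foobaz")))  # True: "baz" follows, so the lookahead succeeds
print(bool(pattern.search("foobar")))  # False: "bar" follows, so the lookahead fails

# The lookahead is zero-width: the match covers only "foo", not "baz".
print(pattern.search("foobaz").group())  # foo
```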
In your case, you could use the following regex to match all classes that don't contain listing-col- in their name:
regex = re.compile('^((?!listing-col-).)*$')
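To sanity-check the pattern outside BeautifulSoup, here is a small sketch that runs it with Python's re module against a few hypothetical class names (made up for illustration, not taken from any real page). For a fixed substring like listing-col-, it gives the same result as the SQL-like test "listing-col-" not in name:

```python
import re

regex = re.compile('^((?!listing-col-).)*$')

# Hypothetical class names, for illustration only.
class_names = ["listing-col-1", "main-listing-col-left", "sidebar", "listing-row"]

# Keep only the names in which "listing-col-" never starts at any position.
kept = [name for name in class_names if regex.search(name)]
print(kept)  # ['sidebar', 'listing-row']

# For a literal substring, the regex agrees with a plain "not in" check.
print(kept == [name for name in class_names if "listing-col-" not in name])  # True
```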
Here's a step-by-step explanation of the regex ^((?!listing-col-).)*$ :
- ^ asserts position at the start of the string.
- ((?!listing-col-).)* is a capturing group; the * repeats it between zero and unlimited times, as many times as possible, giving back as needed. Inside the group:
  - (?!listing-col-) is a negative lookahead asserting that the characters listing-col- (matched literally, case sensitive) do not start at the current position.
  - . matches any single character (except a newline).
- $ asserts position at the end of the string.
Put together, the lookahead is checked before each character is consumed, so the whole string matches only if listing-col- does not start anywhere in it.
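The same breakdown can be written into the pattern itself with Python's re.VERBOSE flag, which ignores unescaped whitespace and allows # comments. This is only an optional, more readable layout (not part of the original answer); it behaves the same as the one-line pattern:

```python
import re

# Same pattern as ^((?!listing-col-).)*$, annotated token by token.
regex = re.compile(r"""
    ^                     # start of the string
    (                     # capturing group, repeated by the * below
        (?!listing-col-)  # negative lookahead: listing-col- must not start here
        .                 # consume one character (any except a newline)
    )*                    # repeat zero or more times, greedily
    $                     # end of the string
""", re.VERBOSE)

print(bool(regex.search("sidebar")))        # True
print(bool(regex.search("listing-col-2")))  # False
```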
Also, you may find https://regex101.com useful. It lets you test your patterns and shows a detailed explanation of each step, which makes it your best friend when writing regular expressions.
Answered By - andylvua