Issue
I'm trying to use Beautiful Soup to return the elements of the DOM that contain children matching some filtering criteria.
In the example below, I want to return both divs based on finding a regex match in a child element.
<body>
<div class="randomclass1">
<span class="randomclass">regexmatch1</span>
<h2>title</h2>
</div>
<div class="randomclass2">
<span class="randomclass">regexmatch2</span>
<h2>title</h2>
</div>
</body>
The basic code setup is as follows:
from bs4 import BeautifulSoup as soup
page = soup(html, 'html.parser')  # name a parser explicitly to avoid bs4's "no parser specified" warning
Results = page.find_all('div')
How do I add a regex test that evaluates the children of the target div? That is, how would I add the regex call below to Beautiful Soup's find or find_all functions?
re.compile(r'regexmatch\d')  # raw string so \d reaches the regex engine intact
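For comparison, the regex test can also go directly into find_all, which is a different technique from the find_parent answer below: find_all accepts a function as its filter, calls it on every tag, and keeps the tags for which it returns True. A minimal sketch, assuming the question's HTML is held in a string named html, that the "html.parser" parser is used, and that the matching text always sits in a span (the helper name div_with_matching_span is hypothetical):

```python
import re

from bs4 import BeautifulSoup

# The HTML from the question, assumed to be held in a string.
html = """
<body>
<div class="randomclass1">
<span class="randomclass">regexmatch1</span>
<h2>title</h2>
</div>
<div class="randomclass2">
<span class="randomclass">regexmatch2</span>
<h2>title</h2>
</div>
</body>
"""

# Raw string so "\d" reaches the regex engine unchanged.
pattern = re.compile(r"regexmatch\d")


def div_with_matching_span(tag):
    """Hypothetical filter: True for a <div> with a <span> descendant whose text matches pattern."""
    return tag.name == "div" and tag.find("span", string=pattern) is not None


page = BeautifulSoup(html, "html.parser")
results = page.find_all(div_with_matching_span)

for div in results:
    print(div["class"])
```

Because find searches all descendants, a div that merely wraps another matching div would also be returned; passing recursive=False to tag.find would restrict the test to direct children.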
Solution
The approach I landed on was find_parent, which returns the parent element of a Beautiful Soup result regardless of how that result was found (by regex or otherwise). For the example above:
import re
childOfResults = page.find_all('span', string=re.compile(r'regexmatch\d'))
Results = childOfResults[0].find_parent()
...then wrap this in the loop of your choice to cycle through all the members of childOfResults and collect each parent.
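A sketch of the answer's approach with the loop filled in, assuming the question's HTML is held in a string named html. A plain for loop stands in for "the loop of your choice", and passing "div" to find_parent is an optional tightening (not part of the original answer) so the result is the closest enclosing div even if the span were nested more deeply. Since two matching spans could share one div, parents are de-duplicated by object identity while keeping document order:

```python
import re

from bs4 import BeautifulSoup

# The HTML from the question, assumed to be held in a string.
html = """
<body>
<div class="randomclass1">
<span class="randomclass">regexmatch1</span>
<h2>title</h2>
</div>
<div class="randomclass2">
<span class="randomclass">regexmatch2</span>
<h2>title</h2>
</div>
</body>
"""

page = BeautifulSoup(html, "html.parser")

# Every <span> whose text matches the pattern.
child_of_results = page.find_all("span", string=re.compile(r"regexmatch\d"))

# Walk up from each match to its closest enclosing <div>.
# find_parent() with no argument would return the immediate parent instead.
results = []
seen_ids = set()
for child in child_of_results:
    parent = child.find_parent("div")
    # Skip spans with no <div> ancestor and divs already collected
    # (compared by identity, since Tag equality compares markup, not objects).
    if parent is not None and id(parent) not in seen_ids:
        seen_ids.add(id(parent))
        results.append(parent)

for div in results:
    print(div["class"], div.h2.get_text())
```

Identity is used rather than ==, because Beautiful Soup considers two tags with the same name, attributes, and contents equal, so two distinct but identical divs would otherwise collapse into one.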
Answered By - P.J