Issue
I'm trying to scrape text with Beautiful Soup. I need the text inside a span with a specific class, but I want to discard the superscript numbers, which sit in a sup tag with a different class inside the same span. I can easily use get_text to pull the number and the contents from the span, but then I end up with the superscript numbers as well. The solution needs to discard every instance of the sup tag together with its text contents.
Example HTML:
<span class="woj">
<sup class="versenum">
16
</sup>
The text I want
</span>
What I get right now: 16 The text I want
What I want: The text I want
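If you'd rather not modify the parsed tree, one option is to keep only the text nodes that are not nested inside a sup tag. This is a minimal sketch, assuming the bs4 package is installed and that the target is the woj span from the example above:

```python
from bs4 import BeautifulSoup

# Same markup as the example, including its line breaks.
html = """<span class="woj">
<sup class="versenum">
16
</sup>
The text I want
</span>"""

soup = BeautifulSoup(html, "html.parser")
span = soup.find("span", class_="woj")

# Keep only non-empty text nodes that have no <sup> ancestor.
pieces = [
    s.strip()
    for s in span.find_all(string=True)
    if s.find_parent("sup") is None and s.strip()
]
text = " ".join(pieces)
print(text)
```

Because nothing is removed, the same soup can still be queried for the verse numbers afterwards.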
Solution
You can remove every sup tag, along with its text, by calling .extract() on each one before reading the text:
import bs4 as bs
html = '<span class="woj"><sup class="versenum">16</sup>The text I want</span>'
parsed_element = bs.BeautifulSoup(html, 'html.parser')
# Remove each <sup> tag and its contents from the tree
for s in parsed_element('sup'):
    s.extract()
text = parsed_element.text  # 'The text I want'
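A variant of the same approach, assuming the class names from the question (woj and versenum) and that you want the text of every matching span: use CSS selectors so only the verse-number sup tags are removed, and use decompose() since the removed tags are not needed again. Passing a separator and strip=True to get_text also tidies the whitespace left by multi-line markup. The second span below is made-up sample data to show several sup tags in one span:

```python
from bs4 import BeautifulSoup

html = """<span class="woj"><sup class="versenum">16</sup>The text I want</span>
<span class="woj"><sup class="versenum">17</sup>More text <sup class="versenum">18</sup>and more</span>"""

soup = BeautifulSoup(html, "html.parser")

# Destroy every verse-number <sup> (tag and contents) inside a woj span.
for sup in soup.select("span.woj sup.versenum"):
    sup.decompose()

# Join each span's remaining text nodes with spaces, trimming each node.
verses = [span.get_text(" ", strip=True) for span in soup.select("span.woj")]
print(verses)
```

Using extract() instead of decompose() would also work; extract() returns the removed tag, which is only useful if you want to keep the numbers.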
Answered By - Michael Dz