Issue
I'm having trouble selecting text with BeautifulSoup. I am trying to get text from <span class="data"> only, but I keep getting results with other elements as well. For example, the words I want are 'PlayStation 3' and 'Game Boy Advance' in the code below, not 'PC'. Could you help?
soup:
<span class="data">
PlayStation 3
</span>,
<span class="data">
Game Boy Advance
</span>,
<span class="data">
Dec 8, 2022
</span>,
<span class="data">
<a href="/game/pc">
PC
</a>
P.S. I've tried the code below:
consoles = soup.select('span.data')
for console in consoles:
    print(console.get_text(strip=True))
output snippet:
PlayStation 3
Game Boy Advance
Dec 8, 2022
PC
Thanks!
Solution
This example selects every <span class="data"> that has no other tags inside it. In the selector, :has(*) matches an element containing any descendant tag, and :not(...) inverts that match:
from bs4 import BeautifulSoup
html_doc = """\
<span class="data">
PlayStation 3
</span>,
<span class="data">
Game Boy Advance
</span>,
<span class="data">
Dec 8, 2022
</span>,
<span class="data">
<a href="/game/pc">
PC
</a>
"""
soup = BeautifulSoup(html_doc, "html.parser")
for span in soup.select("span.data:not(:has(*))"):
    print(span.get_text(strip=True))
Prints:
PlayStation 3
Game Boy Advance
Dec 8, 2022
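If you would rather not use the CSS selector, the same filter can probably be written with find_all() and a check for child tags. This is only a sketch: the HTML is a condensed copy of the snippet from the question (with the last span closed, which is an assumption), and the name texts is just illustrative.

```python
from bs4 import BeautifulSoup

# Condensed copy of the question's markup; the closing </span> on the
# last element is an assumption, since the original snippet is cut off.
html_doc = """\
<span class="data">PlayStation 3</span>,
<span class="data">Game Boy Advance</span>,
<span class="data">Dec 8, 2022</span>,
<span class="data"><a href="/game/pc">PC</a></span>
"""

soup = BeautifulSoup(html_doc, "html.parser")

# Keep only the spans that contain no child tags at all.
# find(True) returns the first descendant tag, or None if there is none.
texts = [
    span.get_text(strip=True)
    for span in soup.find_all("span", class_="data")
    if span.find(True) is None
]
print(texts)
```

Note that either approach still returns 'Dec 8, 2022', because that span also contains plain text only. If the date has to go as well, it would need a separate filter on the text itself.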
Answered By - Andrej Kesely