Issue
So I have the following HTML:
<span title="总播放数236819" class="view">23.7万播放 · </span>
and I just want the '236819' from this.
I have the BeautifulSoup object created and the code:
views = soup.findAll('span', class_ = 'view')
What do I need to add or change here to get just that number?
Thank you!
Solution
You can use, for example, the re module to extract only the digits from the "title" attribute:
import re
from bs4 import BeautifulSoup
html_doc = (
"""<span title="总播放数236819" class="view">23.7万播放 · </span>"""
)
soup = BeautifulSoup(html_doc, "html.parser")
views = soup.find_all("span", class_="view")  # find_all is the current name for findAll
for view in views:
    # keep only the digits from the "title" attribute
    print("".join(re.findall(r"\d+", view["title"])))
Prints:
236819
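If you need the count as a number rather than a string, or some spans might have a title without digits, a small helper can cover both cases. This is only a sketch: the name parse_view_count is hypothetical (it is not part of BeautifulSoup or re), and it assumes the title contains a single number, so any separators between digits are simply dropped.

```python
import re
from typing import Optional


def parse_view_count(title: str) -> Optional[int]:
    """Hypothetical helper: return the digits in `title` as an int, or None."""
    # Drop every non-digit character, leaving only the number itself.
    digits = re.sub(r"\D", "", title)
    # An empty result means the title held no digits at all.
    return int(digits) if digits else None


print(parse_view_count("总播放数236819"))
print(parse_view_count("no digits here"))
```

With the soup from the answer above, you would call it as parse_view_count(view.get("title", "")) inside the loop, so a span with no title attribute gives None instead of raising a KeyError.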
Answered By - Andrej Kesely