Issue
I want to scrape articles from a website, and besides the title and URL, I also want the date each article was published.
The part of the HTML I want to extract the data from looks like this:
<time datetime="2023-10-09T16:46:47+00:00">
Yesterday, 18:46
</time>
From this, I only need the datetime value 2023-10-09T16:46:47.
The code I have is:
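from urllib.parse import urljoin

for article in articles:
    title = article.find("a", class_="newslink").text.strip()
    article_url = urljoin(url, article.find("a")["href"])
    # select() returns a list of matching tags, not the attribute value
    date = article.select("time[datetime]")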
But with this, I get:
[<time datetime="2023-10-06T06:27:49+00:00">
Friday, 08:27
</time>]
Solution
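datetime is an attribute of the time element. select() returns a list of every matching tag, which is why the whole element came back wrapped in a list. Use select_one() to get the first matching tag (or None if there is none) and read the attribute from it: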
desired_value = article.select_one("time").get('datetime')  # None if the attribute is missing
or
desired_value = article.select_one("time")['datetime']  # KeyError if the attribute is missing
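The one-liners above read the attribute from a single article tag. Below is a self-contained sketch that runs the question's loop end to end. Because the question does not show the markup around each article, the div.article wrapper, the hrefs and the base URL are hypothetical assumptions; only the time elements come from the question.

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup

# Hypothetical markup: the div.article wrapper and hrefs are assumptions,
# the <time> elements are copied from the question.
html = """
<div class="article">
  <a class="newslink" href="/news/first-story">First story</a>
  <time datetime="2023-10-09T16:46:47+00:00">
  Yesterday, 18:46
  </time>
</div>
<div class="article">
  <a class="newslink" href="/news/second-story">Second story</a>
  <time datetime="2023-10-06T06:27:49+00:00">
  Friday, 08:27
  </time>
</div>
"""
url = "https://example.com/"  # hypothetical base URL

soup = BeautifulSoup(html, "html.parser")
articles = soup.select("div.article")

results = []
for article in articles:
    title = article.find("a", class_="newslink").text.strip()
    article_url = urljoin(url, article.find("a")["href"])
    # select_one() returns the first matching tag, or None if nothing matches
    time_tag = article.select_one("time[datetime]")
    date = time_tag["datetime"] if time_tag is not None else None
    results.append({"title": title, "url": article_url, "date": date})

for row in results:
    print(row)
```

Checking for None before indexing keeps the loop from crashing on an article that has no time element.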
See the BeautifulSoup documentation for more.
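The attribute value still ends in +00:00, whereas the question asks for 2023-10-09T16:46:47 only. A minimal sketch using just the standard library, assuming Python 3.7 or later (where datetime.fromisoformat accepts a +HH:MM offset): parse the string, normalise it to UTC, then format it without the offset.

```python
from datetime import datetime, timezone

# Value read from the <time> tag's datetime attribute (taken from the question)
raw = "2023-10-09T16:46:47+00:00"

# Parse the ISO 8601 string into a timezone-aware datetime object
published = datetime.fromisoformat(raw)

# Convert to UTC first, then format without the offset suffix
date_only = published.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%S")

print(published)   # 2023-10-09 16:46:47+00:00
print(date_only)   # 2023-10-09T16:46:47
```

If the site always uses exactly this format, slicing the first 19 characters of the string would give the same result, but parsing also validates the value and leaves you with a datetime object you can compare or sort.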
Answered By - Barry the Platipus