Issue
I am new to Beautiful Soup. I usually do web scraping with Scrapy, which uses response.xpath
to get the text.
This time, I want to get the article title from an element with the class article-title
and the published date from an element with the class meta-posted.
The HTML looks like this:
<div class="col-12 col-md-8">
<article class="article-main">
<header class="article-header">
<h1 class="article-title" style="font-size: 28px !important; font-family: sans-serif !important;">Presentation: Govt pushes CCS/CCUS development in RI upstream sector</h1>
<div class="article-meta">
<span class="meta-posted">
Monday, August 1 2022 - 04:27PM WIB </span>
</div>
To get the title, what I have tried is:
title= res.findAll('h1', attrs={'class':'article-title'})
But it still gives me:
[<h1 class="article-title" style="font-size: 28px !important; font-family: sans-serif !important;">Pertagas, Chandra Asri sign gas MoU</h1>]
And to get the date, I tried:
date = res.findAll('span', attrs={'class':'meta-posted'})
But it gives me:
[<span class="meta-posted" style="font-size: large">
</span>,
<span class="meta-posted" style="font-style: italic">
</span>,
<span class="meta-posted">
Tuesday, August 2 2022 - 10:53AM WIB
</span>]
How should I write the code to get only the text of the title and the date, without the surrounding tags?
Solution
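findAll (spelled find_all in Beautiful Soup 4) returns a list of Tag objects, so printing the result shows the whole element, tags included. Call get_text() on each tag to extract only its text: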
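from bs4 import BeautifulSoup

# html_doc holds the page's HTML source as a string
soup = BeautifulSoup(html_doc, 'html.parser')

# Print the text of every matching title, without the surrounding tag
titles = soup.find_all('h1', attrs={'class': 'article-title'})
for title in titles:
    print(title.get_text())

# Same for the dates
dates = soup.find_all('span', attrs={'class': 'meta-posted'})
for date in dates:
    print(date.get_text())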
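Note that the date loop will also print blank lines for the empty meta-posted spans shown in the question, and the date text keeps its surrounding whitespace. As a possible refinement, here is a minimal sketch that keeps only the non-empty text. It assumes the page is structured like the snippets in the question; the html_doc string below is an abbreviated stand-in for the real page, and select_one / select are Beautiful Soup's CSS selector methods.

```python
from bs4 import BeautifulSoup

# Abbreviated sample markup modelled on the question (an assumption, not the
# real page). The two empty spans mimic the extra "meta-posted" elements.
html_doc = """
<article class="article-main">
  <h1 class="article-title">Pertagas, Chandra Asri sign gas MoU</h1>
  <span class="meta-posted" style="font-size: large">
  </span>
  <span class="meta-posted" style="font-style: italic">
  </span>
  <span class="meta-posted">
    Tuesday, August 2 2022 - 10:53AM WIB
  </span>
</article>
"""

soup = BeautifulSoup(html_doc, "html.parser")

# select_one returns the first match, or None if nothing matches.
title_tag = soup.select_one("h1.article-title")
title = title_tag.get_text(strip=True) if title_tag else None

# get_text(strip=True) trims whitespace, so empty spans become "".
texts = (span.get_text(strip=True) for span in soup.select("span.meta-posted"))
dates = [text for text in texts if text]  # drop the empty ones
date = dates[0] if dates else None

print(title)
print(date)
```

If a page can contain several non-empty meta-posted spans, dates holds all of them, so you can pick whichever one you need instead of taking the first.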
Answered By - msvstl