Issue
I get an error when trying to read the href value: "ResultSet object has no attribute 'find_all'. You're probably treating a list of elements like a single element. Did you call find_all() when you meant to call find()?" But when I change that part of the code to find(), it says: "ResultSet object has no attribute 'find'. You're probably treating a list of elements like a single element. Did you call find_all() when you meant to call find()?" Here is my code:
import requests
from bs4 import BeautifulSoup as bs

titles = []
dates = []
links = []
page = 1
while page <= 60:
    url = requests.get(f"http://detik.com/search/searchall?query=covid&siteid=2&sortby=time&page={page}")
    soup = bs(url.text, 'lxml')
    container = soup.find_all('div', class_='container content')
    for l_media in container:
        media_cont = l_media.find_all('div', class_='list media_rows list-berita')
        for article in media_cont:
            article_cont = article.find_all('article')
            for title in article_cont:
                news_title = title.find('h2', class_='title')
                titles.append(news_title.text.strip())
            for date in article_cont:
                news_date = date.find('span', class_='date')
                dates.append(news_date.text.strip())
            for a_tag in article_cont.find('a'):  # <- raises the error: article_cont is a ResultSet
                link = a_tag['href']
                links.append(link)
    page += 1
Solution
The error comes from calling find() on a ResultSet: find_all() returns a ResultSet, which behaves like a list of elements and has no find() or find_all() of its own; you have to call find() on each individual element inside it. All these nested loops are not necessary, though — take a look at an alternative approach.
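As a minimal illustration of the difference (using a small hypothetical HTML snippet, not the real detik.com markup):

```python
from bs4 import BeautifulSoup

html = """
<div class="list-berita">
  <article><h2 class="title">A</h2><a href="/a">x</a></article>
  <article><h2 class="title">B</h2><a href="/b">y</a></article>
</div>
"""
soup = BeautifulSoup(html, "html.parser")

articles = soup.find_all("article")  # ResultSet: a list of Tag elements
# articles.find("a")                 # AttributeError: ResultSet has no .find

# Call find() on each element of the ResultSet instead:
links = [article.find("a")["href"] for article in articles]
# links == ['/a', '/b']
```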
Example
from bs4 import BeautifulSoup
import requests

data = []
page = 1
while page <= 10:
    # fetch and parse each page inside the loop, so page actually advances
    url = requests.get(f"http://detik.com/search/searchall?query=covid&siteid=2&sortby=time&page={page}")
    soup = BeautifulSoup(url.text, 'lxml')
    for article in soup.select('div.list-berita article'):
        news_title = article.find('h2', class_='title').text
        news_date = article.find('span', class_='date').contents[1]
        link = article.find('a')['href']
        data.append({
            'title': news_title,
            'date': news_date,
            'link': link
        })
    page += 1
data
Answered By - HedgeHog