Issue
I get an error when trying to read the href value: "ResultSet object has no attribute 'find_all'. You're probably treating a list of elements like a single element. Did you call find_all() when you meant to call find()?" But when I change that part of the code to find(), it says: "ResultSet object has no attribute 'find'. You're probably treating a list of elements like a single element. Did you call find_all() when you meant to call find()?" Here is my code:
import requests
from bs4 import BeautifulSoup as bs

titles = []
dates = []
links = []
page = 1
while page <= 60:
    url = requests.get(f"http://detik.com/search/searchall?query=covid&siteid=2&sortby=time&page={page}")
    soup = bs(url.text, 'lxml')
    container = soup.find_all('div', class_='container content')
    for l_media in container:
        media_cont = l_media.find_all('div', class_='list media_rows list-berita')
        for article in media_cont:
            article_cont = article.find_all('article')
            for title in article_cont:
                news_title = title.find('h2', class_='title')
                titles.append(news_title.text.strip())
            for date in article_cont:
                news_date = date.find('span', class_='date')
                dates.append(news_date.text.strip())
            for a_tag in article_cont.find('a'):  # <- raises the error: article_cont is a ResultSet
                link = a_tag['href']
                links.append(link)
    page += 1
Solution
The error comes from calling find() on a ResultSet: find_all() returns a ResultSet, which behaves like a list of elements and has no find() or find_all() of its own; you have to call find() on each individual element inside it. All these nested loops are not necessary, though — take a look at an alternative approach.
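As a minimal illustration of the difference (using a small hypothetical HTML snippet, not the real detik.com markup):

```python
from bs4 import BeautifulSoup

html = """
<div class="list-berita">
  <article><h2 class="title">A</h2><a href="/a">x</a></article>
  <article><h2 class="title">B</h2><a href="/b">y</a></article>
</div>
"""
soup = BeautifulSoup(html, "html.parser")

articles = soup.find_all("article")  # ResultSet: a list of Tag elements
# articles.find("a")                 # AttributeError: ResultSet has no .find

# Call find() on each element of the ResultSet instead:
links = [article.find("a")["href"] for article in articles]
# links == ['/a', '/b']
```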
Example
from bs4 import BeautifulSoup
import requests

data = []
page = 1
while page <= 10:
    # fetch and parse each page inside the loop, so page actually advances
    url = requests.get(f"http://detik.com/search/searchall?query=covid&siteid=2&sortby=time&page={page}")
    soup = BeautifulSoup(url.text, 'lxml')
    for article in soup.select('div.list-berita article'):
        news_title = article.find('h2', class_='title').text
        news_date = article.find('span', class_='date').contents[1]
        link = article.find('a')['href']
        data.append({
            'title': news_title,
            'date': news_date,
            'link': link
        })
    page += 1
data
Answered By - HedgeHog