Friday, April 8, 2022

[FIXED] Why do I get output from one but not the other?

April 08, 2022 beautifulsoup, python, web-scraping No comments

Issue

from bs4 import BeautifulSoup
import requests
import re


def getHTMLdocument(url):
    response = requests.get(url)
    return response.text

def correct_url(url1):
    if not url1.startswith('https://www.parliament.gov.sg'):
        url1 = f'https://www.parliament.gov.sg{url1}'
    return url1

url_to_scrape = 'https://www.parliament.gov.sg/mps/list-of-current-mps'
links = []

while True:

    html_document = getHTMLdocument(url_to_scrape)
    soup = BeautifulSoup(html_document, 'lxml')

    if soup.find_all('a', attrs={'href': re.compile("/details/")}) == []:
        break

    for link in soup.find_all('a', attrs={'href': re.compile("/details/")}):
        if link.get('href') not in links:
            links.append(correct_url(link.get('href')))

    for link in links:
        url = link
        member_info = 'mp-designation-wrap'
        **member_info = 'mp-constituency-wrap'**
        page = requests.get(url)
        soup = BeautifulSoup(page.text, 'lxml')
        txt1 = soup.find('div', attrs={'class': member_info})
        textoutput = txt1.text
        print(textoutput)

    break

I'm trying to separate the different categories to use save separately, however, I only get output when using the member_info = 'mp-designation-wrap' and I get a AttributeError: 'NoneType' object has no attribute 'text' when using 'mp-constituency-wrap'. I do not understand why it is giving me different results and it would be great if someone could help me understand why it is so and point me in the right direction

Solution

Reason why you get this error is, that the element you try to select do not exist in some of your resources, so you have to check that before calling .text.

for link in links:
    page = requests.get(link)
    soup = BeautifulSoup(page.text, 'lxml')
    text1 = e.text if (e := soup.find('div', attrs={'class': 'mp-designation-wrap'})) else None
    text2 = e.text if (e := soup.find('div', attrs={'class': 'mp-constituency-wrap'})) else None
    print(text2)

Answered By - HedgeHog

This Answer collected from stackoverflow and tested by PythonFixing community admins, is licensed under cc by-sa 2.5 , cc by-sa 3.0 and cc by-sa 4.0

Friday, April 8, 2022

[FIXED] Why do I get output from one but not the other?

Issue

Solution

0 comments:

Post a Comment

Popular Posts

Labels