Monday, December 11, 2023

[FIXED] Webscraping code failing on similar pages

Tags: beautifulsoup, python, web-scraping

Issue

My scraping code works on some ufcstats.com event pages but fails on others that look the same.

Code:

from bs4 import BeautifulSoup
import requests
import pandas as pd
import re
from random import randint


theurl = "http://ufcstats.com/event-details/7abe471b61725980"

r=requests.get(theurl)

soup=BeautifulSoup(r.text,'html.parser')
Name=soup.find(class_='b-fight-details__table-body')
Name=Name.text.strip()
links=soup.find_all('a')

# print(links)
Fighter = []
for link in links:
    href=link['href']  # raises KeyError when an <a> tag has no href attribute
    if href:
        print(href)
        if r'fighter-details' in href:
            Fighter.append(href)
            print(Fighter)

This works perfectly for old events:

http://ufcstats.com/event-details/6f812143641ceff8

But not for a new event:

http://ufcstats.com/event-details/7abe471b61725980

I get the following error:

    return self.attrs[key]
           ~~~~~~~~~~^^^^^
KeyError: 'href'

But they look like the same kind of page. Why does link['href'] give me an error when the href is clearly there in the 'a' tag? I also tried stripping the text out of the a tag, but that doesn't work either.

Solution

The table contains links without an href= attribute, and link['href'] raises KeyError on those, so your script fails. One way to fix it is to use the tag's .get() method, which behaves like dict.get(), with a default value:

import requests
from bs4 import BeautifulSoup

theurl = "http://ufcstats.com/event-details/7abe471b61725980"
soup=BeautifulSoup(requests.get(theurl).text,'html.parser')

Name=soup.find(class_='b-fight-details__table-body')
links=Name.find_all('a')

Fighter = []
for link in links:
    href=link.get('href', '')  # <-- get href= attribute or empty string if the attribute doesn't exist
    if href:
        if 'fighter-details' in href:
            Fighter.append(href)

print(*Fighter, sep='\n')

Prints:

http://ufcstats.com/fighter-details/853eb0dd5c0e2149
http://ufcstats.com/fighter-details/6d35bf94f7d30241
http://ufcstats.com/fighter-details/7aa3d6964eff4877
http://ufcstats.com/fighter-details/361d49960a196976
http://ufcstats.com/fighter-details/d1941565abf50b16
http://ufcstats.com/fighter-details/7026eca45f65377b

...and so on.
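
For comparison, here is a small offline sketch of the same failure and two other ways to skip anchors that have no href. The HTML below is made up for illustration (the fighter IDs and the bare "View Matchup" anchor are assumptions, not copied from ufcstats.com), so treat it as a demonstration of the technique rather than the site's real markup.

```python
from bs4 import BeautifulSoup

# Hypothetical HTML: the second anchor deliberately has no href attribute.
SAMPLE_HTML = """
<table>
  <tbody class="b-fight-details__table-body">
    <tr>
      <td><a href="http://ufcstats.com/fighter-details/aaaa1111">Fighter A</a></td>
      <td><a>View Matchup</a></td>
      <td><a href="http://ufcstats.com/fight-details/cccc3333">Fight</a></td>
      <td><a href="http://ufcstats.com/fighter-details/bbbb2222">Fighter B</a></td>
    </tr>
  </tbody>
</table>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# Indexing with ['href'] raises KeyError on the anchor without an href.
try:
    [a["href"] for a in soup.find_all("a")]
    raised = False
except KeyError:
    raised = True

# Option 1: href=True makes find_all() return only anchors that have an href.
fighters_filtered = [
    a["href"]
    for a in soup.find_all("a", href=True)
    if "fighter-details" in a["href"]
]

# Option 2: a CSS attribute selector matches anchors whose href contains the text.
fighters_css = [a["href"] for a in soup.select('a[href*="fighter-details"]')]

print(raised)
print(*fighters_filtered, sep="\n")
```

Both options filter out the anchors with no href before any indexing happens, so link['href'] is safe inside the loop. Which one reads better is mostly a matter of taste; .get() with a default, as in the answer above, is the smallest change to the original script.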

Answered By - Andrej Kesely

This answer was collected from Stack Overflow and tested by the PythonFixing community admins. It is licensed under CC BY-SA 2.5, CC BY-SA 3.0 and CC BY-SA 4.0.
