Issue
Title
Code:
from bs4 import BeautifulSoup
import requests
import pandas as pd
import re
from random import randint
theurl = "http://ufcstats.com/event-details/7abe471b61725980"
r=requests.get(theurl)
soup=BeautifulSoup(r.text,'html.parser')
Name=soup.find(class_='b-fight-details__table-body')
Name=Name.text.strip()
links=soup.find_all('a')
# print(links)
Fighter = []
for link in links:
    href=link['href']
    if href:
        print(href)
        if r'fighter-details' in href:
            Fighter.append(href)
print(Fighter)
Works perfectly for old events:
http://ufcstats.com/event-details/6f812143641ceff8
But not a new event?
http://ufcstats.com/event-details/7abe471b61725980
I get the following error:
return self.attrs[key]
~~~~~~~~~~^^^^^
KeyError: 'href'
But they're the same kind of webpage. Why does link['href'] give me an error? The href is clearly there in the 'a' tag. I also tried stripping out the text from the 'a' tag, but that doesn't seem to work either.
Solution
In the table there are links without the href= attribute, so link['href'] raises KeyError on those tags and your script fails. One way to fix it is to use the tag's .get() method, which works like dict.get(), with a default value:
import requests
from bs4 import BeautifulSoup
theurl = "http://ufcstats.com/event-details/7abe471b61725980"
soup=BeautifulSoup(requests.get(theurl).text,'html.parser')
Name=soup.find(class_='b-fight-details__table-body')
links=Name.find_all('a')
Fighter = []
for link in links:
    href=link.get('href', '') # <-- get href= attribute or empty string if the attribute doesn't exist
    if href:
        if 'fighter-details' in href:
            Fighter.append(href)
print(*Fighter, sep='\n')
Prints:
http://ufcstats.com/fighter-details/853eb0dd5c0e2149
http://ufcstats.com/fighter-details/6d35bf94f7d30241
http://ufcstats.com/fighter-details/7aa3d6964eff4877
http://ufcstats.com/fighter-details/361d49960a196976
http://ufcstats.com/fighter-details/d1941565abf50b16
http://ufcstats.com/fighter-details/7026eca45f65377b
...and so on.
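As an alternative to .get(), BeautifulSoup can filter on the attribute directly inside find_all(). Below is a minimal sketch of that idea, not part of the original answer. It runs on a small inline HTML snippet instead of the live page, so the table markup, link text, and fighter IDs are made up for illustration.

```python
import re
from bs4 import BeautifulSoup

# Hypothetical stand-in for the event page: the second <a> has no href= attribute.
html = """
<table class="b-fight-details__table-body">
  <tr><td><a href="http://ufcstats.com/fighter-details/aaa111">Fighter A</a></td></tr>
  <tr><td><a>no link here</a></td></tr>
  <tr><td><a href="http://ufcstats.com/fight-details/ccc333">Fight</a></td></tr>
  <tr><td><a href="http://ufcstats.com/fighter-details/bbb222">Fighter B</a></td></tr>
</table>
"""
soup = BeautifulSoup(html, "html.parser")

# href=True keeps only the <a> tags that actually carry an href= attribute,
# so indexing with ['href'] afterwards cannot raise KeyError.
with_href = soup.find_all("a", href=True)

# Passing a compiled regex keeps only tags whose href contains 'fighter-details';
# tags without href are skipped automatically.
fighter_links = [a["href"] for a in soup.find_all("a", href=re.compile("fighter-details"))]

print(*fighter_links, sep="\n")
```

Both filters skip the tag that has no href=, so the explicit "if href:" check is no longer needed.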
Answered By - Andrej Kesely