Issue
I am using the code below to fetch the titles of a list of websites.
from bs4 import BeautifulSoup
import urllib2

line_in_list = ['www.dailynews.lk', 'www.elpais.com', 'www.dailynews.co.zw']

for websites in line_in_list:
    url = "http://" + websites
    page = urllib2.urlopen(url)
    soup = BeautifulSoup(page.read())
    site_title = soup.find_all("title")
    print site_title
If the list contains a 'bad' (non-existent) website, or a page returns some kind of error such as "404 Not Found", the script raises an exception and stops.
How can I make the script ignore/skip the 'bad' (non-existent) and problematic websites/webpages and continue with the rest?
Solution
Wrap the call to urllib2.urlopen in a try/except; when a site cannot be fetched, report the error and continue to the next one:

from bs4 import BeautifulSoup
import urllib2

line_in_list = ['www.dailynews.lk', 'www.elpais.com', 'www.no.dede', 'www.dailynews.co.zw']

for websites in line_in_list:
    url = "http://" + websites
    try:
        page = urllib2.urlopen(url)
    except Exception as e:
        # Report the failure and skip to the next site.
        print(e)
        continue
    soup = BeautifulSoup(page.read())
    site_title = soup.find_all("title")
    print(site_title)

Running this prints:
[<title>Popular News Items | Daily News Online : Sri Lanka's National News</title>]
[<title>EL PAÍS: el periódico global</title>]
<urlopen error [Errno -2] Name or service not known>
[<title>
DailyNews - Telling it like it is
</title>]
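On Python 3, urllib2 has been split into urllib.request and urllib.error. A minimal sketch of the same skip-on-error pattern, which also distinguishes an HTTP error status (such as the 404 mentioned in the question) from an unreachable host; the fetch helper name and the .invalid hostname below are illustrative, not from the original answer:

```python
from urllib.error import HTTPError, URLError
from urllib.request import urlopen


def fetch(url, timeout=10):
    """Return the page body as bytes, or None if the URL cannot be fetched."""
    try:
        with urlopen(url, timeout=timeout) as page:
            return page.read()
    except HTTPError as e:
        # The server answered, but with an error status such as 404.
        print("%s: HTTP %d" % (url, e.code))
    except URLError as e:
        # The request never got an answer: bad hostname, refused connection, etc.
        print("%s: %s" % (url, e.reason))
    return None


# ".invalid" is a reserved TLD that never resolves, so this prints an error
# and returns None instead of raising:
fetch("http://nonexistent.invalid/")
```

Note that HTTPError is a subclass of URLError, so it must be listed first for the two branches to be distinguished.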
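If installing BeautifulSoup is not an option, the title can also be extracted with the standard library's html.parser; a small sketch (the TitleParser class is my own illustration, not part of the original answer):

```python
from html.parser import HTMLParser


class TitleParser(HTMLParser):
    """Collects the text inside the first <title> element."""

    def __init__(self):
        super().__init__()
        self._in_title = False
        self.title = None

    def handle_starttag(self, tag, attrs):
        if tag == "title" and self.title is None:
            self._in_title = True
            self.title = ""

    def handle_data(self, data):
        if self._in_title:
            self.title += data

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False


parser = TitleParser()
parser.feed("<html><head><title>Daily News</title></head></html>")
print(parser.title)  # → Daily News
```

This avoids the third-party dependency, at the cost of handling messy real-world HTML less forgivingly than BeautifulSoup does.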
Answered By - Padraic Cunningham