Issue
import requests
from bs4 import BeautifulSoup
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 6.3; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/104.0.0.0 Safari/537.36'
}
for x in range(60,61):
    url = 'https://example.com/page/'
    r = requests.get(url+str(x), headers = headers)
    soup = BeautifulSoup(r.content, features='lxml')
    articles = soup.find_all('article', class_='blog-view')
    for item in articles:
        title = item.find('h2', class_="entry-title").text
        if title == "Premium" or title == "Deleted" or title == "deleted":
            image_url = "None"
        else:
            try:
                image_url = item.find('div', class_='entry-content').p.img['src']
            except TypeError:
                image_url = item.find('div', class_='wp-caption').img['src']
            except AttributeError:
                image_url = "None"
        print(image_url)
Output
TypeError
Cell In [10], line 30
     29 try:
---> 30     image_url = item.find('div', class_='entry-content').p.img['src']
     31 except TypeError:
TypeError: 'NoneType' object is not subscriptable

During handling of the above exception, another exception occurred:

AttributeError
Cell In [10], line 32
     30     image_url = item.find('div', class_='entry-content').p.img['src']
     31 except TypeError:
---> 32     image_url = item.find('div', class_='wp-caption').img['src']
     33 except AttributeError:
     34     image_url = "None"
AttributeError: 'NoneType' object has no attribute 'img'
I am a newbie. I have given two except clauses, one for TypeError and another for AttributeError, so at the end I should get "None" in the output.
But somehow the second except clause is not executing. In Python we can add as many except clauses as we want, but in this case the second one does not run. Why? Is this because of the for loop or the indentation?
Solution
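Your second except should be nested inside the first except. Currently it is not, so the AttributeError raised while handling the TypeError is never caught: the except clauses of a try statement only handle exceptions raised in the try body, not those raised inside a sibling except clause. Try this: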
import requests
from bs4 import BeautifulSoup
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 6.3; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/104.0.0.0 Safari/537.36'
}
for x in range(60,61):
    url = 'https://example.com/page/'
    r = requests.get(url+str(x), headers = headers)
    soup = BeautifulSoup(r.content, features='lxml')
    articles = soup.find_all('article', class_='blog-view')
    for item in articles:
        title = item.find('h2', class_="entry-title").text
        if title == "Premium" or title == "Deleted" or title == "deleted":
            image_url = "None"
        else:
            try:
                image_url = item.find('div', class_='entry-content').p.img['src']
            except TypeError:
                try:
                    image_url = item.find('div', class_='wp-caption').img['src']
                except AttributeError:
                    image_url = "None"
        print(image_url)
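
To see the difference in isolation, here is a minimal sketch that needs no network access or parsing. The function names and the `missing` variable are made up for illustration; `missing` stands in for the `None` that `find()` or `.img` returns when an element is absent.

```python
missing = None  # stand-in for a lookup that found nothing


def sibling_handlers():
    """Both except clauses belong to the same try statement."""
    try:
        return missing["src"]        # raises TypeError
    except TypeError:
        return missing.img           # raises AttributeError inside the handler
    except AttributeError:
        return "None"                # not consulted for errors raised in the handler above


def nested_handlers():
    """The fallback lookup gets its own try statement."""
    try:
        return missing["src"]        # raises TypeError
    except TypeError:
        try:
            return missing.img       # raises AttributeError
        except AttributeError:
            return "None"            # caught by the inner handler


try:
    sibling_result = sibling_handlers()
except AttributeError as exc:
    # The AttributeError escapes the function, as in the traceback above.
    sibling_result = f"escaped: {exc}"

nested_result = nested_handlers()
print(sibling_result)
print(nested_result)  # None
```

The sibling version lets the AttributeError propagate because, once Python has entered the `except TypeError` block, the other clauses of that same try statement are no longer in play.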
If you still have issues, confirm the actual URL and your end goal (what are you after?), and I will update my answer.
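
If the nesting gets deeper with more fallbacks, one possible alternative is to try each lookup in turn and catch both exception types in a single clause. This is only a sketch: `first_available` is a hypothetical helper, not part of BeautifulSoup, and the dictionaries below are stand-ins for parsed elements.

```python
def first_available(lookups, default="None"):
    """Return the result of the first lookup that does not fail.

    `lookups` is an iterable of zero-argument callables. A lookup that hits
    a missing element raises TypeError or AttributeError and is skipped.
    """
    for lookup in lookups:
        try:
            return lookup()
        except (TypeError, AttributeError):
            continue
    return default


# Stand-ins for parsed elements (assumption: real code would use Tag objects).
missing = None
caption = {"src": "https://example.com/a.jpg"}

image_url = first_available([
    lambda: missing["src"],       # TypeError: None is not subscriptable
    lambda: missing.img["src"],   # AttributeError: None has no attribute 'img'
    lambda: caption["src"],       # succeeds
])
print(image_url)  # https://example.com/a.jpg
```

In the scraper, the lookups would be `lambda: item.find('div', class_='entry-content').p.img['src']` and `lambda: item.find('div', class_='wp-caption').img['src']`. Note that an `img` tag without a `src` attribute raises KeyError, which this sketch does not catch.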
Answered By - Barry the Platipus