Issue
I'm trying to scrape bits of information with Beautiful Soup. I put a try ... except in the for loop, but it doesn't seem to be effective. I must be doing something wrong, but I don't know where.
This gets the HTML from a list of URLs called occupations_list. Example of a URL: https://candidat.pole-emploi.fr/offres/emploi/horticulteur/s1m1
for occupation in occupations_list:
    offers_page = requests.get(occupation)
    offers_soup = BeautifulSoup(offers_page.content, 'lxml')
    offers = offers_soup.find('ul', class_='result-list list-unstyled')
This gets a headline from the HTML fetched above:
    for job in offers:
        try:
            headline = job.find('h2', class_='t4 media-heading').text
        except Exception as e:
            pass
        print(headline)
The problem is that I got the following error message after a few headlines have already been scraped:
TypeError                                 Traceback (most recent call last)
<ipython-input-77-cbf6b87ac0f9> in <module>()
      3     offres_soup = BeautifulSoup(offres_page.content, 'lxml')
      4     offres = offres_soup.find('ul', class_='result-list list-unstyled')
----> 5     for job in offres:
      6         try:
      7             headline = job.find('h2', class_='t4 media-heading').text

TypeError: 'NoneType' object is not iterable
Solution
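None denotes that nothing was found: find() returns None when no element matches. Your try ... except sits inside the loop body, so it does not cover the for statement itself, which is where iterating over None fails. You might use an if ... is None check rather than try-except to skip a page when nothing was found, as follows: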
for occupation in occupations_list:
    offers_page = requests.get(occupation)
    offers_soup = BeautifulSoup(offers_page.content, 'lxml')
    offers = offers_soup.find('ul', class_='result-list list-unstyled')
    if offers is None:
        continue
    print("Processing offers")
Replace print("Processing offers") with your actual processing.
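As a rough sketch of what that processing could look like, the None check can be combined with find_all, which returns an empty list rather than None when nothing matches. Iterating over a tag directly also yields the whitespace text nodes between its children, so searching for the h2 elements avoids the inner try ... except altogether. The helper name extract_headlines and the two sample HTML strings are made up for illustration and stand in for the content of requests.get(occupation); the built-in 'html.parser' is used only so the example runs without lxml.

```python
from bs4 import BeautifulSoup


def extract_headlines(html):
    """Return the headline texts of one results page (empty list if none)."""
    soup = BeautifulSoup(html, 'html.parser')
    offers = soup.find('ul', class_='result-list list-unstyled')
    if offers is None:
        # No result list on this page, so there is nothing to iterate over.
        return []
    headlines = []
    # find_all returns an empty list (never None) when nothing matches.
    for heading in offers.find_all('h2', class_='t4 media-heading'):
        headlines.append(heading.get_text(strip=True))
    return headlines


# Hypothetical sample pages standing in for requests.get(occupation).content
page_with_offers = """
<ul class="result-list list-unstyled">
  <li><h2 class="t4 media-heading">Horticulteur (H/F)</h2></li>
  <li><h2 class="t4 media-heading">Ouvrier horticole (H/F)</h2></li>
</ul>
"""
page_without_offers = "<p>Aucune offre</p>"

for html in (page_with_offers, page_without_offers):
    print(extract_headlines(html))
```

In the real loop you would call the helper with offers_page.content for each URL in occupations_list; a page without a result list then simply contributes no headlines.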
Answered By - Daweo