Issue
I am using the Beautiful Soup library to extract data from web pages. Sometimes an element cannot be found in the page, and if we then try to access one of its sub-elements we get an error like 'NoneType' object has no attribute 'find'.
For example, take the code below:
import requests
from bs4 import BeautifulSoup

res = requests.get(url)
soup = BeautifulSoup(res.text, "html.parser")
primary_name = soup.find('div', {"class": "company-header"}).find('p', {"class": "heading-xlarge"}).text
company_number = soup.find('p', id="company-number").find('strong').text
If I want to handle the error, I have to write something like this:
try:
    primary_name = soup.find('div', {"class": "company-header"}).find('p', {"class": "heading-xlarge"}).text
except:
    primary_name = None
try:
    company_number = soup.find('p', id="company-number").find('strong').text.strip()
except:
    company_number = None
With many elements, we end up with lots of try/except statements. What I actually want is to write the code in the manner below.
def error_handler(_):
    try:
        return _
    except:
        return None

primary_name = error_handler(soup.find('div', {"class": "company-header"}).find('p', {"class": "heading-xlarge"}).text)
# this will still raise the error
I know that the above code won't work: Python evaluates the argument expression before error_handler is called, so the error is raised outside the function's try block.
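To make the evaluation order concrete, here is a minimal sketch that needs no network access. The variable `missing` is a hypothetical stand-in for a `soup.find(...)` call that found nothing and returned None; the name `outcome` is also just for illustration.

```python
def error_handler(value):
    # By the time this body runs, the caller has already computed `value`,
    # so there is nothing left inside the try block that could fail.
    try:
        return value
    except AttributeError:
        return None


missing = None  # assumption: stands in for soup.find(...) returning None

try:
    # `missing.text` is evaluated first, before error_handler is entered.
    error_handler(missing.text)
    outcome = "handled inside error_handler"
except AttributeError:
    outcome = "raised before error_handler was called"

print(outcome)  # prints "raised before error_handler was called"
```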
If you have any idea how to make this code look cleaner, please show me.
Solution
I don't know if this is the most efficient way, but you can pass a lambda expression to error_handler. The lambda defers the lookup, so it only runs when it is called inside the try block:
def error_handler(_):
    try:
        return _()
    except:
        return None

primary_name = error_handler(lambda: soup.find('div', {"class": "company-header"}).find('p', {"class": "heading-xlarge"}).text)
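One possible variation on the same idea, sketched here under some assumptions: a bare except also swallows unrelated problems, so it may be safer to catch only AttributeError and to allow a fallback value. The helper name `safe` and its `default` and `exceptions` parameters are hypothetical, and the `found` and `missing` objects merely stand in for a tag that exists and a `soup.find(...)` result of None.

```python
from types import SimpleNamespace


def safe(getter, default=None, exceptions=(AttributeError,)):
    """Call getter() and return its result, or `default` if it raises one of `exceptions`."""
    try:
        return getter()
    except exceptions:
        return default


# Stand-ins for parsed elements (assumptions, so the sketch runs without a page):
found = SimpleNamespace(text="  12345678 ")  # behaves like a tag with a .text attribute
missing = None                               # what soup.find(...) returns when nothing matches

company_number = safe(lambda: found.text.strip())
primary_name = safe(lambda: missing.text.strip(), default="")

print(repr(company_number))  # prints '12345678'
print(repr(primary_name))    # prints ''
```

With real pages, the lambdas would wrap the chained `soup.find(...)` calls exactly as in the answer above. Errors other than AttributeError still propagate, which makes genuine bugs easier to spot.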
Answered By - Igor Moraru