Issue
I wrote a crawler to fetch information from a Q&A website. Since not every field is present on every page, I used multiple try-excepts to handle that situation.
def answerContentExtractor( loginSession, questionLinkQueue , answerContentList) :
    while True:
        URL = questionLinkQueue.get()
        try:
            response = loginSession.get(URL,timeout = MAX_WAIT_TIME)
            raw_data = response.text
            #These fields must exist, or something went wrong...
            questionId = re.findall(REGEX,raw_data)[0]
            answerId = re.findall(REGEX,raw_data)[0]
            title = re.findall(REGEX,raw_data)[0]
        except requests.exceptions.Timeout ,IndexError:
            print >> sys.stderr, URL + " extraction error..."
            questionLinkQueue.task_done()
            continue
        try:
            questionInfo = re.findall(REGEX,raw_data)[0]
        except IndexError:
            questionInfo = ""
        try:
            answerContent = re.findall(REGEX,raw_data)[0]
        except IndexError:
            answerContent = ""
        result = {
            'questionId' : questionId,
            'answerId' : answerId,
            'title' : title,
            'questionInfo' : questionInfo,
            'answerContent': answerContent
        }
        answerContentList.append(result)
        questionLinkQueue.task_done()
Sometimes, but not always, this code raises the following exception at runtime:
UnboundLocalError: local variable 'IndexError' referenced before assignment
The line number in the traceback indicates that the error occurs at the second except IndexError: clause.
Thanks, everyone, for your suggestions. I would love to give you all the credit you deserve; too bad I can only mark one answer as correct...
Solution
In Python 2.x, the line
except requests.exceptions.Timeout, IndexError:
is equivalent to
except requests.exceptions.Timeout as IndexError:
Thus, only requests.exceptions.Timeout is caught, and the caught exception instance is assigned to the name IndexError. A simpler example:
try:
    true
except NameError, IndexError:
    print IndexError
# prints: name 'true' is not defined
To catch multiple exceptions, put the names in parentheses:
except (requests.exceptions.Timeout, IndexError):
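For illustration, here is a minimal Python 3 sketch of the tuple form. The function first_int is hypothetical and not part of the crawler. The parenthesized tuple behaves the same way in Python 2 and 3, and because it binds no name, IndexError keeps referring to the builtin class.

```python
def first_int(values):
    """Hypothetical helper: return int(values[0]), or None if that fails."""
    try:
        return int(values[0])
    except (IndexError, ValueError):  # a tuple: either exception type matches
        return None

print(first_int(["42"]))  # 42
print(first_int([]))      # None (IndexError caught)
print(first_int(["x"]))   # None (ValueError caught)
```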
Later, an UnboundLocalError can occur because the assignment to IndexError makes it a local variable for the whole function (shadowing the builtin name):
>>> 'IndexError' in answerContentExtractor.func_code.co_varnames
True
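The func_code attribute above is Python 2 only; in Python 3 the same check uses __code__. The sketch below compares an as binding with the tuple form. It assumes the builtin TimeoutError as a stand-in for requests.exceptions.Timeout so it runs without requests installed, and both function names are hypothetical.

```python
# Assumption: builtin TimeoutError stands in for requests.exceptions.Timeout.
# Both functions are hypothetical, for illustration only.

def shadows_builtin():
    try:
        raise TimeoutError("simulated timeout")
    except TimeoutError as IndexError:  # the 'as' target becomes a local variable
        pass

def uses_tuple():
    try:
        raise TimeoutError("simulated timeout")
    except (TimeoutError, IndexError):  # both names are only looked up, never assigned
        pass

print('IndexError' in shadows_builtin.__code__.co_varnames)  # True
print('IndexError' in uses_tuple.__code__.co_varnames)       # False
```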
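So, if requests.exceptions.Timeout was not raised, the local IndexError will not have been (incorrectly) assigned when the code reaches except IndexError:. Python evaluates that expression only when the second (or third) try block actually raises, that is, when a page lacks an optional field. The lookup then finds an unassigned local and raises UnboundLocalError, which is why the error appears only sometimes.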
Again, a simpler example:
def func():
    try:
        func  # defined, so the except block doesn't run,
    except NameError, IndexError:  # so the local `IndexError` isn't assigned
        pass
    try:
        [][1]
    except IndexError:
        pass
func()
#UnboundLocalError: local variable 'IndexError' referenced before assignment
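The "only sometimes" behavior can be emulated in Python 3 with a rough sketch. The function extract_optional is hypothetical, the list fields stands in for the result of re.findall(...), and TimeoutError stands in for requests.exceptions.Timeout; all three are assumptions, not the crawler's real code.

```python
# Hypothetical emulation of the crawler's bug (Python 3).
# Assumptions: `fields` mimics re.findall(...) output,
# TimeoutError mimics requests.exceptions.Timeout.

def extract_optional(fields, timed_out=False):
    try:
        if timed_out:
            raise TimeoutError("simulated request timeout")
    except TimeoutError as IndexError:  # BUG: binds a local named IndexError
        return None  # plays the role of `continue` in the crawler
    try:
        return fields[0]
    except IndexError:  # evaluated only if fields[0] actually raises
        return ""

print(extract_optional(["a"]))  # a  (no exception, so the bug stays hidden)
# extract_optional([]) raises UnboundLocalError
```

With a non-empty list the except expression is never evaluated, so the code appears to work; only an empty list (a page missing the optional field) triggers the UnboundLocalError.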
In 3.x, the problem occurs (after fixing the except syntax to except requests.exceptions.Timeout as IndexError:, which makes the error more obvious) even if the first exception is caught. This is because Python 3 deletes the name bound by as at the end of that except clause (as if by del), so the local IndexError is unbound again after the first try/except block.
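A sketch of the 3.x behavior, again assuming TimeoutError as a stand-in for requests.exceptions.Timeout and using hypothetical function names. Even though the first handler runs, the name it bound is gone by the time the second except clause is evaluated. Leaving the builtin name alone (here by not binding the exception at all) avoids the problem.

```python
# Assumption: TimeoutError stands in for requests.exceptions.Timeout.
# Both functions are hypothetical, for illustration only.

def extract_py3(fields):
    try:
        raise TimeoutError("simulated request timeout")  # the handler below runs
    except TimeoutError as IndexError:
        pass  # Python 3 unbinds IndexError when this clause ends
    try:
        return fields[0]
    except IndexError:  # the local IndexError no longer has a value
        return ""

def extract_fixed(fields):
    try:
        raise TimeoutError("simulated request timeout")
    except TimeoutError:  # no 'as' binding: IndexError stays the builtin class
        pass
    try:
        return fields[0]
    except IndexError:
        return ""

print(extract_fixed([]))  # prints an empty line: the missing field becomes ""
# extract_py3([]) raises UnboundLocalError
```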
Answered By - Ashwini Chaudhary