Issue
I have a bunch of URLs that I want to scrape. Some links have values that don't exist in others. How can I avoid getting an error when a value is missing from a URL?
I tried try/except, but it didn't work.
In this link, there are only 4 values under examInformation. My expression expects 5 values, so it raises an error. I want it to simply skip any value that doesn't exist.
Here is my code:
try:
    location_1 = schools['examInformation'][0]['cbrLocationShortName']
except:
    pass
try:
    location_2 = schools['examInformation'][1]['cbrLocationShortName']
except:
    pass
try:
    location_3 = schools['examInformation'][2]['cbrLocationShortName']
except:
    pass
try:
    location_4 = schools['examInformation'][3]['cbrLocationShortName']
except:
    pass
try:
    location_5 = schools['examInformation'][4]['cbrLocationShortName']
except:
    pass

yield {
    "Location 1": location_1 if location_1 else "N/A",
    "Location 2": location_2 if location_2 else "N/A",
    "Location 3": location_3 if location_3 else "N/A",
    "Location 4": location_4 if location_4 else "N/A",
    "Location 5": location_5 if location_5 else "N/A",
}
I am getting the following error:
UnboundLocalError: local variable 'location_5' referenced before assignment
NOTE: I am using Scrapy with the json library.
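The error happens because when the final lookup raises, the except: pass branch leaves location_5 unassigned, so reading it later fails. A minimal sketch (with a hypothetical data dict standing in for the scraped JSON) reproduces this:

```python
def extract():
    data = {}  # hypothetical input that is missing the key we want
    try:
        value = data['missing']  # raises KeyError
    except KeyError:
        pass  # 'value' is never assigned on this path
    # Reading 'value' here raises UnboundLocalError
    return value

try:
    extract()
except UnboundLocalError as e:
    print(type(e).__name__)  # UnboundLocalError
```

So except: pass does not "skip" the problem; it only swallows the lookup error and leaves the variable undefined for the yield below.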
Solution
The easiest fix is to assign None or your final 'N/A' value to the variable in the except block, e.g.:

try:
    location_5 = schools['examInformation'][4]['cbrLocationShortName']
except (IndexError, KeyError):  # missing entry or missing key
    location_5 = 'N/A'

yield {
    "Location 5": location_5,
}
If you want to avoid all the code duplication and exception handling in your example, pack it into a loop and use dict.get with a default value:

locations = {}
for i, exam in enumerate(schools['examInformation'], start=1):
    locations[f'Location {i}'] = exam.get('cbrLocationShortName', 'N/A')
yield locations
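Here is the loop approach as a standalone, runnable sketch. The schools dict below is made-up sample data (in the real spider it comes from the scraped JSON response), including one entry that lacks the key to show the fallback:

```python
# Hypothetical sample data mimicking the scraped JSON structure.
schools = {
    'examInformation': [
        {'cbrLocationShortName': 'Amsterdam'},
        {'cbrLocationShortName': 'Rotterdam'},
        {},  # entry without the key -> falls back to 'N/A'
    ]
}

locations = {}
for i, exam in enumerate(schools['examInformation'], start=1):
    # dict.get returns the default instead of raising KeyError
    locations[f'Location {i}'] = exam.get('cbrLocationShortName', 'N/A')

print(locations)
# {'Location 1': 'Amsterdam', 'Location 2': 'Rotterdam', 'Location 3': 'N/A'}
```

This also scales automatically: if a page has 4 exam entries instead of 5, the loop simply yields 4 keys, with no hard-coded location_1 through location_5 variables.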
Answered By - Madjazz