Issue
I have a bunch of URLs that I want to scrape. Some links have values that don't exist in others. How can I avoid getting an error when a value is missing from a URL?
I tried try/except, but it didn't work.
In this link, there are only 4 values under examInformation. My expression expects 5 values, so it raises an error. I want it to simply skip any value that doesn't exist.
Here is my code:
try:
    location_1 = schools['examInformation'][0]['cbrLocationShortName']
except:
    pass
try:
    location_2 = schools['examInformation'][1]['cbrLocationShortName']
except:
    pass
try:
    location_3 = schools['examInformation'][2]['cbrLocationShortName']
except:
    pass
try:
    location_4 = schools['examInformation'][3]['cbrLocationShortName']
except:
    pass
try:
    location_5 = schools['examInformation'][4]['cbrLocationShortName']
except:
    pass

yield {
    "Location 1": location_1 if location_1 else "N/A",
    "Location 2": location_2 if location_2 else "N/A",
    "Location 3": location_3 if location_3 else "N/A",
    "Location 4": location_4 if location_4 else "N/A",
    "Location 5": location_5 if location_5 else "N/A",
}
I am getting the following error:
UnboundLocalError: local variable 'location_5' referenced before assignment
NOTE: I am using Scrapy with the json library.
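The error happens because when the final lookup raises, the except: pass branch leaves location_5 unassigned, so reading it later fails. A minimal sketch (with a hypothetical data dict standing in for the scraped JSON) reproduces this:

```python
def extract():
    data = {}  # hypothetical input that is missing the key we want
    try:
        value = data['missing']  # raises KeyError
    except KeyError:
        pass  # 'value' is never assigned on this path
    # Reading 'value' here raises UnboundLocalError
    return value

try:
    extract()
except UnboundLocalError as e:
    print(type(e).__name__)  # UnboundLocalError
```

So except: pass does not "skip" the problem; it only swallows the lookup error and leaves the variable undefined for the yield below.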
Solution
The easiest fix is to assign None or your final 'N/A' value to the variable in the except block, e.g.:

try:
    location_5 = schools['examInformation'][4]['cbrLocationShortName']
except (IndexError, KeyError):  # missing entry or missing key
    location_5 = 'N/A'

yield {
    "Location 5": location_5,
}
If you want to avoid all the code duplication and exception handling in your example, pack it into a loop and use dict.get with a default value:

locations = {}
for i, exam in enumerate(schools['examInformation'], start=1):
    locations[f'Location {i}'] = exam.get('cbrLocationShortName', 'N/A')
yield locations
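Here is the loop approach as a standalone, runnable sketch. The schools dict below is made-up sample data (in the real spider it comes from the scraped JSON response), including one entry that lacks the key to show the fallback:

```python
# Hypothetical sample data mimicking the scraped JSON structure.
schools = {
    'examInformation': [
        {'cbrLocationShortName': 'Amsterdam'},
        {'cbrLocationShortName': 'Rotterdam'},
        {},  # entry without the key -> falls back to 'N/A'
    ]
}

locations = {}
for i, exam in enumerate(schools['examInformation'], start=1):
    # dict.get returns the default instead of raising KeyError
    locations[f'Location {i}'] = exam.get('cbrLocationShortName', 'N/A')

print(locations)
# {'Location 1': 'Amsterdam', 'Location 2': 'Rotterdam', 'Location 3': 'N/A'}
```

This also scales automatically: if a page has 4 exam entries instead of 5, the loop simply yields 4 keys, with no hard-coded location_1 through location_5 variables.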
Answered By - Madjazz