Sunday, January 30, 2022

[FIXED] Getting AttributeError: 'str' object has no attribute 'get'


Issue

I am getting an error while working with a JSON response:

Error: AttributeError: 'str' object has no attribute 'get'

What could be the issue?

I am also getting the following errors for the rest of the values:

TypeError: 'builtin_function_or_method' object is not subscriptable

'Phone': value['_source']['primaryPhone'], KeyError: 'primaryPhone'

# -*- coding: utf-8 -*-
import scrapy
import json


class MainSpider(scrapy.Spider):
    name = 'main'
    start_urls = ['https://experts.expcloud.com/api4/std?searchterms=AB&size=216&from=0']

    def parse(self, response):

        resp = json.loads(response.body)
        values = resp['hits']['hits']

        for value in values:

            yield {
                'Full Name': value['_source']['fullName'],
                'Phone': value['_source']['primaryPhone'],
                "Email": value['_source']['primaryEmail'],
                "City": value.get['_source']['city'],
                "Zip Code": value.get['_source']['zipcode'],
                "Website": value['_source']['websiteURL'],
                "Facebook": value['_source']['facebookURL'],
                "LinkedIn": value['_source']['LinkedIn_URL'],
                "Twitter": value['_source']['Twitter'],
                "BIO": value['_source']['Bio']
            }

Solution

The data is nested deeper than you think it is. That's why you're getting the errors.

Code Example

import scrapy
import json


class MainSpider(scrapy.Spider):
    name = 'test'
    start_urls = ['https://experts.expcloud.com/api4/std?searchterms=AB&size=216&from=0']

    def parse(self, response):
        resp = json.loads(response.body)
        values = resp['hits']['hits']

        for value in values:
            yield {
                'Full Name': value['_source']['fullName'],
                'Primary Phone': value['_source']['primaryPhone'],
            }

Explanation

The resp variable holds a Python dictionary, but there is no resp['hits']['hits']['fullName'] within this JSON data. The fullName you're looking for is actually at resp['hits']['hits'][i]['_source']['fullName'], where i is a number because resp['hits']['hits'] is a list.

resp['hits'] is a dictionary, so the values variable is fine. But resp['hits']['hits'] is a list, so you can't use the get method on it, and it only accepts numbers within [], not strings. Hence the error. Note also that get is a method, so it has to be called with parentheses: value.get['_source'] subscripts the method itself, which is what raises the 'builtin_function_or_method' object is not subscriptable error.
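
To make the difference between the list and the dictionaries concrete, here is a minimal sketch. The resp sample is made up to mimic the shape of the response (the names and phone numbers are invented for illustration, not real API data).

```python
# Hypothetical sample mimicking the shape of the API response.
resp = {
    'hits': {
        'hits': [
            {'_source': {'fullName': 'Jane Doe', 'primaryPhone': '555-0100'}},
            {'_source': {'fullName': 'John Roe'}},  # no 'primaryPhone' key
        ]
    }
}

values = resp['hits']['hits']

# values is a list, so it is indexed with integers, not strings.
first_name = values[0]['_source']['fullName']

# Indexing the list with a string raises TypeError.
try:
    values['fullName']
except TypeError as exc:
    list_error = str(exc)

# .get is a method: subscripting it with [] instead of calling it with ()
# also raises TypeError.
try:
    values[0].get['_source']
except TypeError as exc:
    get_error = str(exc)

# Called properly, .get works on each dictionary and accepts a default.
second_phone = values[1]['_source'].get('primaryPhone', 'No Phone number')

print(first_name)
print(list_error)
print(get_error)
print(second_phone)
```

The last lookup shows that dict.get with a default is one way to avoid the KeyError for records where a key is absent.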

Tips

Use response.json() instead of json.loads(response.body). Since Scrapy v2.2, responses support JSON directly, and the json module is already imported for you behind the scenes.
Also check the JSON data yourself. I used requests for ease and kept drilling down through the nesting until I reached the data you needed.
Yielding a dictionary is fine for this type of data because it's well structured. For data that needs modifying or cleaning, or that is wrong in places, use either an Items dictionary or an ItemLoader. Both give you a lot more flexibility than yielding a plain dictionary. I almost never yield a dictionary; the only time I do is when the data is highly structured.
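
As a rough illustration of checking the nesting, the sketch below walks a parsed JSON object and reports what sits at each level. The describe helper and the raw sample string are hypothetical, written only for this example; in practice raw would be the text you downloaded from the API.

```python
import json


def describe(node, path='resp', depth=0, max_depth=6):
    """Return lines describing the type found at each level of nested JSON."""
    lines = []
    if isinstance(node, dict):
        lines.append(f"{path}: dict with keys {sorted(node)}")
        if depth < max_depth:
            for key, child in node.items():
                lines.extend(describe(child, f"{path}[{key!r}]", depth + 1, max_depth))
    elif isinstance(node, list):
        lines.append(f"{path}: list of {len(node)} item(s)")
        # Only inspect the first element; the others usually share its shape.
        if node and depth < max_depth:
            lines.extend(describe(node[0], f"{path}[0]", depth + 1, max_depth))
    else:
        lines.append(f"{path}: {type(node).__name__}")
    return lines


# Hypothetical sample standing in for the downloaded response text.
raw = '{"hits": {"hits": [{"_source": {"fullName": "Jane Doe"}}]}}'

for line in describe(json.loads(raw)):
    print(line)
```

Each printed path can be pasted back into the spider as the lookup for that value, which makes it easier to see where a list index such as [0] is needed.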

Updated Code

Looking at the JSON data, quite a lot of it is missing. This is part of web scraping: you will run into gaps like this. Here we use a try/except block to catch the KeyError that Python raises when a key isn't present in a dictionary. We handle that exception by yielding a placeholder string such as 'No XXX' instead.
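
Here is a small sketch of that pattern outside of Scrapy, so it can be run on its own. The two records are invented for illustration; the second one deliberately lacks the 'primaryPhone' key.

```python
# Hypothetical records: the second one has no 'primaryPhone' key.
values = [
    {'_source': {'fullName': 'Jane Doe', 'primaryPhone': '555-0100'}},
    {'_source': {'fullName': 'John Roe'}},
]

phones = []
for value in values:
    try:
        phone = value['_source']['primaryPhone']
    except KeyError:
        # The key is absent for this record, so fall back to a placeholder.
        phone = 'No Phone number'
    phones.append(phone)

print(phones)
```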

Once you start getting gaps like these, it's better to consider an Items dictionary or an ItemLoader.

Now it's worth looking at the Scrapy docs about Items. Essentially Scrapy does two things: it extracts data from websites, and it provides a mechanism for storing that data. It stores the data in a dictionary-like object called an Item. The code isn't much different from yielding a dictionary, but an Items dictionary lets you manipulate the extracted data more easily using the extra features Scrapy provides. You need to edit your items.py first with the fields you want. We create a class called TestItem and define each field using scrapy.Field(). We can then import this class in our spider script.

items.py

import scrapy


class TestItem(scrapy.Item):
    # define the fields for your item here like:
    # name = scrapy.Field()
    full_name = scrapy.Field()
    Phone = scrapy.Field()
    Email = scrapy.Field()
    City = scrapy.Field()
    Zip_code = scrapy.Field()
    Website = scrapy.Field()
    Facebook = scrapy.Field()
    Linkedin = scrapy.Field()
    Twitter = scrapy.Field()
    Bio = scrapy.Field()

Here we're specifying what we want the fields to be. Field names are Python attribute names, so unfortunately they can't contain spaces, which is why full name is full_name. scrapy.Field() creates the field of the item dictionary for us.

We import this item dictionary into our spider script with from ..items import TestItem. The from ..items part is a relative import: it takes items.py from the parent folder of the spider script, and from it we import the class TestItem. That way our spider can populate the items dictionary with our JSON data.

Note that at the top of the for loop we instantiate the class TestItem with item = TestItem(). To instantiate means to create an object from the class, which in this case gives us a dictionary-like item. Creating a fresh item on each pass means every yielded item is its own object rather than one shared object being overwritten. You have to do this before you add your keys and values, as you can see within the for loop.

Spider script

import scrapy
from ..items import TestItem


class MainSpider(scrapy.Spider):
    name = 'test'
    start_urls = ['https://experts.expcloud.com/api4/std?searchterms=AB&size=216&from=0']

    def parse(self, response):
        # response.json() needs Scrapy 2.2+; it replaces json.loads(response.body)
        values = response.json()['hits']['hits']
        for value in values:
            # New item per record, so each yielded item is a separate object
            item = TestItem()
            try:
                item['full_name'] = value['_source']['fullName']
            except KeyError:
                item['full_name'] = 'No Name'
            try:
                item['Phone'] = value['_source']['primaryPhone']
            except KeyError:
                item['Phone'] = 'No Phone number'
            try:
                item['Email'] = value['_source']['primaryEmail']
            except KeyError:
                item['Email'] = 'No Email'
            # The lookups below index into a list with [0], so IndexError
            # is caught as well in case the list is empty.
            try:
                item['City'] = value['_source']['activeLocations'][0]['city']
            except (KeyError, IndexError):
                item['City'] = 'No City'
            try:
                item['Zip_code'] = value['_source']['activeLocations'][0]['zipcode']
            except (KeyError, IndexError):
                item['Zip_code'] = 'No Zip code'
            try:
                item['Website'] = value['_source']['AgentMarketingCenter'][0]['Website']
            except (KeyError, IndexError):
                item['Website'] = 'No Website'
            try:
                item['Facebook'] = value['_source']['AgentMarketingCenter'][0]['Facebook_URL']
            except (KeyError, IndexError):
                item['Facebook'] = 'No Facebook'
            try:
                item['Linkedin'] = value['_source']['AgentMarketingCenter'][0]['LinkedIn_URL']
            except (KeyError, IndexError):
                item['Linkedin'] = 'No Linkedin'
            try:
                item['Twitter'] = value['_source']['AgentMarketingCenter'][0]['Twitter']
            except (KeyError, IndexError):
                item['Twitter'] = 'No Twitter'
            try:
                item['Bio'] = value['_source']['AgentMarketingCenter'][0]['Bio']
            except (KeyError, IndexError):
                item['Bio'] = 'No Bio'

            yield item
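
The repeated try/except blocks could also be driven from a table of paths. The sketch below is one possible way to do that, not part of the original answer: FIELDS, dig, and build_record are hypothetical names, the sample hit is invented, and it builds a plain dictionary so it runs without Scrapy. Only three fields are shown; the paths are the ones used in the spider above.

```python
# Hypothetical table: output field -> (path inside each hit, fallback text).
FIELDS = {
    'full_name': (('_source', 'fullName'), 'No Name'),
    'Phone': (('_source', 'primaryPhone'), 'No Phone number'),
    'City': (('_source', 'activeLocations', 0, 'city'), 'No City'),
}


def dig(data, path, default):
    """Follow path through nested dicts and lists; return default if a step is missing."""
    for step in path:
        try:
            data = data[step]
        except (KeyError, IndexError, TypeError):
            # Missing key, empty list, or a value that cannot be indexed.
            return default
    return data


def build_record(hit):
    """Build one flat record from a single entry of resp['hits']['hits']."""
    return {field: dig(hit, path, default) for field, (path, default) in FIELDS.items()}


# Invented sample: no phone number and an empty activeLocations list.
sample_hit = {'_source': {'fullName': 'Jane Doe', 'activeLocations': []}}
record = build_record(sample_hit)
print(record)
```

Inside a spider, the same loop could assign each value to a TestItem field instead of a dictionary key; adding a field then means adding one line to the table rather than another try/except block.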

Answered By - AaronS

This Answer collected from stackoverflow and tested by PythonFixing community admins, is licensed under cc by-sa 2.5 , cc by-sa 3.0 and cc by-sa 4.0