Issue
I'm trying to export scraped items to a json file. Here is the beginning of the code:
import scrapy
from os.path import expanduser
from datetime import datetime, timedelta

class GetId(scrapy.Spider):
    name = "get_id"
    path = expanduser("~").replace('\\', '/') + '/dox/Getaround/'
    days = [0, 1, 3, 7, 14, 21, 26, 31]
    dates = []
    previous_cars_list = []
    previous_cars_id_list = []
    crawled_date = datetime.today().date()
    for day in days:
        market_date = crawled_date + timedelta(days=day)
        dates.append(market_date)

    # Settings
    custom_settings = {
        'ROBOTSTXT_OBEY': False,
        'DOWNLOAD_DELAY': 5,
        'CONCURRENT_REQUESTS': 1,
        'CONCURRENT_REQUESTS_PER_DOMAIN': 1,
        'AUTOTHROTTLE_ENABLED': True,
        'AUTOTHROTTLE_START_DELAY': 5,
        'LOG_STDOUT': True,
        'LOG_FILE': path + 'log_get_id.txt',
        'FEED_FORMAT': 'json',
        'FEED_URI': path + 'cars_id.json',
    }
I did this two years ago without any issues. Now, when I run "scrapy crawl get_id" in the Anaconda console, only the log file is exported, not the JSON file with the data. The log file contains the following error:
2022-08-25 15:14:48 [scrapy.extensions.feedexport] ERROR: Unknown feed storage scheme: c
Any clue how to deal with this? Thanks
Solution
I'm not sure in which version it was introduced, but I always use the FEEDS setting instead, either in the settings.py file or via the custom_settings class attribute, as in your example.
For example:
    # Settings
    custom_settings = {
        'ROBOTSTXT_OBEY': False,
        'DOWNLOAD_DELAY': 5,
        'CONCURRENT_REQUESTS': 1,
        'CONCURRENT_REQUESTS_PER_DOMAIN': 1,
        'AUTOTHROTTLE_ENABLED': True,
        'AUTOTHROTTLE_START_DELAY': 5,
        'LOG_STDOUT': True,
        'LOG_FILE': path + 'log_get_id.txt',
        'FEEDS': {  # <-- added this
            path + 'cars_id.json': {
                'format': 'json',
                'encoding': 'utf-8',
            },
        },
    }
You can find all of the fields that can be set at https://docs.scrapy.org/en/latest/topics/feed-exports.html#feeds
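As for the error message itself, a likely cause is that Scrapy parses the feed destination as a URI, so the drive letter of an expanded Windows home path ("C:/Users/...") is read as the storage scheme "c". A small sketch of the idea (the user path here is made up); one workaround is to build a proper file:// URI with pathlib, whose Path.as_uri() handles drive letters correctly:

```python
from pathlib import Path
from urllib.parse import urlparse

# A Windows-style path looks like a URI whose scheme is the drive letter,
# which matches the "Unknown feed storage scheme: c" error.
windows_style = 'C:/Users/me/dox/Getaround/cars_id.json'
print(urlparse(windows_style).scheme)  # 'c'

# Building a file:// URI from an absolute path sidesteps the ambiguity.
feed_uri = Path.home().joinpath('dox', 'Getaround', 'cars_id.json').as_uri()
print(feed_uri.startswith('file://'))  # True
```

Plain (non-URI) paths also work as FEEDS keys in recent Scrapy versions, which is another reason the FEEDS setting above avoids the problem.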
Answered By - Alexander