Issue
I'm trying to export scraped items to a json file. Here is the beginning of the code:
import scrapy
from os.path import expanduser
from datetime import datetime, timedelta

class GetId(scrapy.Spider):
    name = "get_id"
    path = expanduser("~").replace('\\', '/') + '/dox/Getaround/'
    days = [0, 1, 3, 7, 14, 21, 26, 31]
    dates = []
    previous_cars_list = []
    previous_cars_id_list = []
    crawled_date = datetime.today().date()
    for day in days:
        market_date = crawled_date + timedelta(days=day)
        dates.append(market_date)

    # Settings
    custom_settings = {
        'ROBOTSTXT_OBEY': False,
        'DOWNLOAD_DELAY': 5,
        'CONCURRENT_REQUESTS': 1,
        'CONCURRENT_REQUESTS_PER_DOMAIN': 1,
        'AUTOTHROTTLE_ENABLED': True,
        'AUTOTHROTTLE_START_DELAY': 5,
        'LOG_STDOUT': True,
        'LOG_FILE': path + 'log_get_id.txt',
        'FEED_FORMAT': 'json',
        'FEED_URI': path + 'cars_id.json',
    }
I did this two years ago without any issues. Now, when I run "scrapy crawl get_id" in the Anaconda console, only the log file is exported, not the JSON file with the data. The log file contains the following error:
2022-08-25 15:14:48 [scrapy.extensions.feedexport] ERROR: Unknown feed storage scheme: c
Any clue how to deal with this? Thanks
Solution
I'm not sure in which version it was introduced, but I always use the FEEDS setting instead, either in the settings.py file or via the custom_settings class attribute, as in your example.
For example:
    # Settings
    custom_settings = {
        'ROBOTSTXT_OBEY': False,
        'DOWNLOAD_DELAY': 5,
        'CONCURRENT_REQUESTS': 1,
        'CONCURRENT_REQUESTS_PER_DOMAIN': 1,
        'AUTOTHROTTLE_ENABLED': True,
        'AUTOTHROTTLE_START_DELAY': 5,
        'LOG_STDOUT': True,
        'LOG_FILE': path + 'log_get_id.txt',
        'FEEDS': {  # <-- added this
            path + 'cars_id.json': {
                'format': 'json',
                'encoding': 'utf-8',
            },
        },
    }
You can find all of the fields that can be set at https://docs.scrapy.org/en/latest/topics/feed-exports.html#feeds
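As for the error message itself, a likely cause is that Scrapy parses the feed destination as a URI, so the drive letter of an expanded Windows home path ("C:/Users/...") is read as the storage scheme "c". A small sketch of the idea (the user path here is made up); one workaround is to build a proper file:// URI with pathlib, whose Path.as_uri() handles drive letters correctly:

```python
from pathlib import Path
from urllib.parse import urlparse

# A Windows-style path looks like a URI whose scheme is the drive letter,
# which matches the "Unknown feed storage scheme: c" error.
windows_style = 'C:/Users/me/dox/Getaround/cars_id.json'
print(urlparse(windows_style).scheme)  # 'c'

# Building a file:// URI from an absolute path sidesteps the ambiguity.
feed_uri = Path.home().joinpath('dox', 'Getaround', 'cars_id.json').as_uri()
print(feed_uri.startswith('file://'))  # True
```

Plain (non-URI) paths also work as FEEDS keys in recent Scrapy versions, which is another reason the FEEDS setting above avoids the problem.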
Answered By - Alexander