Issue
I understand how to export my scraped data to CSV format via
scrapy crawl <spider_name> -o filename.csv
However, I'd like to run my spider from a script and write to CSV automatically (so I can use schedule to run the spider at particular times). How could I implement this in my code, and where would it go: in a pipeline or in the spider itself, assuming this can be done?
Solution
Scrapy uses pipelines to post-process the data you have scraped. You can create a file called pipelines.py containing the following code, which exports your data into a folder named exports. Here's some code that I use in one of my pip projects:
from scrapy import signals
# Scrapy 1.0+ import path; older releases used scrapy.contrib.exporter.
from scrapy.exporters import CsvItemExporter, JsonItemExporter


class ExportData(object):
    """Base pipeline: writes every item through an exporter for the spider's lifetime."""

    def __init__(self):
        self.files = {}
        self.exporter = None

    @classmethod
    def from_crawler(cls, crawler):
        # Hook the pipeline into the spider open/close signals.
        pipeline = cls()
        crawler.signals.connect(pipeline.spider_opened, signals.spider_opened)
        crawler.signals.connect(pipeline.spider_closed, signals.spider_closed)
        return pipeline

    def spider_opened(self, spider):
        # Subclasses open the output file and create the exporter here.
        raise NotImplementedError

    def spider_closed(self, spider):
        self.exporter.finish_exporting()
        file_to_save = self.files.pop(spider)
        file_to_save.close()

    def process_item(self, item, spider):
        self.exporter.export_item(item)
        return item


class ExportJSON(ExportData):
    """
    Exporting to exports/spider-name.json file
    """

    def spider_opened(self, spider):
        # The exports folder must already exist; exporters expect a binary file.
        file_to_save = open('exports/%s.json' % spider.name, 'w+b')
        self.files[spider] = file_to_save
        self.exporter = JsonItemExporter(file_to_save)
        self.exporter.start_exporting()


class ExportCSV(ExportData):
    """
    Exporting to exports/spider-name.csv file
    """

    def spider_opened(self, spider):
        # The exports folder must already exist; exporters expect a binary file.
        file_to_save = open('exports/%s.csv' % spider.name, 'w+b')
        self.files[spider] = file_to_save
        self.exporter = CsvItemExporter(file_to_save)
        self.exporter.start_exporting()
You can view the project code on GitHub. For the pipelines to run, you just need to register these class names in the ITEM_PIPELINES setting of your Scrapy settings.
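As a rough sketch of that registration step, and of running the spider from a script as the question asks, something like the following might work. It assumes the pipeline lives in a package called myproject (a hypothetical name, so substitute your own) and that your Scrapy version provides scrapy.crawler.CrawlerProcess. The helper names build_settings and run_spider are made up for illustration and are not part of Scrapy.

```python
import os

# Hypothetical dotted path: replace "myproject" with your own package name.
PIPELINE_PATH = "myproject.pipelines.ExportCSV"


def build_settings(pipeline_path=PIPELINE_PATH):
    """Return settings that enable the CSV export pipeline.

    The same ITEM_PIPELINES dict can go in settings.py instead; the
    number (conventionally 0-1000) sets the order pipelines run in.
    """
    return {"ITEM_PIPELINES": {pipeline_path: 300}}


def run_spider(spider_cls, export_dir="exports"):
    """Run one spider from a plain Python script and block until it finishes."""
    # Imported here so this module can be loaded without Scrapy installed.
    from scrapy.crawler import CrawlerProcess

    # The pipeline opens files under exports/, so make sure the folder exists.
    os.makedirs(export_dir, exist_ok=True)

    process = CrawlerProcess(settings=build_settings())
    process.crawl(spider_cls)
    process.start()  # blocks until the crawl is done
```

Calling run_spider(MySpider), where MySpider is your own spider class, should then produce exports/<spider name>.csv. Note that the Twisted reactor behind process.start() cannot be restarted within one Python process, so a scheduler would usually launch this script as a fresh process for each run rather than calling run_spider repeatedly.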
Answered By - user378704