Issue
Set-up
I export my data to a .csv file using the standard command in Terminal (macOS), e.g.
scrapy crawl spider -o spider_output.csv
Problem
When exporting a new spider_output.csv, Scrapy appends it to the existing spider_output.csv.
I can think of two solutions:
- Command Scrapy to overwrite instead of append
- Command Terminal to remove the existing spider_output.csv prior to crawling
I've read that (to my surprise) Scrapy currently isn't able to do option 1. Some people have proposed workarounds, but I can't seem to get them to work. I've found an answer for solution 2, but I can't get that to work either.
Can somebody help me? Perhaps there is a third solution I haven't thought of?
Solution
There is an open issue with Scrapy for this feature: https://github.com/scrapy/scrapy/issues/547
There are some solutions proposed in the issue thread:
scrapy runspider spider.py -t json --nolog -o - > out.json
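The trick there is that -o - writes the feed to stdout, and the shell's > redirection truncates out.json before writing, so nothing is appended. For the CSV case in the question, the same pattern would presumably look like this (a sketch, assuming the spider is called spider as in the set-up above; -t csv selects the CSV feed exporter instead of JSON):
scrapy crawl spider -t csv --nolog -o - > spider_output.csv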
Or just delete the existing output before running the spider:
rm data.jl; scrapy crawl myspider -o data.jl
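If you take the delete-first route, wrapping both steps in a tiny shell script means the cleanup can't be forgotten. A minimal sketch along those lines, assuming the spider and file names from the question:
#!/bin/sh
# Remove any previous export, then crawl afresh.
rm -f spider_output.csv                   # -f: don't error if the file doesn't exist yet
scrapy crawl spider -o spider_output.csv  # Scrapy now creates the file from scratch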
Answered By - Granitosaurus