Issue
I want to save data to a CSV file via:
>scrapy crawl spider_name -O ../output/file_name.csv
>scrapy crawl spider_name -O file_name.csv
in a different folder than the default one used when just running:
However, I also want to save the data without any header line. I found that you can include:
FEEDS = {
    'file_name.csv': {
        'format': 'csv',
        'item_export_kwargs': {
            'include_headers_line': False,
        },
    },
}
in settings.py. That raises a new problem: '-O' no longer replaces the file but instead appends to it when the command is run multiple times. More importantly, it doesn't work if I use:
FEEDS = {
    r'file:///D:\xyz\output\file_name.csv': {
        'format': 'csv',
        'item_export_kwargs': {
            'include_headers_line': False,
        },
    },
}
to point at my target folder on Windows.
How do I ensure that the file gets replaced by a new one on each run and that my data is written without the header line?
Solution
For your first question, you simply need to add 'overwrite': True to your FEEDS dictionary:
FEEDS = {
    'file_name.csv': {
        'format': 'csv',
        'item_export_kwargs': {
            'include_headers_line': False,
        },
        'overwrite': True,
    },
}
However, I don't understand your problem with the command line arguments. Specifying the storage backend in your settings eliminates the need to pass the -O flag on the command line at all.
Your URI should work fine if you use forward slashes instead of backslashes:
file:///D:/xyz/output/file_name.csv
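Putting both fixes together, a settings.py sketch might look like the following. The path D:/xyz/output/ and the file name are the asker's examples, not required values:

```python
# settings.py -- sketch combining both fixes:
# a Windows file URI with forward slashes, no header line, and
# overwriting the output file on each run.
FEEDS = {
    'file:///D:/xyz/output/file_name.csv': {
        'format': 'csv',
        'item_export_kwargs': {
            'include_headers_line': False,  # omit the CSV header row
        },
        'overwrite': True,  # replace the file instead of appending
    },
}
```

With this in place you can run scrapy crawl spider_name with no -O or -o flag, since the feed destination is configured entirely in settings.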
Making these changes should force Scrapy to overwrite the CSV file on each execution and skip writing the header line to the file.
Answered By - Alexander