Issue
I am trying to run my spider from a script. It runs fine from the command prompt, and it runs fine from the script if I don't use my proxies (except that I get 403s because I'm not using proxies).
I have tried changing the file path, but nothing worked.
In settings.py I simply use
ROTATING_PROXY_LIST_PATH = 'proxylist'
This is my scrapy.cfg. I tried changing 'scraper' to scraper.scraper for the heck of it, but that didn't work.
[settings]
default = scraper.settings
[deploy]
#url = http://localhost:6800/
project = scraper
This is my project structure

- rascraper
    - scraper
        - spiders
            - __init__.py
            - Spider.py
        - __init__.py
        - items.py
        - middlewares.py
        - pipelines.py
        - settings.py
        - scraper
        - scrapy.cfg
        - proxylist
        - spiders
    - scraper
I don't think including the spider is relevant, but this is how I call it (in the same file)
from scrapy.crawler import CrawlerProcess
from scrapy.utils.project import get_project_settings

if __name__ == '__main__':
    process = CrawlerProcess(get_project_settings())
    process.crawl('Acts', artist="eddiem")
    process.start()
Why does Scrapy not find my proxy file when it loads the settings via get_project_settings()?
Solution
Your scrapy.cfg needs to be moved to its parent directory.
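Applied to the structure in the question, the fixed layout would look something like this (keeping the question's file names; proxylist is placed next to scrapy.cfg on the assumption that the script is launched from the project root, so the relative path 'proxylist' resolves):

- rascraper
    - scrapy.cfg
    - proxylist
    - scraper
        - __init__.py
        - items.py
        - middlewares.py
        - pipelines.py
        - settings.py
        - spiders
            - __init__.py
            - Spider.py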
According to the Scrapy docs:
Though it can be modified, all Scrapy projects have the same file structure by default, similar to this:
scrapy.cfg
myproject/
    __init__.py
    items.py
    middlewares.py
    pipelines.py
    settings.py
    spiders/
        __init__.py
        spider1.py
        spider2.py
        ...
The directory where the scrapy.cfg file resides is known as the project root directory. That file contains the name of the python module that defines the project settings. Here is an example:
[settings]
default = myproject.settings
This means the scrapy.cfg file must sit at least one directory above the project directory, i.e. the directory that contains settings.py.
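The reason this matters for get_project_settings() is that Scrapy locates the project by walking upward from the current directory until it finds a scrapy.cfg. A minimal sketch of that lookup (a simplified re-implementation for illustration, not Scrapy's actual code):

```python
import os

def closest_scrapy_cfg(path='.'):
    """Walk upward from `path` until a scrapy.cfg file is found.

    Simplified illustration of how Scrapy discovers the project root;
    returns '' if no scrapy.cfg exists anywhere on the way up.
    """
    path = os.path.abspath(path)
    cfg = os.path.join(path, 'scrapy.cfg')
    if os.path.exists(cfg):
        return cfg
    parent = os.path.dirname(path)
    if parent == path:  # reached the filesystem root without finding it
        return ''
    return closest_scrapy_cfg(parent)
```

If scrapy.cfg is buried where this upward search (started from wherever you launch the script) never passes through it, get_project_settings() falls back to default settings, so project settings such as ROTATING_PROXY_LIST_PATH are never loaded.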
Answered By - Alexander