Issue
I have developed a few spiders in Scrapy and I want to test them on Heroku. Does anybody know how to deploy a Scrapy spider to Heroku?
Solution
Yes, it's fairly simple to deploy and run your Scrapy spider on Heroku.
Here are the steps, using a real Scrapy project as an example:
Clone the project (note that it must have a requirements.txt file for Heroku to recognize it as a Python project):
git clone https://github.com/scrapinghub/testspiders.git
Add cffi to the requirements.txt file (e.g. cffi==1.1.0).
Create the Heroku application (this will add a new heroku git remote):
heroku create
Deploy the project (this will take a while the first time, when the slug is built; if your local branch is called master rather than main, run git push heroku master instead):
git push heroku main
Run your spider:
heroku run scrapy crawl followall
Some notes:
- Heroku's disk is ephemeral. If you want to store the scraped data somewhere persistent, you can use an S3 feed export (by appending -o s3://mybucket/items.jl to the scrapy crawl command; this requires AWS credentials, set through the AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY settings) or use an add-on (like MongoHQ or Redis To Go) and write a pipeline that stores your items there.
- It would be cool to run a Scrapyd server on Heroku, but it's not currently possible because the sqlite3 module (which Scrapyd requires) doesn't work on Heroku.
- If you want a more sophisticated solution for deploying your Scrapy spiders, consider setting up your own Scrapyd server or using a hosted service like Scrapy Cloud.
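For the add-on route, a minimal sketch of such a pipeline for a Redis add-on might look like the following. It assumes that the redis package has been added to requirements.txt and that the add-on exposes its connection URL in a config var named REDIS_URL. Both assumptions are not from the original answer, so check your add-on's documentation for the actual variable name. The RedisListPipeline class name is hypothetical.

```python
import json
import os


class RedisListPipeline:
    """Hypothetical item pipeline that pushes each scraped item onto a Redis list.

    Assumptions (not from the original answer): the `redis` package is listed
    in requirements.txt, and the Redis add-on exposes its connection URL in the
    config var named by `url_env` (REDIS_URL here; check your add-on's docs).
    """

    url_env = "REDIS_URL"  # assumed config var name

    def __init__(self, client=None, key="items"):
        # Scrapy instantiates a pipeline without arguments when it has no
        # from_crawler/from_settings, so both parameters have defaults;
        # `client` also lets you inject a fake client for testing.
        self.client = client
        self.key = key

    def open_spider(self, spider):
        if self.client is None:
            # Imported lazily so the class can be loaded without redis installed.
            import redis
            self.client = redis.from_url(os.environ[self.url_env])

    def process_item(self, item, spider):
        # dict() works for plain dicts and scrapy.Item objects alike;
        # default=str guards against non-JSON values such as datetimes.
        self.client.rpush(self.key, json.dumps(dict(item), default=str))
        return item
```

To enable it, register it in settings.py, e.g. ITEM_PIPELINES = {"testspiders.pipelines.RedisListPipeline": 300}, assuming you saved the class in testspiders/pipelines.py. Each item then survives as one JSON string in the Redis list, independently of Heroku's ephemeral disk.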
Answered By - Pablo Hoffman