Issue
I'm using Scrapyd to run Scrapy as a web service.
I would like to use the curl command with parameters like this:
curl http://myip:6800/schedule.json -d project=default -d spider=myspider -d domain=www.google.fr
But I don't know how to access the domain parameter inside the spider.
import scrapy
from scrapy import Item, Field
from scrapy.spiders import CrawlSpider, Rule
from scrapy.linkextractors import LinkExtractor

class MyItem(Item):
    url = Field()

class HttpbinSpider(CrawlSpider):
    name = "expired"
    start_urls = [domain]  # domain is undefined here - how do I get it from the schedule.json call?
Sometimes I need to pass a single domain, sometimes several, as arguments.
Thanks!
Solution
It's not possible to pass a list of arguments directly; that feature is missing from Scrapy.
Users typically work around this by serializing the arguments on the curl side (here as a JSON string) and deserializing them in the spider's __init__():
curl http://myip:6800/schedule.json -d project=default -d spider=myspider -d domains='["www1.example.com", "www2.example.com"]'
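The same call also covers the single-domain case: passing a one-element JSON list (reusing the hypothetical www1.example.com host from above) keeps the spider code uniform.

curl http://myip:6800/schedule.json -d project=default -d spider=myspider -d domains='["www1.example.com"]'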
Code:
import json
from scrapy import Spider

class MySpider(Spider):
    def __init__(self, domains=None, *args, **kwargs):
        super().__init__(*args, **kwargs)
        # Scrapyd passes every argument as a string, so decode the JSON list
        self.domains = json.loads(domains) if domains else []
        # do something with self.domains...
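From there, the decoded list can drive the crawl. Below is a minimal sketch of one way to wire it up, assuming each entry is a bare hostname to be fetched over http; the scheme prefix, the start_requests() wiring, and the parse callback are illustrative assumptions, not part of the original answer.

import json
from scrapy import Request, Spider

class MySpider(Spider):
    name = "expired"

    def __init__(self, domains=None, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.domains = json.loads(domains) if domains else []

    def start_requests(self):
        # One request per domain received from schedule.json
        for domain in self.domains:
            # Assumption: entries are bare hostnames, so prepend a scheme
            yield Request(f"http://{domain}", callback=self.parse)

    def parse(self, response):
        yield {"url": response.url}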
Answered By - Pixel