Issue
I have been trying to crawl a series of pages that are linked from a source page, but failing. Below is my spider.py file:
import scrapy
from scrapy.spiders import CrawlSpider, Rule
from scrapy.linkextractors import LinkExtractor

class DogstatsSpider(CrawlSpider):
    name = "dogstats"
    allowed_domains = ["www.ukdogracing.net"]
    start_urls = ["http://www.ukdogracing.net/todays-runners/"]

    rules = (
        Rule(LinkExtractor(allow='runners'), callback='parse_item')
    )

    def parse_item(self, response):
        yield {
            'dogtime': "test"
        }
When I crawl this I get the following error:
TypeError: 'Rule' object is not iterable
To add more detail: all of the href links I want take the form
href="/runners/*"
so the full links would be http://www.ukdogracing.net/runners/*, as listed on http://www.ukdogracing.net/todays-runners/.
Where am I going wrong?
Solution
Update: I have found the answer. I should have been using [] rather than () for rules.
Strangely, the tutorial I was following on YouTube used () and it worked, but not for my spider; once I changed mine to [] it worked. The underlying reason is that (Rule(...)) without a trailing comma is not a tuple at all, just a parenthesized expression, so Scrapy receives a single Rule object and fails when it tries to iterate over it. A list, or a tuple written as (Rule(...),), fixes this.
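For completeness, here is a minimal corrected version of the spider using the list form (a one-element tuple with a trailing comma would work just as well); parse_item still yields the placeholder item from the question:

import scrapy
from scrapy.spiders import CrawlSpider, Rule
from scrapy.linkextractors import LinkExtractor

class DogstatsSpider(CrawlSpider):
    name = "dogstats"
    allowed_domains = ["www.ukdogracing.net"]
    start_urls = ["http://www.ukdogracing.net/todays-runners/"]

    # A list is always iterable. (Rule(...),) with a trailing comma
    # would also work, but (Rule(...)) would not, because without the
    # comma the parentheses create no tuple.
    rules = [
        Rule(LinkExtractor(allow='runners'), callback='parse_item'),
    ]

    def parse_item(self, response):
        # Placeholder item; replace with real extraction logic.
        yield {
            'dogtime': "test"
        }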
Answered By - user2115136