Issue
Need some help here. My code works when I crawl a single page with scrapy.Spider. However, once I switch to CrawlSpider to scrape the entire website, it does not seem to work at all.
from scrapy.spiders import CrawlSpider, Rule
from scrapy.linkextractors import LinkExtractor

class QuotesSpider(CrawlSpider):
    name = "quotes"
    allowed_domains = ['reifen.check24.de']
    start_urls = [
        'https://reifen.check24.de/pkw-sommerreifen/toyo-proxes-cf2-205-55r16-91h-2276003?label=ppc',
        'https://reifen.check24.de/pkw-sommerreifen/michelin-pilot-sport-4-205-55zr16-91w-213777?label=pc'
    ]

    rules = (
        Rule(LinkExtractor(deny=('cart')), callback='parse_item', follow=True),
    )

    def parse(self, response):
        for quote in response.xpath('/html/body/div[2]/div/section/div/div/div[1]'):
            yield {
                'brand': quote.xpath('//tbody//tr[1]//td[2]//text()').get(),
                'pattern': quote.xpath('//tbody//tr[3]//td[2]//text()').get(),
                'size': quote.xpath('//tbody//tr[6]//td[2]//text()').get(),
                'RR': quote.xpath('div[1]/div[1]/div/div[1]/div[2]/span/span/span/div/div/div[1]/span/text()').get(),
                'WL': quote.xpath('div[1]/div[1]/div/div[1]/div[2]/span/span/span/div/div/div[2]/span/text()').get(),
                'noise': quote.xpath('div[1]/div[1]/div/div[1]/div[2]/span/span/span/div/div/div[3]/span/text()').get(),
            }
Am I missing something?
Solution
You have a tiny mismatch: your rule registers callback='parse_item', but the method you defined is named parse. A CrawlSpider must not override parse at all, because CrawlSpider uses that method internally to apply its rules and follow links; overriding it is exactly why the crawl never gets past the start URLs. Keep the rule as it is:

rules = (
    Rule(LinkExtractor(deny=('cart')), callback='parse_item', follow=True),
)

and rename your parsing method so it matches the callback:

def parse_item(self, response):
    ...
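For completeness, a minimal corrected version of the spider might look like the sketch below. The URLs, rule, and XPath expressions are copied unchanged from your question, so they assume the page structure is still the same; only the parsing method has been renamed.

from scrapy.spiders import CrawlSpider, Rule
from scrapy.linkextractors import LinkExtractor

class QuotesSpider(CrawlSpider):
    name = "quotes"
    allowed_domains = ['reifen.check24.de']
    start_urls = [
        'https://reifen.check24.de/pkw-sommerreifen/toyo-proxes-cf2-205-55r16-91h-2276003?label=ppc',
        'https://reifen.check24.de/pkw-sommerreifen/michelin-pilot-sport-4-205-55zr16-91w-213777?label=pc',
    ]

    rules = (
        # Follow every link except cart pages; each matched page is handed to parse_item.
        Rule(LinkExtractor(deny=('cart',)), callback='parse_item', follow=True),
    )

    # Renamed from parse so that CrawlSpider's own parse method stays untouched.
    def parse_item(self, response):
        for quote in response.xpath('/html/body/div[2]/div/section/div/div/div[1]'):
            yield {
                'brand': quote.xpath('//tbody//tr[1]//td[2]//text()').get(),
                'pattern': quote.xpath('//tbody//tr[3]//td[2]//text()').get(),
                'size': quote.xpath('//tbody//tr[6]//td[2]//text()').get(),
                'RR': quote.xpath('div[1]/div[1]/div/div[1]/div[2]/span/span/span/div/div/div[1]/span/text()').get(),
                'WL': quote.xpath('div[1]/div[1]/div/div[1]/div[2]/span/span/span/div/div/div[2]/span/text()').get(),
                'noise': quote.xpath('div[1]/div[1]/div/div[1]/div[2]/span/span/span/div/div/div[3]/span/text()').get(),
            }

As a side note, deny=('cart') is just the string 'cart' because of the missing comma. LinkExtractor accepts either a single pattern or a list/tuple of patterns, so it works either way, but deny=('cart',) makes the intent clearer.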
Answered By - SuperUser