Issue
I have this simple code:
import scrapy
import re
import json
# from scrapy.http import FormRequest
from scrapy.spiders import CrawlSpider, Rule
from scrapy.linkextractors import LinkExtractor


class SpiderRecipe(CrawlSpider):
    name = "recipe"
    start_urls = [
        # 'https://www.giallozafferano.it/',
        'https://ricetta.it/dolci?page=1',
        # 'https://www.buonissimo.it/',
        # 'https://migusto.migros.ch/it.html'
    ]

    def parse(self, response):
        # response.url is an attribute, not a method
        URL = response.url
        # match the host with or without the "www." prefix
        if URL.split('/')[2].endswith("ricetta.it"):
            recipes = response.xpath('//div[contains(@class,"row")]/div[contains(@class,"post-img-left")]')
            # iterate through each recipe in a page, using XPaths relative to it
            for recipe in recipes:
                yield {
                    'Title': recipe.xpath('./a[contains(@class, "post-title")]/text()').get(),
                    'Image': recipe.xpath('./div[contains(@class,"videoContainer")]/img/@src').get(),
                    'Description': recipe.xpath('./p[contains(@class,"post-excerpt")]/text()').get(),
                }
            page = int(URL.split('=')[1]) + 1
            if page <= 148:
                # iterate through each page of recipes; re-add the '=' removed by split
                yield scrapy.Request(URL.split('=')[0] + '=' + str(page), callback=self.parse, dont_filter=True)
I run it from the terminal with scrapy runspider recipe.py -o output.json.
The first part of the code works, since it picks up the starting URL, but I don't understand why the parse function is never called. Even allowing that the code might be incorrect, I tried printing a string at the very beginning of the function and nothing was printed. I looked for solutions, but my function is inside the class and the start URL is set correctly (the link is valid). Maybe it is something very simple, but I cannot find it. I also read that the function must be called, but none of the examples do that, and besides, I already pass it as the callback at the end of the code.
Solution
I solved the problem. I also have a Python environment in another folder, so I have to activate that environment first and only then start scrapy from the terminal in the folder where my spider is. The class doesn't have to be instantiated and its methods don't have to be called manually, because Scrapy does that automatically.
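A quick way to confirm which environment a terminal is actually using is to ask the interpreter itself. The following is a minimal sketch using only the standard library; the helper name environment_report is hypothetical, and the check assumes a venv/virtualenv-style environment (a conda environment would not be detected by the prefix comparison).

```python
import importlib.util
import sys


def environment_report(package="scrapy"):
    """Describe the running interpreter and whether `package` is importable from it.

    Hypothetical helper for illustration; not part of Scrapy.
    """
    # sys.prefix and sys.base_prefix differ only inside a venv/virtualenv
    in_venv = sys.prefix != sys.base_prefix
    # find_spec returns None when a top-level package cannot be found
    spec = importlib.util.find_spec(package)
    return {
        "interpreter": sys.executable,
        "in_virtualenv": in_venv,
        "package_found": spec is not None,
        "package_location": spec.origin if spec is not None else None,
    }


if __name__ == "__main__":
    for key, value in environment_report().items():
        print(f"{key}: {value}")
```

Running this once before and once after activating the environment should show whether the interpreter path and the scrapy location change, which is probably the easiest way to tell if the scrapy command comes from the intended installation.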
Answered By - Ele975