Wednesday, December 6, 2023

[FIXED] Scrapy parse function not called

Labels: python, scrapy, web-crawler

Issue

I have this simple code:

import scrapy


class SpiderRecipe(scrapy.Spider):
    # A plain Spider is enough here: no link-following rules are defined,
    # and CrawlSpider reserves parse() for its own logic.
    name = "recipe"
    start_urls = [
        # 'https://www.giallozafferano.it/',
        'https://ricetta.it/dolci?page=1',
        # 'https://www.buonissimo.it/',
        # 'https://migusto.migros.ch/it.html'
    ]

    def parse(self, response):
        # URL of the page being parsed (an attribute, not a method)
        URL = response.url
        # only handle pages served by ricetta.it
        if URL.split('/')[2] != "ricetta.it":
            return

        # keep the selectors so each recipe can be queried on its own
        recipes = response.xpath('//div[contains(@class,"row")]/div[contains(@class,"post-img-left")]')
        # iterate through each recipe in a page
        for recipe in recipes:
            yield {
                'Title': recipe.xpath('./a[contains(@class, "post-title")]/text()').get(),
                'Image': recipe.xpath('./div[contains(@class,"videoContainer")]/img/@src').get(),
                'Description': recipe.xpath('./p[contains(@class,"post-excerpt")]/text()').get(),
            }

        # iterate through each page of recipes, once per parsed page
        page = int(URL.split('=')[1]) + 1
        if page <= 148:
            yield scrapy.Request(URL.split('=')[0] + '=' + str(page), callback=self.parse, dont_filter=True)

I run it from the terminal with scrapy runspider recipe.py -o output.json.

The first part of the code works, since Scrapy picks up the start URL, but I don't understand why the parse function is never called. Even allowing for mistakes in the code, I tried printing a string at the beginning of the function and nothing appeared. I looked for solutions, but my function is inside the class and the start URL is set correctly (the link is valid). It may be something very simple, but I cannot find it. I also read that the function must be called, yet none of the examples do so, and besides, I call it again at the end of the code.

Solution

I solved the problem. My Python environment lives in a separate folder, so I first have to activate that environment and only then run Scrapy from the terminal in the folder that contains my spider. The class does not have to be instantiated and its methods do not have to be called manually, because Scrapy does both automatically.
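
One way to confirm that the terminal is using the intended environment before launching the spider is to ask the interpreter itself. The sketch below is not part of the original answer: it assumes a venv/virtualenv-style environment (other environment managers may report differently), and the helper names in_virtualenv and module_available are made up for illustration.

```python
import importlib.util
import sys


def in_virtualenv() -> bool:
    """Return True when the interpreter runs inside a venv-style environment."""
    # Inside a venv, sys.prefix points at the environment directory while
    # sys.base_prefix still points at the base Python installation.
    return sys.prefix != sys.base_prefix


def module_available(name: str) -> bool:
    """Return True when the top-level module `name` can be found by this interpreter."""
    return importlib.util.find_spec(name) is not None


if __name__ == "__main__":
    print("interpreter:", sys.executable)
    print("inside a virtual environment:", in_virtualenv())
    print("scrapy importable:", module_available("scrapy"))
```

If the last line reports False, the interpreter on the current PATH cannot see Scrapy, which would be consistent with the environment not having been activated in that terminal.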

Answered By - Ele975

This answer was collected from Stack Overflow, tested by PythonFixing community admins, and is licensed under CC BY-SA 2.5, CC BY-SA 3.0 and CC BY-SA 4.0.
