Friday, January 26, 2024

[FIXED] Getting error when trying to scrape items from flipkart using python


Issue

Can you please tell me what the error in this code might be? I am trying to scrape items from Flipkart.

import scrapy

class flipkart_scrapy(scrapy.Spider):
    name = 'flipkart'
    urls = ['https://www.flipkart.com/televisions/pr?sid=ckf%2Cczl&p%5B%5D=facets.brand%255B%255D%3DMi&otracker=categorytree&p%5B%5D=facets.serviceability%5B%5D%3Dtrue&p%5B%5D=facets.availability%255B%255D%3DExclude%2BOut%2Bof%2BStock&otracker=nmenu_sub_TVs%20%26%20Appliances_0_Mi']
    base_url = urls[0]
    page_no = 2
    next_page = base_url + '&page=' + str(page_no)

    def parse(self, response):
        for product in response.css("div._2kHMtA"):
            yield {
                'name': product.css("div._4rR01T::text").get(),
                'price': product.css('div._30jeq3._1_WHN1::text').get(),
                'rating': product.css("div._3LWZlK::text").get(),
            }

        if self.next_page is not None:
            yield response.follow(self.next_page, callback=self.parse)
            self.page_no += 1
            self.next_page = self.base_url + '&page=' + str(self.page_no)

This is the code I'm trying to run, using: scrapy crawl flipkart

It is not scraping anything.

Solution

Your spider doesn't do anything because you don't have start_requests or start_urls defined.

From the Scrapy API documentation for scrapy.Spider:

This is the simplest spider, and the one from which every other spider must inherit (including spiders that come bundled with Scrapy, as well as spiders that you write yourself). It doesn’t provide any special functionality. It just provides a default start_requests() implementation which sends requests from the start_urls spider attribute and calls the spider’s method parse for each of the resulting responses.

To fix this, either rename your spider's urls attribute to start_urls, or override the start_requests method.

For example:

import scrapy

class flipkart_scrapy(scrapy.Spider):
    name = 'flipkart'
    start_urls = [...]  # renamed from urls; Scrapy reads start_urls by default

Answered By - Alexander

This answer was collected from Stack Overflow and tested by PythonFixing community admins. It is licensed under CC BY-SA 2.5, CC BY-SA 3.0 and CC BY-SA 4.0.
