Issue
Can you please tell me what is wrong with my code? I am trying to scrape items from Flipkart.
import scrapy

class flipkart_scrapy(scrapy.Spider):
    name = 'flipkart'
    urls = ['https://www.flipkart.com/televisions/pr?sid=ckf%2Cczl&p%5B%5D=facets.brand%255B%255D%3DMi&otracker=categorytree&p%5B%5D=facets.serviceability%5B%5D%3Dtrue&p%5B%5D=facets.availability%255B%255D%3DExclude%2BOut%2Bof%2BStock&otracker=nmenu_sub_TVs%20%26%20Appliances_0_Mi']
    base_url = urls[0]
    page_no = 2
    next_page = base_url + '&page=' + str(page_no)

    def parse(self, response):
        for product in response.css("div._2kHMtA"):
            yield {
                'name': product.css("div._4rR01T::text").get(),
                'price': product.css('div._30jeq3._1_WHN1::text').get(),
                'rating': product.css("div._3LWZlK::text").get(),
            }
        if self.next_page is not None:
            yield response.follow(self.next_page, callback=self.parse)
            self.page_no += 1
            self.next_page = self.base_url + '&page=' + str(self.page_no)
This is the command I use to run it:
scrapy crawl flipkart
It is not scraping anything.
Solution
Your spider doesn't do anything because you don't have start_requests or start_urls defined.
From the Scrapy API documentation for scrapy.Spider:

This is the simplest spider, and the one from which every other spider must inherit (including spiders that come bundled with Scrapy, as well as spiders that you write yourself). It doesn’t provide any special functionality. It just provides a default start_requests() implementation which sends requests from the start_urls spider attribute and calls the spider’s method parse for each of the resulting responses.
To fix this, rename your spider's urls attribute to start_urls, or override the start_requests method.
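To see why the urls attribute is ignored and why either fix works, here is a simplified, hypothetical model of the behaviour quoted above. It is not Scrapy's source code: requests are modelled as plain (url, callback) tuples, the class names are made up for illustration, and the URL is a shortened version of the one in the question.

```python
# Simplified, hypothetical model of how scrapy.Spider starts a crawl.
# NOT Scrapy's source: a "request" here is just a (url, callback) tuple.

URL = "https://www.flipkart.com/televisions/pr?sid=ckf%2Cczl"  # shortened for illustration


class ModelSpider:
    start_urls = ()  # assumption: with nothing defined, there are no start URLs

    def start_requests(self):
        # Default behaviour: one request per entry in start_urls, handled by parse()
        for url in self.start_urls:
            yield (url, self.parse)

    def parse(self, response):
        return []


class OriginalSpider(ModelSpider):
    # Mirrors the question: the list lives under a name the base class never reads
    urls = [URL]


class RenamedSpider(ModelSpider):
    # Fix 1: rename the attribute to start_urls
    start_urls = [URL]


class OverridingSpider(ModelSpider):
    # Fix 2: keep the name urls, but override start_requests to read it
    urls = [URL]

    def start_requests(self):
        for url in self.urls:
            yield (url, self.parse)


for cls in (OriginalSpider, RenamedSpider, OverridingSpider):
    print(cls.__name__, [url for url, _ in cls().start_requests()])
```

In this model OriginalSpider produces an empty list, so parse is never called, while both fixed variants produce one request for the listing page. In a real Scrapy spider, the override would yield scrapy.Request(url, callback=self.parse) objects instead of tuples.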
For example:
class flipkart_scrapy(scrapy.Spider):
    name = 'flipkart'
    start_urls = [...]  # <---- This changes to start_urls
Answered By - Alexander