Issue
I am using Scrapy to scrape data from a website. For each graphics card I want the title, the price, and whether it is in stock. The problem is that my code loops twice, so instead of 10 products I get 20.
import scrapy


class ThespiderSpider(scrapy.Spider):
    name = 'Thespider'
    start_urls = ['https://www.czone.com.pk/graphic-cards-pakistan-ppt.154.aspx?page=2']

    def parse(self, response):
        data = {}
        cards = response.css('div.row')
        for card in cards:
            for c in card.css('div.product'):
                data['Title'] = c.css('h4 a::text').getall()
                data['Price'] = c.css('div.price span::text').getall()
                data['Stock'] = c.css('div.product-stock span.product-data::text').getall()
                yield data
Solution
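You're using a nested for loop when one isn't necessary. If the page nests div.row elements inside each other, every product is matched once per enclosing row, which would explain the doubled output.
Each card can be selected directly with response.css('div.product').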
Code Example
def parse(self, response):
    data = {}
    # Select each product block directly; no outer loop over div.row is needed.
    cards = response.css('div.product')
    for card in cards:
        data['Title'] = card.css('h4 a::text').getall()
        data['Price'] = card.css('div.price span::text').getall()
        data['Stock'] = card.css('div.product-stock span.product-data::text').getall()
        yield data
Additional Information
- Use get() instead of getall(). getall() returns a list; you'll probably want a string, which is what get() gives you.
- If you're thinking about multiple pages, a Scrapy Item (an items dictionary) may be better than yielding a plain dictionary. Invariably there will be something you need to alter, and an Item gives you more flexibility to do this.
Answered By - AaronS