Issue
I'm trying to learn the basics of Scrapy. I've written the spider below to scrape one of the practice websites, books.toscrape.com. When I just tell it to print title and price, it prints them for every book on the site, but when I use yield, as below, it only returns the information for the last book listed on the site.
I've no doubt my mistake's really simple but I can't work out what it is.
Can anyone tell me why this only scrapes the final title and price listing on the site?
Thanks!
import scrapy

class FirstSpider(scrapy.Spider):
    name="CW"
    start_urls = ['http://books.toscrape.com/']

    def parse(self,response):
        books = response.xpath('//article[@class="product_pod"]')
        for item in books:
            title = item.xpath('.//h3/a/@title').getall()
            price = item.xpath('.//div/p[@class="price_color"]').getall()
        yield {
            'title': title,
            'price': price,
        }
Solution
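You misindented the yield. It sits outside the for loop, so it runs only once, after the loop has finished, when title and price hold the values from the last book. Indent it one level so it is part of the loop body. Fixed: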
import scrapy

class FirstSpider(scrapy.Spider):
    name="CW"
    start_urls = ['http://books.toscrape.com/']

    def parse(self,response):
        books = response.xpath('//article[@class="product_pod"]')
        for item in books:
            title = item.xpath('.//h3/a/@title').getall()
            price = item.xpath('.//div/p[@class="price_color"]').getall()
            # yield is inside the loop, so one item is emitted per book
            yield {
                'title': title,
                'price': price,
            }
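If it helps to see the effect without Scrapy in the way, here is a minimal plain-Python sketch of the same mistake. The function names parse_buggy and parse_fixed and the sample book data are made up for illustration; they are not part of the original spider.

```python
def parse_buggy(books):
    # The loop overwrites title and price on every pass...
    for item in books:
        title = item["title"]
        price = item["price"]
    # ...and the yield runs once, after the loop, with the last values.
    yield {"title": title, "price": price}


def parse_fixed(books):
    for item in books:
        title = item["title"]
        price = item["price"]
        # The yield is part of the loop body, so it runs once per book.
        yield {"title": title, "price": price}


# Hypothetical sample data standing in for the scraped page.
books = [
    {"title": "Book A", "price": "10.00"},
    {"title": "Book B", "price": "20.00"},
    {"title": "Book C", "price": "30.00"},
]

buggy_items = list(parse_buggy(books))
fixed_items = list(parse_fixed(books))

print(len(buggy_items), buggy_items)
print(len(fixed_items), fixed_items)
```

This also explains why print seemed to work: if the print call is inside the loop it runs on every pass, while a yield placed after the loop only ever sees whatever the variables held on the final pass.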
Answered By - DownloadPizza