Issue
I'm trying to scrape a specific web page. The console shows all the results, but the CSV output does not. I want both the title and the author for each search result, but the CSV contains only the title. If I swap the order of the two loops, I get only the author, so only whichever field is yielded first is kept. Why?
import scrapy

QUERY = "q=brilliant+friend&qt=results_page#x0%253Abook-%2C%2528x0%253Abook%2Bx4%253Aprintbook%2529%2C%2528x0%253Abook%2Bx4%253Adigital%2529%2C%2528x0%253Abook%2Bx4%253Alargeprint%2529%2C%2528x0%253Abook%2Bx4%253Amss%2529%2C%2528x0%253Abook%2Bx4%253Athsis%2529%2C%2528x0%253Abook%2Bx4%253Abraille%2529%2C%2528x0%253Abook%2Bx4%253Amic%2529%2Cx0%253Aartchap-%2C%2528x0%253Aartchap%2Bx4%253Achptr%2529%2C%2528x0%253Aartchap%2Bx4%253Adigital%2529format"

class Spider(scrapy.Spider):
    name = 'worldcatspider'
    start_urls = ['https://www.worldcat.org/search?start=%s&%s' % (number, QUERY) for number in range(0, 4400, 10)]

    def parse(self, response):
        for title in response.css('.name a > strong ::text').extract():
            yield {"title:": title}
        for author in response.css('.author ::text').extract():
            yield {"author:": author}
Solution
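My suggestion is to loop over the parent element (the class or div that wraps each result) and extract both fields inside that loop, so that each yielded item contains the title and the author together. Scrapy's CSV export takes its columns from the first item it receives, which is why yielding titles and authors as separate items leaves only one column in the file.
I haven't checked it, but this should work: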
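def parse(self, response):
    # Loop over each result container so title and author stay paired.
    for page in response.css('.menuElem'):
        title = page.css('.name a > strong ::text').extract()
        author = page.css('.author ::text').extract()
        # Yield one item per result, holding both fields.
        yield {"title": title,
               "author": author}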
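To see why one item per result matters, here is a minimal sketch that uses only the standard library. The `write_csv` helper is hypothetical: it is a simplified stand-in for how a CSV export behaves when the header is taken from the first item, not Scrapy's actual code, and the titles and authors are made-up sample values rather than scraped data.

```python
import csv
import io


def write_csv(items):
    """Write dicts to CSV text, taking the header from the first item only.

    Hypothetical helper: a simplified imitation of a CSV export whose
    columns are fixed by the first item, not Scrapy's implementation.
    """
    buffer = io.StringIO()
    writer = None
    for item in items:
        if writer is None:
            # Columns are fixed by whichever item arrives first.
            writer = csv.DictWriter(
                buffer,
                fieldnames=list(item),
                extrasaction="ignore",  # silently drop keys not in the header
                restval="",             # leave missing columns empty
                lineterminator="\n",
            )
            writer.writeheader()
        writer.writerow(item)
    return buffer.getvalue()


# Made-up sample values standing in for the scraped text.
titles = ["My Brilliant Friend", "The Story of a New Name"]
authors = ["Elena Ferrante", "Elena Ferrante"]

# Pattern from the question: one item per title, then one item per author.
separate = [{"title": t} for t in titles] + [{"author": a} for a in authors]

# Pattern from the answer: one item per result, holding both fields.
combined = [{"title": t, "author": a} for t, a in zip(titles, authors)]

separate_csv = write_csv(separate)
combined_csv = write_csv(combined)

print(separate_csv)
print(combined_csv)
```

In the first output the author values never appear, because the header was fixed as `title` before any author item arrived. In the second, every row carries both columns.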
Answered By - Furkan Ozalp