Issue
I'm learning web scraping with Scrapy and I'm still a beginner. I'm not getting the output I expected: not all of the quotes are being scraped.
import scrapy


class QuoteSpider(scrapy.Spider):
    name = 'Quotes'
    start_urls = [
        'http://quotes.toscrape.com/'
    ]

    def parse(self, response):
        for quotes in response.selector.xpath("//div[@class='quote']"):
            yield {
                'text': quotes.xpath("//span[@class='text']/text()").extract_first(),
                'author': quotes.xpath("//small[@class='author']/text()").extract_first(),
                'tags': quotes.xpath("//div[@class='tags']/child::a/text()").extract(),
            }
I expect every quote on the first page to be scraped. Instead, I get the same quote and author again and again, while the tags field contains all the tags on the page every time. I'd appreciate any help.
Solution
This is a common mistake when using XPath on nested selectors.
When you call xpath() on a selector you have already extracted, and you want that selector to act as the root of the new query, you need to start the expression with a dot (.). If you don't, an expression that begins with // searches the whole document, as it normally does. That is why extract_first() returns the same first quote and author on every iteration, while extract() returns every tag on the page.
So just change the final lines to:
yield {
    'text': quotes.xpath(".//span[@class='text']/text()").extract_first(),
    'author': quotes.xpath(".//small[@class='author']/text()").extract_first(),
    'tags': quotes.xpath(".//div[@class='tags']/child::a/text()").extract(),
}
Answered By - eLRuLL