Issue
I will preface this by saying I have very little experience with coding in general. I'm currently using Scrapy with Python 3.5 to pull SKU# and pricing values from Home Depot's website. Using the Scrapy tutorial documentation, I managed to put together something that pulls the correct data and moves on to the next page.
The problem is that after Scrapy yields the second page of items, it goes back to the first page and then repeats before closing. Essentially it's just going Url 1 -> Url 2 -> Url 1 -> Url 2 and then finishing without error. The code I'm using is below:
# -*- coding: utf-8 -*-
import scrapy


class ScraperSpider(scrapy.Spider):
    name = "scraper"
    allowed_domains = ["www.homedepot.com"]
    start_urls = ["http://www.homedepot.com/s/whirlpool?NCNI-5"]

    def parse(self, response):
        for sku in response.css('div.plp-pod'):
            yield {
                'model' : sku.css('div.pod-plp__model::text').extract_first(),
                'price' : sku.css('div.price__wrapper > div:nth-child(1) > span::text').extract_first()
            }

        next = response.css('li.hd-pagination__item.hd-pagination__button > a::attr(href)').extract_first()
        print(next)
        if next is not None:
            next = response.urljoin(next)
            yield scrapy.Request(next, callback=self.parse)
As far as I can tell by inspecting the webpages, the second URL shares the same CSS as the first and should request a subsequent link. Any help would be appreciated!
Solution
This happens because the Previous button has the same classes as the Next button, so your selector matches both. The Previous button looks like this:
<li class="hd-pagination__item hd-pagination__button">
<a class="hd-pagination__link" title="Previous" href="/b/N-5yc1v/Ntk-BrandSearch/Ntt-whirlpool?NCNI-5" data-pagenumber="1">
</a>
</li>
So once you get to the second page, extract_first() returns the first match, which is the Previous button, and the spider goes back to page 1. The Next button, on the other hand, looks like this:
<li class="hd-pagination__item hd-pagination__button">
<a class="hd-pagination__link" title="Next" href="/b/N-5yc1v/Ntk-BrandSearch/Ntt-whirlpool?NCNI-5&Nao=48&Ns=None" data-pagenumber="3">
</a>
</li>
So you need to select the link based not only on the class but also on the title attribute.
Answered By - Forest Kunecke