Issue
I am trying to select a "next" navigation link and cannot find the right selector in Scrapy.
The page is a search results page on a boat listing site.
The link I'm trying to select is this tag:
<a rel="nofollow" class="icon-chevron-right " href="/boats-for-sale/condition-used/type-power/class-power-sport-fishing/?year=2006-2014&length=40-65&page=2"><span class="aria-fixes">2</span></a>
I've tried many combinations of response.xpath and response.css selectors but can't seem to find the right combination.
Using the Google Chrome inspector, I get this XPath: //*[@id="root"]/div[2]/div[2]/div[2]/div/div[3]/a[9]
Ultimately, I'm trying to get the href attribute of the tag which contains the URL I want to follow.
Am I running into problems with the rel='nofollow' attribute and a Scrapy setting?
EDIT - this code used to work, but now I get an error on the CSS selector:
def parse(self, response):
    listing_objs = response.xpath("//div[@class = 'listings-container']/a")
    for listing in listing_objs:
        yield response.follow(listing.attrib['href'], callback=self.parse_detail)
    next_page = response.css("a.icon-chevron-right").attrib['href']
    if next_page is not None:
        yield response.follow(next_page, callback=self.parse)
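A note on the `is not None` check above: in Scrapy, the `.attrib` of a selector list with no matches is an empty dict, so `.attrib['href']` raises a KeyError instead of returning None. `.attrib.get('href')`, or the `::attr(href)` pseudo-element followed by `.get()`, returns None in that case. The "return None when the link is missing" behaviour can be sketched outside Scrapy with only the standard library; `NextLinkParser`, `find_next_href` and the shortened sample href are made up for illustration and are not part of Scrapy or of the original code.

```python
from html.parser import HTMLParser
from typing import Optional


class NextLinkParser(HTMLParser):
    """Record the href of the first <a> tag that carries the given CSS class."""

    def __init__(self, css_class: str) -> None:
        super().__init__()
        self.css_class = css_class
        self.href: Optional[str] = None

    def handle_starttag(self, tag, attrs):
        # Only look at <a> tags, and stop once a match has been recorded.
        if tag != "a" or self.href is not None:
            return
        attributes = dict(attrs)
        # split() tolerates stray whitespace, e.g. class="icon-chevron-right "
        classes = (attributes.get("class") or "").split()
        if self.css_class in classes:
            self.href = attributes.get("href")


def find_next_href(html: str, css_class: str = "icon-chevron-right") -> Optional[str]:
    """Return the link's href, or None when the page has no such link."""
    parser = NextLinkParser(css_class)
    parser.feed(html)
    return parser.href


# Hypothetical, shortened markup modelled on the tag in the question.
sample = (
    '<a rel="nofollow" class="icon-chevron-right " '
    'href="/boats-for-sale/?page=2"><span class="aria-fixes">2</span></a>'
)
print(find_next_href(sample))
print(find_next_href("<p>last page, no next link</p>"))
```

With a lookup that returns None on a miss, the `if next_page is not None:` guard behaves as intended on a page that has no "next" link.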
Solution
In this case you can access any page of the website by adding &page=# (where # is the page number)
at the end of the URL, so the spider can request each results page directly instead of looking for the "next" link.
For instance, you can do something like this:
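def start_requests(self):
    # Scrapy calls start_requests() (note the plural) to get the initial requests.
    main_url = "https://www.yachtworld.com/boats-for-sale/condition-used/type-power" \
               "/class-power-sport-fishing/?year=2006-2014&length=40-65&page=%(page)s"
    # `pages` is the number of result pages to crawl; define it yourself.
    for i in range(1, pages + 1):  # the "next" link goes to page=2, so pages start at 1
        yield scrapy.Request(main_url % {'page': i}, callback=self.parse)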
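The URL generation can be tried on its own before wiring it into a spider. Below is a minimal sketch using only the standard library; `page_urls`, `first_page` and `last_page` are hypothetical names, and it assumes page numbering starts at 1 (the "next" link in the question points to page=2) and that you already know how many pages you want.

```python
from urllib.parse import urlencode

BASE_URL = (
    "https://www.yachtworld.com/boats-for-sale/condition-used/type-power"
    "/class-power-sport-fishing/"
)


def page_urls(last_page: int, first_page: int = 1):
    """Yield one search URL per page number, first_page..last_page inclusive."""
    for page in range(first_page, last_page + 1):
        # urlencode builds the query string and escapes values where needed.
        query = urlencode({"year": "2006-2014", "length": "40-65", "page": page})
        yield f"{BASE_URL}?{query}"


for url in page_urls(last_page=3):
    print(url)
```

Inside a spider, each generated URL would be passed to scrapy.Request in the same way as in the snippet above.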
Answered By - Moein Kameli