Issue
Noob web scraper here. I built a spider using Scrapy and Playwright to scrape auto ads from the results of a parameterized search URL on autotrader.com, and it works great for grabbing data from the first page. I'm now trying to extend it to scrape the rest of the pages. I've identified the HTML element for the pagination control at the bottom of the first page and verified via DevTools that my XPath selects it, yet when I run my spider, response.text doesn't contain that element or any of its children. It contains every other HTML element, just not those...
Since I'm using Playwright, dynamic insertion via JavaScript shouldn't be a concern. I also added a wait_for_selector call on the pagination element with a 60-second timeout, but my script just times out. I'm also passing wait_until with "networkidle" to make sure the page has fully loaded before scraping.
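Roughly, the relevant part of my spider looks like the sketch below (simplified; the class name is a placeholder, the actual start_url is omitted as above, and I'm relying on the scrapy-playwright meta keys as I understand them):

import scrapy
from scrapy_playwright.page import PageMethod

class AutoTraderSpider(scrapy.Spider):
    name = "autotrader"
    # start_url omitted here, same as in the post

    def start_requests(self):
        yield scrapy.Request(
            self.start_url,
            meta={
                "playwright": True,
                # wait for the pagination element before handing the page back
                "playwright_page_methods": [
                    PageMethod(
                        "wait_for_selector",
                        "//*[@aria-label='Next Page']",
                        timeout=60_000,  # 60-second timeout, in milliseconds
                    ),
                ],
                # wait until network activity settles before scraping
                "playwright_page_goto_kwargs": {"wait_until": "networkidle"},
            },
            callback=self.parse,
        )

    def parse(self, response):
        # scrape the ad data from the rendered page
        ...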
Kinda puzzled about what's going on here. The start_url I'm using is: here. I would appreciate any feedback y'all might have.
Solution
This is the XPath you need to move from page to page; reference its href attribute and follow it, and that's it. I hope it works for you.
//*[@aria-label="Next Page"]
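A minimal sketch of how that XPath could be wired into the spider's parse callback (the callback name and the Playwright meta are assumptions on top of the answer, which only gives the XPath):

def parse(self, response):
    # ... yield the ad items scraped from this page ...

    # grab the href off the "Next Page" element and follow it
    next_href = response.xpath('//*[@aria-label="Next Page"]/@href').get()
    if next_href:
        yield response.follow(
            next_href,
            callback=self.parse,
            # keep rendering the follow-up pages with Playwright as well
            meta={"playwright": True},
        )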
Answered By - RicardinhoL