Issue
I am trying to access the next page of quotes.toscrape.com with Python and BeautifulSoup. Ideally I want to scrape the next page, then the one after that, and so on.
I have tried asking ChatGPT and the Replit AI, but I keep getting errors that neither they nor I can fix. I think it's because the href of the "Next" button is a relative path rather than a full URL, but I'm not sure.
Solution
It looks like the URL structure for a specific page is simply https://quotes.toscrape.com/page/<PAGENUMBER>/. So just do something like this:
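import requests
import bs4

pagenumber = 1
while True:
    print("Getting page", pagenumber)
    url = f'https://quotes.toscrape.com/page/{pagenumber}/'
    res = requests.get(url)
    res.raise_for_status()
    # Name the parser explicitly so bs4 does not have to guess one
    soup = bs4.BeautifulSoup(res.text, 'html.parser')
    # Assumption: each quote sits in an element with class "quote".
    # Stop once a page has none, otherwise the loop never ends.
    if not soup.select('.quote'):
        break
    # ...scrape things here...
    pagenumber += 1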
Or create a page-number iterator:
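import bs4
import requests
from itertools import count

# count(start=1) yields 1, 2, 3, ... without an upper bound
pagenumbers = count(start=1)
for pagenumber in pagenumbers:
    print("Getting page", pagenumber)
    url = f'https://quotes.toscrape.com/page/{pagenumber}/'
    res = requests.get(url)
    res.raise_for_status()
    # Name the parser explicitly so bs4 does not have to guess one
    soup = bs4.BeautifulSoup(res.text, 'html.parser')
    # Assumption: each quote sits in an element with class "quote".
    # Stop once a page has none, otherwise the loop never ends.
    if not soup.select('.quote'):
        break
    # ...scrape things here...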
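If you would rather follow the "Next" button than build the URLs yourself, the relative href from the question can be resolved with urllib.parse.urljoin. The following is only a minimal sketch: the helper names next_page_url and crawl are made up for illustration, and the li.next > a selector assumes the pager markup looks like <li class="next"><a href="/page/2/">, so check it against the live page before relying on it.

```python
from urllib.parse import urljoin

import bs4
import requests

START_URL = "https://quotes.toscrape.com/"


def next_page_url(html, current_url):
    """Return the absolute URL of the next page, or None if there is no "Next" link.

    Hypothetical helper; the selector below is an assumption about the markup.
    """
    soup = bs4.BeautifulSoup(html, "html.parser")
    # Assumed markup: <li class="next"><a href="/page/2/">Next</a></li>
    link = soup.select_one("li.next > a[href]")
    if link is None:
        return None
    # urljoin resolves a relative href such as "/page/2/" against the current URL
    return urljoin(current_url, link["href"])


def crawl(start_url=START_URL):
    """Yield (url, soup) for each page, following "Next" links until none remain."""
    url = start_url
    while url:
        res = requests.get(url, timeout=10)
        res.raise_for_status()
        yield url, bs4.BeautifulSoup(res.text, "html.parser")
        url = next_page_url(res.text, url)


if __name__ == "__main__":
    for url, soup in crawl():
        print("Got page", url)
        # ...scrape things here...
```

Keeping the link lookup in its own function means it can be checked against a saved HTML snippet without hitting the site, and the loop ends on its own when the last page has no "Next" link.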
Answered By - larsks