Issue
I am working on a web scraping project for this site: https://yellowpages.com.eg/en/search/fast-food. I managed to scrape the data, but I am struggling with the pagination. I want to write a loop that finds the Next button on each page and then uses its URL to repeat the same process on the following page.
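import requests
from bs4 import BeautifulSoup

url = 'https://yellowpages.com.eg/en/search/fast-food'
while True:
    r = requests.get(url)
    soup = BeautifulSoup(r.content, 'lxml')
    # Locate the pagination bar(s) on the current page
    pages = soup.find_all('ul', class_='pagination center-pagination')
    for page in pages:
        # Look for the Next link inside the first 'waves-effect' item
        nextpage = page.find('li', class_='waves-effect').find('a', {'aria-label': 'Next'})
    if nextpage:
        # Build the URL of the following page and continue the loop
        uu = nextpage.get('href')
        url = 'http://www.yellowpages.com.eg' + str(uu)
        print(url)
    else:
        break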
This code prints the next URL in the pagination order once and then breaks out of the loop.
Solution
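The problem is that
nextpage = page.find('li', class_='waves-effect').find('a', {'aria-label': 'Next'})
does return the Next button, but only as long as the Previous button is not there. find() returns only the first matching li, and from the second page onward that is the Previous button, which contains no link labelled Next. The inner find() therefore returns None, and the loop breaks as soon as you leave the first page.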
Instead, page.find_all('li', class_='waves-effect')
returns a list containing both the Previous and the Next button.
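The difference between the two calls can be checked offline on a small piece of markup. The snippet below is only a sketch: the HTML and its href values are made up to resemble a page other than the first one, and only the class names and aria-label values come from the question. It uses the built-in html.parser instead of lxml so that it runs without extra dependencies.

```python
from bs4 import BeautifulSoup

# Hypothetical markup for a page that has both buttons. The href values are
# invented for illustration; only the class names and aria-labels are from
# the question.
html = """
<ul class="pagination center-pagination">
  <li class="waves-effect"><a href="/en/search/fast-food/p1" aria-label="Previous">prev</a></li>
  <li class="active"><a href="/en/search/fast-food/p2">2</a></li>
  <li class="waves-effect"><a href="/en/search/fast-food/p3" aria-label="Next">next</a></li>
</ul>
"""

page = BeautifulSoup(html, "html.parser").find(
    "ul", class_="pagination center-pagination"
)

# find() stops at the first matching <li>, which here is the Previous button,
# so the chained search for a link labelled Next comes back empty.
first_li = page.find("li", class_="waves-effect")
via_find = first_li.find("a", {"aria-label": "Next"})

# find_all() collects every matching <li>, in document order.
all_li = page.find_all("li", class_="waves-effect")
labels = [li.a.get("aria-label") for li in all_li]

print(via_find)
print(labels)
```

If the real page puts the buttons in a different order or uses other classes, the same check on the actual response will show it.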
To (maybe) robustly get the Next button, take the last element of that list by changing your line to
nextpage = page.find_all('li', class_='waves-effect')[-1].find('a', {'aria-label': 'Next'})
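Put together, the whole pagination loop might look like the sketch below. It is not the original poster's final code: next_page_url, crawl, fetch_with_requests and the max_pages limit are hypothetical names introduced here. It assumes the Next button, when present, is the last 'waves-effect' item, as described above. It also swaps the manual string concatenation for urllib.parse.urljoin, which handles both relative and absolute href values, and it takes the download function as a parameter so the parsing logic can be tried without network access.

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup


def next_page_url(html, current_url):
    """Return the absolute URL behind the Next button, or None if there is none."""
    soup = BeautifulSoup(html, "html.parser")
    pager = soup.find("ul", class_="pagination center-pagination")
    if pager is None:
        return None
    buttons = pager.find_all("li", class_="waves-effect")
    if not buttons:
        return None
    # Assumption: the Next button, when present, is the last such item.
    link = buttons[-1].find("a", {"aria-label": "Next"})
    if link is None or not link.get("href"):
        return None
    return urljoin(current_url, link["href"])


def crawl(start_url, fetch, max_pages=50):
    """Follow Next links from start_url and return the visited URLs in order.

    fetch is any callable that maps a URL to that page's HTML.
    max_pages is a hypothetical safety limit against endless loops.
    """
    visited = []
    url = start_url
    while url and url not in visited and len(visited) < max_pages:
        visited.append(url)
        url = next_page_url(fetch(url), url)
    return visited


def fetch_with_requests(url):
    """Download a page with requests (imported lazily, only when called)."""
    import requests

    response = requests.get(url, timeout=10)
    response.raise_for_status()
    return response.text
```

Calling crawl('https://yellowpages.com.eg/en/search/fast-food', fetch_with_requests) would then walk the result pages until no Next link is found. The scraping of each page's data can go inside the loop, next to the call to fetch.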
Answered By - mcsoini