Friday, April 8, 2022

[FIXED] How to go to the next page using BeautifulSoup?

Tags: beautifulsoup, python, web-scraping

Issue

I am trying to scrape data from all 37 pages of this website.

The website I am scraping doesn't allow going to the next page through the search bar.

This is the HTML for the Next button:

<a href="javascript:void('Next')" class="next">
    <svg viewBox="0 0 36 36" data-use="/cms/svg/site/icon_caret_right.36.svg">
        (path tag and data)
    </svg>
</a>

I know that this can be done with Selenium, but is there any way to do this with BeautifulSoup?

Is there any way to scrape data from the next page?

Solution

You can reach each page with requests. The site loads its results through a POST request, and the PhysicianSearch$FTR01$PagingID form field selects which page of results comes back, so you can request the pages one after another:

import re

import requests
from bs4 import BeautifulSoup

url = 'https://www.stfrancismedicalcenter.com/find-a-provider/'

# Form fields sent with every search request; only the paging ID changes.
base_payload = {
    '_m_': 'FindAPhysician',
    'PhysicianSearch$HDR0$PhysicianName': '',
    'PhysicianSearch$HDR0$SpecialtyIDs': '',
    'PhysicianSearch$HDR0$Distance': '5',
    'PhysicianSearch$HDR0$ZipCodeSearch': '',
    'PhysicianSearch$HDR0$Keywords': '',
    'PhysicianSearch$HDR0$LanguageIDs': '',
    'PhysicianSearch$HDR0$Gender': '',
    'PhysicianSearch$HDR0$InsuranceIDs': '',
    'PhysicianSearch$HDR0$AffiliationIDs': '',
    'PhysicianSearch$HDR0$NewPatientsOnly': '',
    'PhysicianSearch$HDR0$InNetwork': '',
    'PhysicianSearch$HDR0$HasPhoto': '',
}

# Result cards are <li> elements whose class attribute starts with "half item-".
item_class = re.compile(r'^half item-')

for page in range(1, 38):  # pages 1 through 37
    print(f'\t\tPage: {page}')
    payload = {**base_payload, 'PhysicianSearch$FTR01$PagingID': str(page)}

    response = requests.post(url, data=payload)
    response.raise_for_status()  # stop on an HTTP error instead of parsing an error page
    soup = BeautifulSoup(response.text, 'html.parser')

    for item in soup.find_all('li', {'class': item_class}):
        # The first two <span> tags in the info block hold the name and the specialty.
        spans = item.find('div', {'class': 'info'}).find_all('span')
        item_name = spans[0].text
        item_type = spans[1].text
        phone = item.find('li', {'class': 'inline-svg phone'}).text.strip()
        address = item.find('address').text.strip().replace('\t', '')

        print(f'\n{item_name}\n{item_type}\n{phone}\n{address}\n')
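
If the number of pages is not known ahead of time, one option is to keep requesting pages until one comes back with no results. The sketch below is an illustration, not part of the original answer: parse_providers, scrape_all, fetch_page and max_pages are hypothetical names, and it assumes the site returns no matching li items past the last page, which has not been checked against this site. It reuses the selectors from the code above and tolerates missing elements instead of raising.

```python
import re

from bs4 import BeautifulSoup

# Same selector as the loop above: <li> elements whose class starts with "half item-".
ITEM_CLASS = re.compile(r"^half item-")


def _text(node):
    """Return a tag's text with whitespace collapsed, or None if the tag is missing."""
    return " ".join(node.get_text().split()) if node is not None else None


def parse_providers(html):
    """Return one dict per provider card found in a page of search results."""
    soup = BeautifulSoup(html, "html.parser")
    providers = []
    for item in soup.find_all("li", {"class": ITEM_CLASS}):
        info = item.find("div", {"class": "info"})
        spans = info.find_all("span") if info is not None else []
        providers.append({
            "name": _text(spans[0]) if len(spans) > 0 else None,
            "specialty": _text(spans[1]) if len(spans) > 1 else None,
            "phone": _text(item.find("li", {"class": "inline-svg phone"})),
            "address": _text(item.find("address")),
        })
    return providers


def scrape_all(fetch_page, max_pages=100):
    """Collect providers page by page until a page has none (assumed end marker).

    fetch_page is a hypothetical callable: page number in, HTML string out.
    max_pages is a safety cap in case the site never returns an empty page.
    """
    results = []
    for page in range(1, max_pages + 1):
        providers = parse_providers(fetch_page(page))
        if not providers:
            break
        results.extend(providers)
    return results
```

Here fetch_page can be any function that takes a page number and returns HTML, for example a small wrapper around the requests.post call above with the paging ID set to str(page). Keeping the network call separate from the parsing also makes it possible to try the parser on a saved copy of a page.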

Answered By - chitown88

This answer was collected from Stack Overflow and tested by PythonFixing community admins. It is licensed under CC BY-SA 2.5, CC BY-SA 3.0 and CC BY-SA 4.0.
