Wednesday, November 15, 2023

[FIXED] How to scrape elements with identical tags, classes in Python with BeautifulSoup?


Issue

I'm new to Python and web scraping. I'm trying to extract the contents of every ('div', class_='two_third last') section on this site. The data I want includes the two use cases (for depression and pain), the two corresponding prices per infusion, the clinic address, the phone number, and the email.


The problem is that the two use cases share the same tag and class, as do the two prices, and likewise the address and email. I'm able to extract the data from the first instance of each tag/class pair, but not the second. Please see my code below.

import requests
from bs4 import BeautifulSoup
import pandas as pd

website = 'https://ketamineclinicsdirectory.com/'
result = requests.get(website)
content = result.text

soup = BeautifulSoup(content, 'lxml')

clinics = soup.find_all('div', class_='two_third last')

all_data = []

for item in clinics:
    use_case = item.find('span', class_='declaration')
    price_per_infusion = item.find('span', class_='price')
    address = item.find('span', class_='address')
    phone = item.find('span', class_='phone')

    #print(use_case, price_per_infusion, address, phone)

    all_data.append(
    {
        'use_case': use_case,
        'price_per_infusion': price_per_infusion,
        'address': address,
        'phone': phone
    })

df = pd.DataFrame(all_data)
df

I tried creating a second use_case variable and indexing after 'declaration', but the resulting DataFrame column contains only None values.

I'm grateful for any help you can provide. Thank you!

Solution

Here is one way of getting that information for each clinic (I didn't include the bits you already know how to get):

import requests
from bs4 import BeautifulSoup as bs
import pandas as pd

headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/114.0.0.0 Safari/537.36'
}
url = 'https://ketamineclinicsdirectory.com/'

r = requests.get(url, headers=headers)

soup = bs(r.text, 'html.parser')
for clinic in soup.select('div[class="two_third last"]'):
    # Select each repeated element once, then index into the resulting lists.
    declarations = clinic.select('span[class="declaration"]')
    prices = clinic.select('span[class="price"]')
    usecase_one = declarations[0].text.strip() if declarations else None
    usecase_two = declarations[1].text.strip() if len(declarations) > 1 else None
    price_usecase_one = prices[0].text.strip() if prices else None
    price_usecase_two = prices[1].text.strip() if len(prices) > 1 else None
    print(usecase_one, price_usecase_one, '|', usecase_two, price_usecase_two)

Result in terminal:

For Depression: $375 | For Pain: $675
For Depression: $400 | For Pain: $800
For Depression: $395 | For Pain: $725
For Depression: $450 | For Pain: $750
For Depression: varies | None None
For Depression: varies | None None
For Depression: varies | None None
For Depression: varies | None None
For Depression: varies | For Pain: varies
For Depression: varies | None None
[...]
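
If you want the values in rows rather than printed, the indexing logic in the loop above can be wrapped in a small helper. The following is only a sketch: nth_text is a hypothetical helper name, and SAMPLE_HTML is made-up markup modeled on the class names used above, so the real page may differ.

```python
from bs4 import BeautifulSoup

# Hypothetical markup modeled on the class names used above; the real page may differ.
SAMPLE_HTML = """
<div class="two_third last">
  <span class="declaration">For Depression:</span> <span class="price">$375</span>
  <span class="declaration">For Pain:</span> <span class="price">$675</span>
</div>
<div class="two_third last">
  <span class="declaration">For Depression:</span> <span class="price">varies</span>
</div>
"""


def nth_text(parent, selector, index):
    """Return the stripped text of the index-th match, or None if there is no such match."""
    matches = parent.select(selector)
    return matches[index].get_text(strip=True) if index < len(matches) else None


soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

rows = []
for clinic in soup.select("div.two_third.last"):
    rows.append({
        "usecase_one": nth_text(clinic, "span.declaration", 0),
        "price_one": nth_text(clinic, "span.price", 0),
        "usecase_two": nth_text(clinic, "span.declaration", 1),
        "price_two": nth_text(clinic, "span.price", 1),
    })

for row in rows:
    print(row)
```

The resulting list of dictionaries can be passed to pd.DataFrame(rows), as in the question. Missing second entries show up as None instead of raising an IndexError.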

See the BeautifulSoup documentation for more details.
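
The same approach should carry over to the address and email, which the question says share a tag and class. Here is a minimal sketch using find_all(), assuming hypothetical markup in which both values sit in span elements with class "address". The actual page structure is not shown in this post and may well differ.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: the real page may structure these fields differently.
SAMPLE_HTML = """
<div class="two_third last">
  <span class="address">123 Example St, Springfield</span>
  <span class="phone">555-0100</span>
  <span class="address">info@example.com</span>
</div>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
clinic = soup.find("div", class_="two_third last")

# find() returns only the first match, which is why the second value gets missed.
first_only = clinic.find("span", class_="address").get_text(strip=True)

# find_all() returns every match in document order.
values = [tag.get_text(strip=True) for tag in clinic.find_all("span", class_="address")]

# Pad with None so unpacking still works when the second element is absent.
address, email = (values + [None, None])[:2]

print(first_only)
print(address, email)
```

Relying on position (first match is the address, second is the email) is fragile if a clinic omits one of them, so it is worth spot-checking the output against the page.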

Answered By - Barry the Platipus

This answer was collected from Stack Overflow and tested by PythonFixing community admins. It is licensed under CC BY-SA 2.5, CC BY-SA 3.0, and CC BY-SA 4.0.
