Issue
I'm new to Python and web scraping. I'm trying to extract all the elements inside each div with class 'two_third last' on this site. The data I want includes the two use cases (for depression and pain), the two corresponding prices per infusion, the clinic address, the phone number, and the email.
The problem is that the two use cases and prices, and the address and email, all use the same tags and classes. I'm able to extract the data attached to the first instance of the tag/class, but not the second. Please see my code below.
import requests
from bs4 import BeautifulSoup
import pandas as pd
website = 'https://ketamineclinicsdirectory.com/'
result = requests.get(website)
content = result.text
soup = BeautifulSoup(content, 'lxml')
clinics = soup.find_all('div', class_='two_third last')
all_data = []
for item in clinics:
    use_case = item.find('span', class_='declaration')
    price_per_infusion = item.find('span', class_='price')
    address = item.find('span', class_='address')
    phone = item.find('span', class_='phone')
    #print(use_case, price_per_infusion, address, phone)
    all_data.append(
        {
            'use_case': use_case,
            'price_per_infusion': price_per_infusion,
            'address': address,
            'phone': phone
        })
df = pd.DataFrame(all_data)
df
I tried creating a second use_case variable and indexing after 'declaration', but the resulting DataFrame column contains only None values.
I'm grateful for any help you can provide. Thank you!
Solution
Here is one way of getting that information for each clinic (I didn't include the bits you already know how to get):
import requests
from bs4 import BeautifulSoup as bs
import pandas as pd
headers = {
'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/114.0.0.0 Safari/537.36'
}
url = 'https://ketamineclinicsdirectory.com/'
r = requests.get(url, headers=headers)
soup = bs(r.text, 'html.parser')
for clinic in soup.select('div[class="two_third last"]'):
    usecase_one = clinic.select('span[class="declaration"]')[0].text.strip() if clinic.select('span[class="declaration"]') else None
    usecase_two = clinic.select('span[class="declaration"]')[1].text.strip() if len(clinic.select('span[class="declaration"]'))>1 else None
    price_usecase_one = clinic.select('span[class="price"]')[0].text.strip() if clinic.select('span[class="price"]') else None
    price_usecase_two = clinic.select('span[class="price"]')[1].text.strip() if len(clinic.select('span[class="price"]'))>1 else None
    print(usecase_one, price_usecase_one, '|', usecase_two, price_usecase_two)
Result in terminal:
For Depression: $375 | For Pain: $675
For Depression: $400 | For Pain: $800
For Depression: $395 | For Pain: $725
For Depression: $450 | For Pain: $750
For Depression: varies | None None
For Depression: varies | None None
For Depression: varies | None None
For Depression: varies | None None
For Depression: varies | For Pain: varies
For Depression: varies | None None
[...]
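If you also want the address, email, and phone in the same rows, the same idea can be wrapped in a small helper. The following is only a sketch, not tested against the live site: SAMPLE_HTML is made-up markup modelled on the class names in the question, the helper names texts and parse_clinics are hypothetical, and it assumes the address and email share the 'address' class and appear in that order.

```python
from bs4 import BeautifulSoup

# Hypothetical markup modelled on the class names from the question;
# the structure of the real page may differ.
SAMPLE_HTML = """
<div class="two_third last">
  <span class="declaration">For Depression:</span> <span class="price">$375</span>
  <span class="declaration">For Pain:</span> <span class="price">$675</span>
  <span class="address">1 Example St</span>
  <span class="address">info@example.com</span>
  <span class="phone">555-0100</span>
</div>
<div class="two_third last">
  <span class="declaration">For Depression:</span> <span class="price">varies</span>
  <span class="address">2 Example Ave</span>
</div>
"""


def texts(parent, css_class, count):
    """Return the stripped text of the first `count` matching spans, padded with None."""
    found = [tag.get_text(strip=True)
             for tag in parent.find_all('span', class_=css_class)]
    return (found + [None] * count)[:count]


def parse_clinics(html):
    """Build one dict per clinic block; missing fields become None."""
    soup = BeautifulSoup(html, 'html.parser')
    rows = []
    for clinic in soup.select('div[class="two_third last"]'):
        use_case_1, use_case_2 = texts(clinic, 'declaration', 2)
        price_1, price_2 = texts(clinic, 'price', 2)
        # Assumption: the address comes first and the email second.
        address, email = texts(clinic, 'address', 2)
        (phone,) = texts(clinic, 'phone', 1)
        rows.append({
            'use_case_1': use_case_1, 'price_1': price_1,
            'use_case_2': use_case_2, 'price_2': price_2,
            'address': address, 'email': email, 'phone': phone,
        })
    return rows


rows = parse_clinics(SAMPLE_HTML)
```

Passing rows to pd.DataFrame(rows) should then give one row per clinic, with None wherever a clinic lists only one use case. To run it against the real page, replace SAMPLE_HTML with r.text from the request above.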
See the BeautifulSoup documentation for details on select() and find_all().
Answered By - Barry the Platipus