Issue
I'm trying to scrape data from a specific website, but so far without success. The data is wrapped in a complex HTML structure.
Here is my Code:
import bs4
import requests

myUrl = "https://www.nbpharmacists.ca/site/findpharmacy"
data = requests.get(myUrl)
soup = bs4.BeautifulSoup(data.text, 'html.parser')
records = soup.find('div', class_="col-sm-12")
for dvs in records:
    divs = dvs.find('div')
    print(divs)
Expected Result:
Pharmacy Name: Albert County Pharmacy
Pharmacy Manager: Chelsea Steeves
Certificate of Operation Number: P107
Address: 5883 King Street Riverside-Albert NB E4H 4B5
Phone: (506) 882-2226
Fax: (506) 882-2101
Website: albertcountypharmacy.ca
Conclusion
My code is not giving me the result I want. Please suggest the best possible solution.
Solution
If you explore the page hierarchy, you should be able to find your answer, specifically by looking at ids, divs, and tables. See one option below.
import bs4
import requests

myUrl = "https://www.nbpharmacists.ca/site/findpharmacy"
data = requests.get(myUrl)
soup = bs4.BeautifulSoup(data.text, 'html.parser')

# All pharmacy records sit inside the div with id "rosterRecords"
roster = soup.find('div', attrs={'id': 'rosterRecords'})
tables = roster.find_all('table')  # one table per pharmacy

result = []  # initialize a list for all results
for table in tables:
    # First cell: manager and certificate number in a single paragraph
    info = table.find('td').find('p').text.strip()
    certificate = info.split('Certificate of Operation Number:')[-1].strip()
    manager = info.split('Pharmacy Manager:')[1]\
        .split('Certificate of Operation Number:')[0].strip()

    # Last cell: street address followed by phone, fax and (optionally) a link
    addr = table.find_all('td')[-1].text.strip()
    phone = addr.split('Phone:')[-1].split('Fax:')[0].strip()
    fax = addr.split('Fax:')[1].strip().split('\n')[0].strip()
    address = addr.split('Phone:')[0].strip()

    res = {
        'Pharmacy Name': table.find('h2').find('span').text.strip(),
        'Certificate of Operation Number': certificate,
        'Pharmacy Manager': manager,
        'Phone Number': phone,
        'Fax Number': fax,
        'Address': address,
    }
    # Not every pharmacy lists a website; find('a') returns None in that case
    try:
        res['website'] = table.find_all('td')[-1].find('a').get('href')
    except AttributeError:
        res['website'] = None
    result.append(res)  # append pharmacy info

print(result[0])
Output:
{'Pharmacy Name': 'Albert County Pharmacy',
'Certificate of Operation Number': 'P107',
'Pharmacy Manager': 'Chelsea Steeves',
'Phone Number': '(506) 882-2226',
'Fax Number': '(506) 882-2101',
'Address': '5883 King Street \nRiverside-Albert NB E4H 4B5',
'website': 'http://albertcountypharmacy.ca'}
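The chained split calls above all do the same thing: take the text that follows one label and stop at the next label. As a minimal sketch, that pattern can be pulled into a small helper. The name field_between and the sample strings are hypothetical; the strings only imitate the cell text and are not copied from the live page, whose exact whitespace may differ.

```python
def field_between(text, start_label, end_label=None):
    """Return the text after start_label, cut at end_label when one is given.

    Returns None if start_label does not occur in text.
    """
    _, found, rest = text.partition(start_label)
    if not found:
        return None  # label is missing from this record
    if end_label is not None:
        # Keep only what comes before the next label (if it is present)
        rest = rest.partition(end_label)[0]
    return rest.strip()


# Hypothetical cell text, shaped like the strings the solution splits
info = "Pharmacy Manager: Chelsea Steeves Certificate of Operation Number: P107"
addr = ("5883 King Street \nRiverside-Albert NB E4H 4B5\n"
        "Phone: (506) 882-2226 Fax: (506) 882-2101")

manager = field_between(info, "Pharmacy Manager:", "Certificate of Operation Number:")
certificate = field_between(info, "Certificate of Operation Number:")
phone = field_between(addr, "Phone:", "Fax:")
fax = field_between(addr, "Fax:")

print(manager, certificate, phone, fax, sep=" | ")
```

Unlike indexing the result of split with [1], this helper returns None when a label is absent, so a record without a fax number, for example, would not raise an IndexError.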
Answered By - calestini