Issue
New to web scraping. I'm trying to scrape (if possible) this website https://www.memberleap.com/members/directory/search_csam.php for the company titles listed (and maybe the descriptions). The code I've written below returns empty brackets, and using soup.find instead returns None.
import requests
from bs4 import BeautifulSoup
import pandas as pd
url = 'https://www.memberleap.com/members/directory/search_csam.php'
page = requests.get(url)
soup = BeautifulSoup(page.text, 'lxml')
# uncomment to view html info
#print(soup)
name_plate = soup.find_all('div', class_ = 'name-plate')
print(name_plate)
I want to obtain the names of the companies listed under the name-plate elements. In this case, they are ADG Creative and Corporate Office Properties Trust. Is it possible to do this, given that the href changes between company names?
Solution
The names you see on the page are loaded from an external URL via JavaScript, so they are not present in the HTML that requests.get(url) returns. To simulate this request, you can use the following example:
import requests
from bs4 import BeautifulSoup
api_url = 'https://www.memberleap.com/members/directory/search_csam_ajax.php?org_id=CSAM'
data = {
    "buttonNum": "B1",
    "searchValues[bci]": "0",
    "searchValues[order]": "alpha",
    "searchValues[keyword]": "",
    "searchValues[employer_only]": "",
    "searchValues[county]": "",
    "searchValues[zip_code]": "",
    "searchValues[zip_range]": "5",
    "searchValues[providerassociate]": "",
    "searchValues[dynamic_field_3786]": "0",
    "searchValues[dynamic_field_3787]": "0",
    "searchValues[latitude]": "",
    "searchValues[longitude]": "",
    "two_column": ""
}
headers = {'X-Requested-With': 'XMLHttpRequest'}
for p in range(1, 4):  # <-- increase the page numbers here
    data['buttonNum'] = f'B{p}'
    soup = BeautifulSoup(requests.post(api_url, data=data, headers=headers).content, 'html.parser')
    for n in soup.select('.name-plate'):
        print(n.text)
Prints:
...
Transwestern
TriBridge Partners, LLC
Univ. of MD Advanced Cybersecurity Experience for Students
University System of Maryland
Venture Potential
Weller Development Company
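If you would rather collect the names in a list than print them, and not hard-code the number of pages, something like the sketch below might help. It assumes (not verified against the site) that a page past the last one comes back with no .name-plate elements. The helper names extract_names, collect_all and fake_fetch, and the sample HTML, are hypothetical and only there so the example runs without network access.

```python
from bs4 import BeautifulSoup


def extract_names(html):
    """Return the stripped text of every element with class 'name-plate'."""
    soup = BeautifulSoup(html, "html.parser")
    return [tag.get_text(strip=True) for tag in soup.select(".name-plate")]


def collect_all(fetch_page, max_pages=50):
    """Call fetch_page(1), fetch_page(2), ... until a page yields no names.

    Assumption: an out-of-range page contains no .name-plate elements.
    max_pages is a safety cap in case that assumption does not hold.
    """
    names = []
    for page in range(1, max_pages + 1):
        found = extract_names(fetch_page(page))
        if not found:
            break
        names.extend(found)
    return names


# Hypothetical stand-in for the AJAX responses (made-up markup, not the real site).
SAMPLE_PAGES = {
    1: '<div class="name-plate"><a href="/a">ADG Creative</a></div>'
       '<div class="name-plate"><a href="/b">Corporate Office Properties Trust</a></div>',
    2: '<div class="name-plate"><a href="/c">Transwestern</a></div>',
}


def fake_fetch(page):
    # Return the canned HTML for this page, or an empty string past the end.
    return SAMPLE_PAGES.get(page, "")


print(collect_all(fake_fetch))
```

Against the real endpoint, fetch_page could be a small function that sets data['buttonNum'] = f'B{page}' and returns requests.post(api_url, data=data, headers=headers).text, reusing the api_url, data and headers defined above.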
Answered By - Andrej Kesely