Issue
I'm trying to scrape the https://findamortgagebroker.com/ site.
When I request a search URL such as "https://findamortgagebroker.com/?search=San%20Diego&page=2", the response doesn't contain the tags that I see when I inspect the page with the browser dev tools.
I want to scrape the 'a' elements whose 'class' is 'clickable-tile-contact'.
import time
from urllib.request import Request, urlopen
from bs4 import BeautifulSoup

def get_soup(url):
    # Fetch the page with a browser-like User-Agent and parse the static HTML
    req = Request(url, headers={'User-Agent': 'Mozilla/5.0'})
    time.sleep(10)
    html_page = urlopen(req).read()
    time.sleep(10)
    soup = BeautifulSoup(html_page, 'html.parser')
    return soup

url = "https://findamortgagebroker.com/?search=San%20Diego&page=2"
soup = get_soup(url)
links = soup.find_all('a', attrs={'class': 'clickable-tile-contact'})
Solution
Actually, the required data is not part of the initial page. It is loaded from a separate endpoint by an AJAX POST request, and the response is a plain HTML fragment.
So to get the right data, you have to send the request to that API URL instead of the search URL.
Full working code as an example:
import requests
from bs4 import BeautifulSoup
api_url = 'https://findamortgagebroker.com/home/SearchContacts/'
headers = {
    "content-type": "application/x-www-form-urlencoded",
    "user-agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/104.0.0.0 Safari/537.36",
}
body = "searchModel%5BSearchText%5D=San+Diego&searchModel%5BPageNumber%5D=2&searchModel%5BRadius%5D=50&searchModel%5BResultsPerPage%5D=20&searchModel%5BCaptchaToken%5D=03AEkXODDG8q9JqC--gCpxJK_Kevp506iB5o5Z7ilzY3Ge6GbYQaoX9jcOJqEyC6TG159L5KSvPoE43UlBxGMYW2jlNcnc0ING0sFeQO2RZIOui0YnNAaByRIVrjaluwaNi7WCE2FykjJNI0B5FNLB7nJjnr9N7YEeUkY13km0wRN3vfyqPh-bVdpahCir00GzE-pQyXU_o84bY1dCWRNQten7O_cnmdcA0ucEPxFeO3WIbMkUkUqqMC5vpAUiz_VttmYMyRETidTuaI6rHE2_AjGbUr6Z61vXFr-dXAC63alA15gGu8ypGRljtHS2wmfNSSySrtegnFxD3txZZ4d2KDk4ugBXLfh3jNUHM_KcKF6Rkp0WOHx7-D-4CEfMf-mC9zJ6FnVqJx3FTZiOrwcelQ0dW1OxdHuHlCVPPQlzIzcFMfsTJOsCLj3JNZTEgkQ6Eicl6dkVV-F-CRPd4fQZ2D_u3dDmrIaCIQJJ4LlQuSYXhLt-6QMcnFXceygadkKGqeiGQZcdUeagF6c8zz9OUg5g2ppXkCu-WsH08e-ei7sRHspA3Rdwh6sylcr8fqFlxDNmEXTI4CH1nRgLvJMuXr6KdcY3AWNhwA&searchModel%5BIsVendorRequest%5D=false&searchModel%5BVendorIdentifier%5D=0&searchModel%5BCaptchaV2%5D=false"
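res = requests.post(api_url, data=body, headers=headers)
# print(res)  # uncomment to check the HTTP status
soup = BeautifulSoup(res.text, 'lxml')
# Collect the profile link from every contact tile in the returned HTML
data = []
for item in soup.select('.clickable-tile-contact'):
    data.append({
        'href': item.get('href'),
    })
print(data)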
Output:
[{'href': 'https://findamortgagebroker.com/Profile\\AndresCamacho26826'}, {'href': 'https://findamortgagebroker.com/Profile\\DavidStein65836'}, {'href': 'https://findamortgagebroker.com/Profile\\DanielRamirez28222'}, {'href': 'https://findamortgagebroker.com/Profile\\DavidHolland56665'}, {'href': 'https://findamortgagebroker.com/Profile\\EvbeniiMalenko57387'}, {'href': 'https://findamortgagebroker.com/Profile\\AmirNurani66326'}, {'href': 'https://findamortgagebroker.com/Profile\\MarialuisaSarrizLira37868'}, {'href': 'https://findamortgagebroker.com/Profile\\DejaCorreia53368'}, {'href': 'https://findamortgagebroker.com/Profile\\JulioRugama72662'}, {'href': 'https://findamortgagebroker.com/Profile\\MarthaMunoz26537'}, {'href': 'https://findamortgagebroker.com/Profile\\CarlosMunoz55258'}, {'href': 'https://findamortgagebroker.com/Profile\\AndreaCutuk35775'}, {'href': 'https://findamortgagebroker.com/Profile\\LauraPardo64458'}, {'href': 'https://findamortgagebroker.com/Profile\\KatiePike37454'}, {'href': 'https://findamortgagebroker.com/Profile\\JustinGuthrie27854'}, {'href': 'https://findamortgagebroker.com/Profile\\GinoSalvaggio54863'}, {'href': 'https://findamortgagebroker.com/Profile\\AnnaValencia55287'}, {'href': 'https://findamortgagebroker.com/Profile\\ArtinMousakhan27554'}, {'href': 'https://findamortgagebroker.com/Profile\\GloriaPereira45832'}, {'href': 'https://findamortgagebroker.com/Profile\\NickKinnard38652'}]
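The hardcoded body string above is hard to read and to change, for example when you want another city or page. As a minimal sketch, the same form body can be built from a dictionary with the standard library. The field names are taken from the body string above. The helper name build_search_body and the placeholder TOKEN_FROM_BROWSER are made up for illustration. The CaptchaToken value in the original body was captured from a browser session and will likely stop working after a while, so you would probably need to copy a fresh one from the dev tools.

```python
from urllib.parse import urlencode


def build_search_body(search_text, page_number, captcha_token,
                      radius=50, results_per_page=20):
    """Return the URL-encoded form body for the SearchContacts POST.

    Hypothetical helper: the keys mirror the fields seen in the
    captured request; the defaults are the values from that request.
    """
    fields = {
        "searchModel[SearchText]": search_text,
        "searchModel[PageNumber]": page_number,
        "searchModel[Radius]": radius,
        "searchModel[ResultsPerPage]": results_per_page,
        "searchModel[CaptchaToken]": captcha_token,
        "searchModel[IsVendorRequest]": "false",
        "searchModel[VendorIdentifier]": 0,
        "searchModel[CaptchaV2]": "false",
    }
    # urlencode escapes the brackets as %5B / %5D and spaces as '+',
    # which matches the format of the captured body string.
    return urlencode(fields)


# TOKEN_FROM_BROWSER is a placeholder, not a working token.
body = build_search_body("San Diego", 2, captcha_token="TOKEN_FROM_BROWSER")
print(body)
```

The resulting string can be passed as data=body to requests.post exactly like the hardcoded one. Changing page_number lets you request the other result pages, assuming the site still accepts the token.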
Answered By - Fazlul