Issue
My script is not returning the email field that I want to scrape from a website. Is there a workaround?
from bs4 import BeautifulSoup
import requests
url = 'https://www.kw.com/agent/UPA-6587385179144187908-1'
res = requests.get(url)
soup = BeautifulSoup(res.content,'html.parser')
name = soup.find('div',class_='AgentContent__name').text.strip()
location = soup.find('div',class_='AgentContent__location').text.strip()
website = soup.find('a',class_='AgentInformation__factBody').attrs['href']
print(website)
print(name)
print(location)
This is what I get:
/cdn-cgi/l/email-protection#f18394909d94828590859482b199949895989093949d94df929e9c
Heidi Abele
Campbell, CA
Solution
All of the information is available as JSON in the page's __NEXT_DATA__ script block, so you can read every field you want from there. For example:
import json
import requests
from bs4 import BeautifulSoup
response = requests.get('https://www.kw.com/agent/UPA-6587385179144187908-1')
soup = BeautifulSoup(response.text, 'lxml')
# Next.js pages embed their state as JSON in <script id="__NEXT_DATA__">
json_data = json.loads(soup.find('script', {'id': '__NEXT_DATA__'}).get_text())
# Every field used below lives under the same agentData object
agent = json_data['props']['pageProps']['agentData']
name = agent['name']['full']
city = agent['location']['city']
state = agent['location']['state']
email = agent['email']
website = agent['website']
print(f"{name}, {city}, {state}, {email}, {website}")
OUTPUT:
Heidi Abele, Campbell, CA, [email protected], https://heidiabelerealtor.com/
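As an alternative that is not part of the answer above: the href printed in the question (/cdn-cgi/l/email-protection#...) looks like Cloudflare's email obfuscation, where the hex string after the # holds a one-byte XOR key followed by the XOR-encoded address. Assuming that scheme applies here, a minimal sketch of decoding it directly could look like this. The function name decode_cf_email is a hypothetical helper, not a library API.

```python
def decode_cf_email(href: str) -> str:
    """Decode an address from a '/cdn-cgi/l/email-protection#<hex>' href.

    Assumes Cloudflare's usual scheme: the first byte is an XOR key and
    every following byte is one character of the address XORed with it.
    """
    encoded = href.split('#', 1)[-1]       # keep only the hex payload
    data = bytes.fromhex(encoded)          # raises ValueError if not hex
    key = data[0]
    return bytes(b ^ key for b in data[1:]).decode('utf-8')


# The href that the script in the question printed
href = '/cdn-cgi/l/email-protection#f18394909d94828590859482b199949895989093949d94df929e9c'
email = decode_cf_email(href)
print(email)
```

This avoids depending on the structure of the embedded JSON, but it only works while the site keeps serving the address through this obfuscation scheme.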
Answered By - Sergey K