Issue
I want to scrape the agent links from this page: https://kw.com/agent/search/IL/Chicago. However, when I inspect the page, there is no div class or a href link for the agents. I don't understand which function I need to call to scrape these 652 agent links.
My code:
import requests
from bs4 import BeautifulSoup
url = 'https://kw.com/agent/search/IL/Chicago'
reqs = requests.get(url)
soup = BeautifulSoup(reqs.text, 'html.parser')
urls = []
for link in soup.find_all('a'):
    print(link.get('href'))
This code works for other pages, but this site looks complicated to me. How can I collect these links?
Solution
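Actually, all the required data is generated by an API: the page fetches it from a GraphQL endpoint rather than including it in the HTML, which is why requests and BeautifulSoup find no agent links. Each agent URL contains a unique ID, and that ID appended to https://kw.com/agent/ is the agent's details page link.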
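The endpoint used in the example below is a GraphQL URL, and its request body follows the usual GraphQL-over-HTTP shape: an `operationName`, a `variables` object, and the `query` text. Since only each agent's `id` (plus paging info) is needed to build the links, a smaller body could be built roughly as sketched here. This is an illustration under assumptions: `build_search_payload` and `SEARCH_AGENTS_QUERY` are hypothetical names, the sketch drops the answer's extra `queryId` variable and most selected fields, and it assumes the server accepts such an ad hoc query rather than only the exact query the site itself sends.

```python
from typing import Any, Dict, Optional

# Hypothetical trimmed query: same operation, types and variables as the
# answer's payload, but selecting only the id and the paging fields.
SEARCH_AGENTS_QUERY = """
query searchAgentsQuery($searchCriteria: AgentSearchCriteriaInput, $first: Int, $after: String) {
  SearchAgentQuery(searchCriteria: $searchCriteria) {
    result {
      agents(first: $first, after: $after) {
        edges { node { id } }
        pageInfo { endCursor hasNextPage }
        totalCount
      }
    }
  }
}
"""


def build_search_payload(
    state: str, city: str, first: int = 50, after: Optional[str] = None
) -> Dict[str, Any]:
    """Return a GraphQL-over-HTTP body: operationName, variables and query."""
    return {
        "operationName": "searchAgentsQuery",
        "variables": {
            "searchCriteria": {"searchTerms": {"param1": state, "param2": city}},
            "first": first,
            "after": after,  # None is sent as JSON null
        },
        "query": SEARCH_AGENTS_QUERY,
    }
```

The result can be passed as `json=build_search_payload("IL", "Chicago")` to the same `requests.post` call the answer uses; if the server rejects it, fall back to the full payload.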
Example:
import requests
api_url = "https://api-endpoint.cons-prod-us-central1.kw.com/graphql"
data={"operationName":"searchAgentsQuery","variables":{"searchCriteria":{"searchTerms":{"param1":"IL","param2":"Chicago"}},"first":50,"after":"99","queryId":"0.8691595723322416"},"query":"query searchAgentsQuery($searchCriteria: AgentSearchCriteriaInput, $first: Int, $after: String) {\n SearchAgentQuery(searchCriteria: $searchCriteria) {\n result {\n agents(first: $first, after: $after) {\n edges {\n node {\n ...AgentProfileFragment\n __typename\n }\n __typename\n }\n pageInfo {\n ...PageInfoFragment\n __typename\n }\n totalCount\n __typename\n }\n __typename\n }\n __typename\n }\n}\n\nfragment PageInfoFragment on PageInfo {\n endCursor\n hasNextPage\n __typename\n}\n\nfragment AgentProfileFragment on AgentProfileType {\n id\n name {\n full\n given\n initials\n __typename\n }\n image\n location {\n address {\n state\n city\n __typename\n }\n __typename\n }\n realEstateEntity {\n name\n __typename\n }\n specialties\n languages\n isAgentLuxuryEnabled\n phone {\n entries {\n ... on ContactSetEntryMobile {\n number\n __typename\n }\n ... on ContactSetEntryEmail {\n email\n __typename\n }\n __typename\n }\n __typename\n }\n agentLicenses {\n licenseNumber\n state\n __typename\n }\n marketCenter {\n market_center_name\n market_center_address1\n market_center_address2\n __typename\n }\n __typename\n}\n"}
headers={
'content-type': 'application/json',
'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/103.0.0.0 Safari/537.36',
'x-datadog-origin': 'rum',
'x-datadog-parent-id': '5420198475190660541',
'x-datadog-sampled': '1',
'x-datadog-sampling-priority': '1',
'x-datadog-trace-id': '1837163169752685118',
'x-shared-secret': 'MjFydHQ0dndjM3ZAI0ZHQCQkI0BHIyM='
}
res = requests.post(api_url, headers=headers, json=data)
res.raise_for_status()
edges = res.json()['data']['SearchAgentQuery']['result']['agents']['edges']
for item in edges:
    # each agent's id appended to the domain is its details page URL
    link = 'https://kw.com/agent/' + item['node']['id']
    print(link)
Output:
https://kw.com/agent/UPA-6587385404419399681-8
https://kw.com/agent/UPA-6587385313789222917-3
https://kw.com/agent/UPA-6704789234247561216-6
https://kw.com/agent/UPA-6587385427490459656-4
https://kw.com/agent/UPA-6587385454284918792-0
https://kw.com/agent/UPA-6882009464351350784-8
https://kw.com/agent/UPA-6937439716674322432-5
https://kw.com/agent/UPA-6587385379476373510-1
https://kw.com/agent/UPA-6853411032351416320-2
https://kw.com/agent/UPA-6587385065789456390-4
https://kw.com/agent/UPA-6587385175436890114-3
https://kw.com/agent/UPA-6942951019140222976-1
https://kw.com/agent/UPA-6808491123018551296-7
https://kw.com/agent/UPA-6587385273946116100-8
https://kw.com/agent/UPA-6587385281007677447-9
https://kw.com/agent/UPA-6592268954554945544-5
https://kw.com/agent/UPA-6587385270364864517-7
https://kw.com/agent/UPA-6856325267405185024-3
https://kw.com/agent/UPA-6804158392167718912-3
https://kw.com/agent/UPA-6638843865929490435-1
https://kw.com/agent/UPA-6587384999272361984-6
https://kw.com/agent/UPA-6592267095708119045-4
https://kw.com/agent/UPA-6587385271389274119-4
https://kw.com/agent/UPA-6587385271385079815-8
https://kw.com/agent/UPA-6587385288161681409-1
https://kw.com/agent/UPA-6587385375965011973-7
https://kw.com/agent/UPA-6587385274994008066-1
https://kw.com/agent/UPA-6913250263682408448-6
https://kw.com/agent/UPA-6587385272597565443-9
https://kw.com/agent/UPA-6859526404702093312-9
https://kw.com/agent/UPA-6587385390518407175-2
https://kw.com/agent/UPA-6587385436077776899-8
https://kw.com/agent/UPA-6587384956740640770-9
https://kw.com/agent/UPA-6587385297339674632-1
https://kw.com/agent/UPA-6587385390593904641-1
https://kw.com/agent/UPA-6811013526642786304-3
https://kw.com/agent/UPA-6932834317516042240-9
https://kw.com/agent/UPA-6587385437068947458-5
https://kw.com/agent/UPA-6587385380989808647-6
https://kw.com/agent/UPA-6892926376478015488-5
https://kw.com/agent/UPA-6905262704995926016-2
https://kw.com/agent/UPA-6592947303925784578-6
https://kw.com/agent/UPA-6587385393920495624-5
https://kw.com/agent/UPA-6783788552269369344-7
https://kw.com/agent/UPA-6710285049427382272-8
https://kw.com/agent/UPA-6844700377378430976-0
https://kw.com/agent/UPA-6934540598372548608-6
https://kw.com/agent/UPA-6711387287014834176-1
https://kw.com/agent/UPA-6587385367301132290-0
https://kw.com/agent/UPA-6714648183099023360-3
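The query sends `"first": 50`, so one request returns 50 links, while the page lists 652 agents. The same query also asks for `pageInfo { endCursor hasNextPage }`, so the remaining agents can presumably be collected by repeating the request with `after` set to the previous `endCursor` until `hasNextPage` is false. Below is a minimal sketch of that cursor loop. `iter_agent_links` and the injected `fetch_page` callable are hypothetical names, and the sketch assumes that sending `after: null` starts from the first page, which the answer does not confirm.

```python
from typing import Any, Callable, Dict, Iterator, Optional

AGENT_URL_PREFIX = "https://kw.com/agent/"


def iter_agent_links(
    fetch_page: Callable[[Dict[str, Any]], Dict[str, Any]],
    base_payload: Dict[str, Any],
) -> Iterator[str]:
    """Yield agent page URLs, following the GraphQL cursor page by page.

    fetch_page receives a request payload and returns the decoded JSON response.
    """
    # Assumption: a null cursor makes the API start from the first page.
    after: Optional[str] = None
    while True:
        # Build fresh dicts so the caller's payload is never mutated.
        variables = dict(base_payload["variables"], after=after)
        payload = dict(base_payload, variables=variables)
        agents = fetch_page(payload)["data"]["SearchAgentQuery"]["result"]["agents"]
        for edge in agents["edges"]:
            yield AGENT_URL_PREFIX + edge["node"]["id"]
        page_info = agents["pageInfo"]
        next_cursor = page_info["endCursor"]
        # Stop when the server reports no more pages, or if the cursor stops advancing.
        if not page_info["hasNextPage"] or next_cursor == after:
            break
        after = next_cursor
```

With the answer's `api_url`, `headers` and request payload, a fetcher could be `lambda p: requests.post(api_url, headers=headers, json=p).json()`, and `list(iter_agent_links(fetcher, payload))` collects every page. The response's `totalCount` field is a convenient sanity check against the number of links collected.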
Answered By - Md. Fazlul Hoque