Issue
After opening this website (https://www.realtor.com/realestateagents/), when I type Miami, FL into the input box (City or zip) and hit the search button, I can see the related results displayed on the site.
I want to mimic the same thing with the requests module. I tried to follow the steps shown in the browser's dev tools, but for some reason the script below produces this output: You are not authorized to access this request.
Here is what I've tried:
import json
import requests
from pprint import pprint
from bs4 import BeautifulSoup
URL = "https://www.realtor.com/realestateagents/"
link = 'https://www.realtor.com/realestateagents/api/v3/search'
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/105.0.0.0 Safari/537.36',
    'Accept': 'application/json, text/plain, */*',
    'referer': 'https://www.realtor.com/realestateagents/',
    'accept-encoding': 'gzip, deflate, br',
    'accept-language': 'en-US,en;q=0.9,bn;q=0.8',
    'X-Requested-With': 'XMLHttpRequest',
    'x-newrelic-id': 'VwEPVF5XGwQHXFNTBAcAUQ==',
    'authorization': 'Bearer eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJleHAiOjE2NjQ1MjU0NDQsInN1YiI6ImZpbmRfYV9yZWFsdG9yIiwiaWF0IjoxNjY0NTI0Nzk2fQ.Q2jryTAD5vgsJ37e1SylBnkaeK7Cln930Q8KL4ANqsM'
}
params = {
    'nar_only': '1',
    'offset': '',
    'limit': '20',
    'marketing_area_cities': 'FL_Miami',
    'postal_code': '',
    'is_postal_search': 'true',
    'name': '',
    'types': 'agent',
    'sort': 'recent_activity_high',
    'far_opt_out': 'false',
    'client_id': 'FAR2.0',
    'recommendations_count_min': '',
    'agent_rating_min': '',
    'languages': '',
    'agent_type': '',
    'price_min': '',
    'price_max': '',
    'designations': '',
    'photo': 'true',
    'seoUserType': "{'isBot':'false','deviceType':'desktop'}",
    'is_county_search': 'false',
    'county': ''
}
with requests.Session() as s:
    s.headers.update(headers)
    res = s.get(link, params=params)
    print(res.status_code)
    print(res.json())
EDIT:
For those who think calling res.json() is pointless: a screenshot taken straight from the dev tools shows this endpoint returning JSON. If I could set up the params and headers correctly when sending the request, res.json() would work.
Solution
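The issue is that the Authorization token is short-lived. The one hardcoded in the question has an exp (expiry) claim only 648 seconds, about 11 minutes, after its iat (issued-at) claim. A token copied from dev tools therefore stops working soon after it is issued, so you will need to regenerate it yourself for each request (or at least before it expires).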
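You can confirm that the token in the question is time-limited without knowing any secret. A JWT's payload is only base64url-encoded JSON; it is not encrypted. The following is a minimal stdlib sketch that decodes the payload without verifying the signature. The name jwt_claims is a hypothetical helper name.

```python
import base64
import json

# The token hardcoded in the question's headers
TOKEN = (
    "eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9."
    "eyJleHAiOjE2NjQ1MjU0NDQsInN1YiI6ImZpbmRfYV9yZWFsdG9yIiwiaWF0IjoxNjY0NTI0Nzk2fQ."
    "Q2jryTAD5vgsJ37e1SylBnkaeK7Cln930Q8KL4ANqsM"
)


def jwt_claims(token: str) -> dict:
    """Decode a JWT's payload segment WITHOUT verifying its signature."""
    payload_b64 = token.split(".")[1]
    # base64url segments have their "=" padding stripped; restore it before decoding
    padded = payload_b64 + "=" * (-len(payload_b64) % 4)
    return json.loads(base64.urlsafe_b64decode(padded))


claims = jwt_claims(TOKEN)
print(claims)  # {'exp': 1664525444, 'sub': 'find_a_realtor', 'iat': 1664524796}
print(claims["exp"] - claims["iat"], "seconds of validity")  # 648 seconds of validity
```

A token pasted from dev tools into a script can therefore only work for roughly ten minutes at most.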
First, get the JWT secret that is used to sign the tokens. It is hardcoded in the page's HTML source, so a regular expression can extract it:
import requests
from re import findall

# The secret is hardcoded in the page's HTML
SECRET = findall(r'"JWT_SECRET":"(.*?)"', requests.get('https://www.realtor.com/realestateagents/').text)[0]
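The pattern relies on a non-greedy (.*?) group, which stops at the first closing quote after the key. The sketch below runs the pattern against a made-up HTML fragment; SAMPLE_HTML is hypothetical, and the real page's markup around JWT_SECRET may differ. It also wraps the search in a hypothetical extract_secret helper. That helper fails with a clear message, whereas findall(...)[0] raises a bare IndexError when the key is missing.

```python
import re

# Made-up fragment standing in for the page source; NOT the site's real markup
SAMPLE_HTML = '<script>window.__CONFIG__={"API":"x","JWT_SECRET":"not-the-real-secret","ENV":"prod"}</script>'


def extract_secret(html: str) -> str:
    # Non-greedy group: capture everything up to the FIRST following quote
    match = re.search(r'"JWT_SECRET":"(.*?)"', html)
    if match is None:
        raise ValueError("JWT_SECRET not found; the page layout may have changed")
    return match.group(1)


print(extract_secret(SAMPLE_HTML))  # not-the-real-secret

# For comparison, a greedy (.*) runs on to the LAST quote in the string
print(re.findall(r'"JWT_SECRET":"(.*)"', SAMPLE_HTML)[0])  # not-the-real-secret","ENV":"prod
```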
Then use the secret to generate a new Authorization token:
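# encode comes from the PyJWT package (pip install pyjwt)
from time import time
from jwt import encode

# Create JWT
jwt_payload = {
    "exp": int(time() + 9999),  # expiry time, about 2.8 hours from now
    "sub": "find_a_realtor",
    "iat": int(time())  # issued at
}
# Sign it with their secret; PyJWT 2.x returns a str (1.x returned bytes)
jwt = encode(jwt_payload, SECRET, algorithm="HS256")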
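If you would rather not depend on PyJWT, note how an HS256 token is built. It is base64url(header) + "." + base64url(payload), followed by "." and a base64url HMAC-SHA256 signature of those first two parts, keyed with the secret. Below is a stdlib-only sketch; encode_hs256 and b64url are hypothetical names. The header key order and JSON whitespace may differ from what a given PyJWT version emits. That changes the exact bytes, but not whether the token is valid.

```python
import base64
import hashlib
import hmac
import json
from time import time


def b64url(data: bytes) -> str:
    # JWT segments use base64url without "=" padding
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def encode_hs256(payload: dict, secret: str) -> str:
    """Stdlib-only HS256 JWT, roughly equivalent to encode(..., algorithm="HS256")."""
    header = {"alg": "HS256", "typ": "JWT"}
    # Compact separators: no spaces after "," or ":"
    segments = [
        b64url(json.dumps(header, separators=(",", ":")).encode()),
        b64url(json.dumps(payload, separators=(",", ":")).encode()),
    ]
    signing_input = ".".join(segments).encode("ascii")
    signature = hmac.new(secret.encode(), signing_input, hashlib.sha256).digest()
    return ".".join(segments + [b64url(signature)])


now = int(time())
# "placeholder-secret" stands in for the SECRET scraped from the page
token = encode_hs256({"exp": now + 9999, "sub": "find_a_realtor", "iat": now}, "placeholder-secret")
print("Bearer " + token)
```

With these compact separators, the claims from the question produce header and payload segments identical to the first two segments of the token copied from dev tools. Only the signature depends on the secret.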
Add the token to your headers, then send the request as before:
# Add the JWT to the headers
headers = {
    'authorization': 'Bearer ' + jwt,
}
# Attach headers to the request
response = requests.get(
    url='https://www.realtor.com/realestateagents/api/v3/search?nar_only=1&offset=&limit=20&marketing_area_cities=FL_Miami&postal_code=&is_postal_search=true&name=&types=agent&sort=recent_activity_high&far_opt_out=false&client_id=FAR2.0&recommendations_count_min=&agent_rating_min=&languages=&agent_type=&price_min=&price_max=&designations=&photo=true&seoUserType={%22isBot%22:false,%22deviceType%22:%22desktop%22}&is_county_search=false&county=',
    headers=headers
)
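The long hand-written URL above encodes the same query that the question already builds as a params dict. The dict form keeps the fields editable and lets json.dumps produce the seoUserType value as real JSON. This sketch uses urllib.parse.urlencode so that it runs offline. Passing params=params to requests.get applies essentially the same urlencode-based encoding. It is an assumption, not something tested here, that the server accepts this percent-encoded form as readily as the hand-written one.

```python
import json
from urllib.parse import parse_qs, urlencode, urlsplit

BASE = "https://www.realtor.com/realestateagents/api/v3/search"

# Same fields as the hardcoded query string; empty strings are kept as "key="
params = {
    "nar_only": "1", "offset": "", "limit": "20",
    "marketing_area_cities": "FL_Miami", "postal_code": "",
    "is_postal_search": "true", "name": "", "types": "agent",
    "sort": "recent_activity_high", "far_opt_out": "false", "client_id": "FAR2.0",
    "recommendations_count_min": "", "agent_rating_min": "", "languages": "",
    "agent_type": "", "price_min": "", "price_max": "", "designations": "",
    "photo": "true",
    # Compact JSON with a real boolean: {"isBot":false,"deviceType":"desktop"}
    "seoUserType": json.dumps({"isBot": False, "deviceType": "desktop"}, separators=(",", ":")),
    "is_county_search": "false", "county": "",
}

url = BASE + "?" + urlencode(params)
print(url)

# Round-trip check: decoding the query gives back the original values
decoded = parse_qs(urlsplit(url).query, keep_blank_values=True)
print(decoded["seoUserType"][0])  # {"isBot":false,"deviceType":"desktop"}
```

With requests, the call would become requests.get(BASE, params=params, headers=headers).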
Putting it all together...
import requests
from jwt import encode
from time import time
from re import findall
# First we need to get their JWT Secret... which is securely hardcoded in the HTML
SECRET = findall(r'"JWT_SECRET":"(.*?)"', requests.get('https://www.realtor.com/realestateagents/').text)[0]
# Create JWT
jwt_payload = {
    "exp": int(time() + 9999),
    "sub": "find_a_realtor",
    "iat": int(time())
}
# Encode it with their secret (PyJWT 2.x returns a str)
jwt = encode(jwt_payload, SECRET, algorithm="HS256")
# Add the JWT to the headers
headers = {
    'authorization': 'Bearer ' + jwt,
}
# Attach headers to the request
response = requests.get(
    url='https://www.realtor.com/realestateagents/api/v3/search?nar_only=1&offset=&limit=20&marketing_area_cities=FL_Miami&postal_code=&is_postal_search=true&name=&types=agent&sort=recent_activity_high&far_opt_out=false&client_id=FAR2.0&recommendations_count_min=&agent_rating_min=&languages=&agent_type=&price_min=&price_max=&designations=&photo=true&seoUserType={%22isBot%22:false,%22deviceType%22:%22desktop%22}&is_county_search=false&county=',
    headers=headers
)
# Print the JSON output
print(response.json())
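Regenerating the token for every request works. Because the claims only carry an expiry time, though, one could also reuse a token until it is close to its exp. Below is a sketch with a hypothetical TokenCache class. The signing function is injected, for example lambda claims: encode(claims, SECRET, algorithm="HS256") with PyJWT 2.x. The demo uses a fake clock and a fake signer, so it runs without network access or the real secret.

```python
import time
from typing import Callable, Optional


class TokenCache:
    """Hypothetical helper: reuse a signed token until shortly before it expires."""

    def __init__(self, sign: Callable[[dict], str], ttl: int = 9999,
                 margin: int = 60, clock: Callable[[], float] = time.time):
        self.sign = sign      # maps a claims dict to a signed token string
        self.ttl = ttl        # lifetime written into the "exp" claim
        self.margin = margin  # regenerate this many seconds before expiry
        self.clock = clock
        self._token: Optional[str] = None
        self._exp = 0

    def header(self) -> dict:
        now = int(self.clock())
        # Sign a new token if none exists yet or the current one is about to expire
        if self._token is None or now >= self._exp - self.margin:
            self._exp = now + self.ttl
            self._token = self.sign({"exp": self._exp, "sub": "find_a_realtor", "iat": now})
        return {"authorization": "Bearer " + self._token}


# Demo with a fake clock and a fake signer (no network, no real secret)
fake_now = [1_000_000]
issued = []


def fake_sign(claims: dict) -> str:
    issued.append(claims)
    return f"token-{len(issued)}"


cache = TokenCache(fake_sign, ttl=600, margin=30, clock=lambda: fake_now[0])
print(cache.header())  # {'authorization': 'Bearer token-1'}
fake_now[0] += 100
print(cache.header())  # {'authorization': 'Bearer token-1'} (reused)
fake_now[0] += 480     # 580 s after issue, inside the 30 s margin before exp
print(cache.header())  # {'authorization': 'Bearer token-2'}
```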
Answered By - Xiddoc