Issue
I can't find any solutions for the problem I'm having.
I want to scrape the JSON file from https://www.armadarealestate.com/Inventory.aspx
When I open the browser's Network tab and visit the URL the JSON is loaded from, I just get sent to another HTML page, yet the Response section for that request shows the property information, which is what I need.
So how can I pull the JSON file from the website?
import json
import requests
resp = requests.get(url='https://buildout.com/plugins/3e0f3893dc334368bb1ee6274ad5fd7b546414e9/inventory?utf8=%E2%9C%93&page=0&brandingId=&searchText=&q%5Bsale_or_lease_eq%5D=&q%5Bs%5D%5B%5D=&viewType=list&q%5Btype_eq_any%5D%5B%5D=2&q%5Btype_eq_any%5D%5B%5D=5&q%5Btype_eq_any%5D%5B%5D=1&q%5Bcity_eq%5D=')
print(json.loads(resp.text))
JSONDecodeError: Expecting value: line 1 column 1 (char 0)
In fact, when I request the URL that the browser lists as the source of the JSON, I instead get the response from 'https://buildout.com/plugins/3e0f3893dc334368bb1ee6274ad5fd7b546414e9/inventory?utf8=%E2%9C%93&page=0&brandingId=&searchText=&q%5Bsale_or_lease_eq%5D=&q%5Bs%5D%5B%5D=&viewType=list&q%5Btype_eq_any%5D%5B%5D=2&q%5Btype_eq_any%5D%5B%5D=5&q%5Btype_eq_any%5D%5B%5D=1&q%5Bcity_eq%5D=', which is an HTML file.
How can I fix this?
Solution
The body of your response object "resp" is not JSON. It is HTML, which is why json.loads fails at the very first character. You can use BeautifulSoup to scrape the content from that HTML.
requests downloads only the document the server returns and does not run any JavaScript in it. If you need the page as rendered by JavaScript, use a browser-automation library such as Selenium.
Otherwise, find the URL that the page loads its JSON from via AJAX and use requests to fetch the JSON directly.
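Before parsing, it can help to check what kind of body actually came back. The following is a minimal sketch, not part of the original answer: parse_json_body is a hypothetical helper name, and it assumes the server labels JSON responses with an application/json (or "+json") Content-Type.

```python
import json


def parse_json_body(content_type: str, text: str):
    """Return the parsed JSON body, or raise ValueError with a readable hint.

    Hypothetical helper for illustration; it assumes the server sets an
    accurate Content-Type header on its responses.
    """
    # The media type comes before any parameters such as "; charset=utf-8".
    media_type = content_type.split(";")[0].strip().lower()
    if media_type != "application/json" and not media_type.endswith("+json"):
        # Show the start of the body so an HTML page is easy to recognise.
        preview = text.lstrip()[:60]
        raise ValueError(
            f"expected JSON but got {media_type or 'an unknown type'}: {preview!r}"
        )
    return json.loads(text)
```

With requests, this could be called as parse_json_body(resp.headers.get("Content-Type", ""), resp.text), which turns the bare JSONDecodeError into a message that names the content type the server really sent.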
In your case, the same URL returns JSON once the request sends an accept: application/json header. The tested code to scrape the JSON:
import requests
url = "https://buildout.com/plugins/3e0f3893dc334368bb1ee6274ad5fd7b546414e9/inventory?utf8=%E2%9C%93&page=0&brandingId=&searchText=&q%5Bsale_or_lease_eq%5D=&q%5Bs%5D%5B%5D=&viewType=list&q%5Btype_eq_any%5D%5B%5D=2&q%5Btype_eq_any%5D%5B%5D=5&q%5Btype_eq_any%5D%5B%5D=1&q%5Bcity_eq%5D="
# Ask for JSON explicitly and send a browser-like user agent
h = {'accept': 'application/json', 'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/90.0.4430.85 Safari/537.36'}
r = requests.get(url, headers=h)
print(r.json())  # prints the JSON data
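The long query string is easier to read and modify once it is decoded. The sketch below is an added illustration rather than part of the tested answer: it uses only the standard library to decode the URL from the question and rebuild it with a different page value. The helper name with_page is hypothetical, and whether the endpoint returns further results for other page values is an assumption that has not been verified here.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit

URL = (
    "https://buildout.com/plugins/3e0f3893dc334368bb1ee6274ad5fd7b546414e9/inventory"
    "?utf8=%E2%9C%93&page=0&brandingId=&searchText=&q%5Bsale_or_lease_eq%5D="
    "&q%5Bs%5D%5B%5D=&viewType=list&q%5Btype_eq_any%5D%5B%5D=2"
    "&q%5Btype_eq_any%5D%5B%5D=5&q%5Btype_eq_any%5D%5B%5D=1&q%5Bcity_eq%5D="
)


def decode_params(url: str) -> list[tuple[str, str]]:
    """Return the query parameters as decoded (name, value) pairs, in order."""
    # keep_blank_values=True preserves empty parameters such as "brandingId=".
    return parse_qsl(urlsplit(url).query, keep_blank_values=True)


def with_page(url: str, page: int) -> str:
    """Return the same URL with its 'page' parameter set to the given value."""
    parts = urlsplit(url)
    pairs = [
        (name, str(page) if name == "page" else value)
        for name, value in parse_qsl(parts.query, keep_blank_values=True)
    ]
    # urlencode percent-encodes the brackets and the check mark again.
    return parts._replace(query=urlencode(pairs)).geturl()


for name, value in decode_params(URL):
    print(f"{name} = {value!r}")
```

Decoded, the repeated q[type_eq_any][] entries and the empty filters become visible as ordinary name and value pairs, and with_page(URL, 1) yields a URL that can be passed to requests.get together with the same headers.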
Answered By - RG_RG