Issue
I am trying to scrape https://www.edgeprop.sg/condo-apartment/aquarius-by-the-park to get the Land Size (sqm) from the overview table. The expected result is 40,608.
However, I am unable to get the result I want. Here is my code:
#[Python] test webscrape on edgeprop
import gspread
import json
from oauth2client.service_account import ServiceAccountCredentials
from openpyxl.worksheet import worksheet
from requests.api import request
import requests
import time
from requests.models import Response
import scrapy
from bs4 import BeautifulSoup
from six import add_metaclass, class_types
query_string='https://www.edgeprop.sg/condo-apartment/aquarius-by-the-park'
resp = requests.get(query_string)
soup = BeautifulSoup(resp.content,'html.parser')
print("soup is: ", query_string)
try:
    landsize = soup.find_all("h4",class_="detail-title__text")
    print("Landsize is: ", landsize)
except IndexError:
    pass
Solution
Try this:
import json
import requests
from bs4 import BeautifulSoup
query_string='https://www.edgeprop.sg/condo-apartment/aquarius-by-the-park'
resp = requests.get(query_string)
soup = BeautifulSoup(resp.content,'html.parser')
# get data with all info
data = soup.find("script", id="__NEXT_DATA__").text
# convert string to python dict
json_data = json.loads(data)
# get land_size from dict
print(json_data["props"]["pageProps"]["projectInfo"]["data"]["land_size"])
The HTML response contains a script tag with id "__NEXT_DATA__" whose content is JSON holding all of the page's information, including the land size.
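The chained lookup in the answer raises a KeyError if any key along the path is missing, for example if the site changes its layout. Below is a minimal sketch of a more defensive variant that uses only the standard library. The names NextDataParser, extract_next_data and dig are hypothetical helpers introduced here for illustration, and sample_html is made-up data that mirrors the key path from the answer; the real page's JSON may differ in shape and value types.

```python
import json
from html.parser import HTMLParser


class NextDataParser(HTMLParser):
    """Collect the text inside <script id="__NEXT_DATA__"> (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self._inside = False
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("id") == "__NEXT_DATA__":
            self._inside = True

    def handle_endtag(self, tag):
        if tag == "script":
            self._inside = False

    def handle_data(self, data):
        if self._inside:
            self.chunks.append(data)


def extract_next_data(html):
    """Return the parsed JSON from the __NEXT_DATA__ script, or None if absent."""
    parser = NextDataParser()
    parser.feed(html)
    parser.close()
    raw = "".join(parser.chunks)
    return json.loads(raw) if raw.strip() else None


def dig(obj, *keys):
    """Walk nested dicts key by key; return None instead of raising KeyError."""
    for key in keys:
        if not isinstance(obj, dict) or key not in obj:
            return None
        obj = obj[key]
    return obj


# Made-up sample standing in for resp.text; only the key path is taken from the answer.
sample_html = """<html><body>
<script id="__NEXT_DATA__" type="application/json">
{"props": {"pageProps": {"projectInfo": {"data": {"land_size": 40608}}}}}
</script>
</body></html>"""

data = extract_next_data(sample_html)
land_size = dig(data, "props", "pageProps", "projectInfo", "data", "land_size")
missing = dig(data, "props", "pageProps", "noSuchKey", "land_size")
print(land_size)
print(missing)
```

With a live request, resp.text would be passed to extract_next_data in place of sample_html. A None result then signals that either the script tag or one of the keys was not found, which is easier to handle than an exception raised partway through the lookup.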
Answered By - dimay