Issue
This is the element that I am trying to find:
<tbody class="searchable ng-scope" ng-repeat="ut in vm.unitList | filter: {leaseLength: (vm.weekOption.value …)}">…</tbody>
**Xpath - //*[@id="no-more-tables"]/tbody**
And this is my code:
driver.get(url)
property_name = driver.title
print('Property =====',property_name)
rooms = driver.find_elements(By.XPATH, '//*[@id="no-more-tables"]/tbody')
print (len(rooms))
The length of rooms comes out as 0 even though I gave the correct XPath. It should be 5.
Solution
The data on that page is loaded dynamically by JavaScript after the initial HTML has loaded. The following code removes the complexity of Selenium and gets you the data you're after:
import requests
import pandas as pd
s = requests.Session()
data = {"route":"unitlist","command":"","data":"{\"list\":{\"filter\":{\"propertyNoFilter\":\"PR0170000\",\"dateFilter\":\"03/09/2022\"}}}"}
r = s.get('https://www.hellostudent.co.uk/student-accommodation/stoke/caledonia-mills/')
r = s.post('https://rooms.hellostudent.co.uk/DynamicsNav/Call', data=data)
# print(r.json()['list']['property'])
df = pd.DataFrame(r.json()['list']['property'])
print(df)
Result:
unitType unitSubType unitDescription noOfUnits mainPropertyNo startDate endDate leaseLength pricePerWeek biannualAvailable termPaymentAvailable features
0 SHAPT SHAPT-Q4-B4-ES Silver 4-Bed Apartment Ensuite 0 PR0170000 03/09/22 25/08/23 51 89.00 false true None
1 SHAPT SHAPT-Q4-B3-ES Silver 3-Bed Apartment Ensuite 12 PR0170000 03/09/22 25/08/23 51 93.00 false true None
2 SHAPT SHAPT-Q4-B2-ES Silver 2-Bed Apartment Ensuite 7 PR0170000 03/09/22 25/08/23 51 115.00 false true None
3 STUDIO STUDIO-Q4 Silver Studio 1 PR0170000 03/09/22 25/08/23 51 153.00 false true None
4 STUDIO STUDIO-Q3 Gold Studio 0 PR0170000 03/09/22 25/08/23 51 169.00 false true None
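The hand-escaped data string in the request above is easy to get wrong once the property number or date changes. A minimal sketch of building the same payload with `json.dumps` follows. `build_unitlist_payload` is a hypothetical helper name, and whether the endpoint accepts other property numbers or dates in this shape is an untested assumption:

```python
import json


def build_unitlist_payload(property_no, date_filter):
    """Build the form payload for the unit-list call (hypothetical helper).

    The field names are taken from the request shown above; whether the
    endpoint accepts other values in the same shape is an assumption.
    """
    inner = {
        "list": {
            "filter": {
                "propertyNoFilter": property_no,
                "dateFilter": date_filter,
            }
        }
    }
    return {
        "route": "unitlist",
        "command": "",
        # the filter is sent as a JSON string, not as a nested object
        "data": json.dumps(inner, separators=(",", ":")),
    }


payload = build_unitlist_payload("PR0170000", "03/09/2022")
print(payload["data"])
```

The returned dict could be passed as `data=payload` to `s.post(...)` in place of the literal dict.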
If you want to do it with Selenium, bear in mind that the data is in an iframe, which is why an XPath lookup on the top-level page finds nothing:
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from io import StringIO
import time as t
import pandas as pd
chrome_options = Options()
chrome_options.add_argument("--no-sandbox")
webdriver_service = Service("chromedriver/chromedriver") ## path to where you saved chromedriver binary
browser = webdriver.Chrome(service=webdriver_service, options=chrome_options)
url = 'https://www.hellostudent.co.uk/student-accommodation/stoke/caledonia-mills/'
browser.get(url)
# the rooms table lives inside an iframe, so switch into it before locating anything
iframe = WebDriverWait(browser, 20).until(EC.element_to_be_clickable((By.XPATH, "//iframe[@class='panel__frame']")))
browser.switch_to.frame(iframe)
t.sleep(5)  # crude pause to let the script inside the frame populate the rows
rooms_table = WebDriverWait(browser, 20).until(EC.element_to_be_clickable((By.CSS_SELECTOR, "table[id='no-more-tables']")))
# recent pandas versions deprecate passing literal HTML to read_html, so wrap it in StringIO
df = pd.read_html(StringIO(rooms_table.get_attribute('outerHTML')))
print(df[0])
browser.quit()
Which would display the dataframe:
ROOM TYPE PRICE PER WEEK/PER PERSON WEEKS START DATE AVAILABILITY
0 Silver 4-Bed Apartment Ensuite NaN NaN NaN Sold Out
1 Silver 3-Bed Apartment Ensuite £93 51.0 03/09/22 Available - Book Now
2 Silver 2-Bed Apartment Ensuite £115 51.0 03/09/22 Available - Book Now
3 Silver Studio £153 51.0 03/09/22 Last few remaining - book now
4 Gold Studio NaN NaN NaN Sold Out
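The two "Sold Out" rows line up with a noOfUnits of 0 in the JSON result from the requests approach, so availability can apparently be read from that field without a browser. A minimal sketch on plain dicts follows. The sample records copy three rows from the earlier output, `available_units` is a hypothetical helper, and the value types in the real response (strings or numbers) are an assumption, hence the explicit conversions:

```python
def available_units(records):
    """Return (description, weekly price) for units with at least one left."""
    result = []
    for rec in records:
        # convert defensively: the API may send numbers as strings (assumption)
        if int(rec["noOfUnits"]) > 0:
            result.append((rec["unitDescription"], float(rec["pricePerWeek"])))
    return result


# Sample rows copied from the output shown earlier (subset of fields)
sample = [
    {"unitDescription": "Silver 4-Bed Apartment Ensuite", "noOfUnits": "0", "pricePerWeek": "89.00"},
    {"unitDescription": "Silver 3-Bed Apartment Ensuite", "noOfUnits": "12", "pricePerWeek": "93.00"},
    {"unitDescription": "Silver Studio", "noOfUnits": 1, "pricePerWeek": "153.00"},
]

for name, price in available_units(sample):
    print(f"{name}: £{price:.2f} per week")
```

With the real response, the equivalent call would be `available_units(r.json()['list']['property'])`.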
It would make more sense to scrape the URL that the iframe loads from directly: https://rooms.hellostudent.co.uk/#/RoomAvailability/caledonia-mills
Answered By - platipus_on_fire