Issue
I am trying to collect some data using the following code:
import time
import os
from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.by import By
if __name__ == '__main__':
    print("Checking Browser driver...")
    os.environ['WDM_LOG'] = '0'
    options = Options()
    options.add_argument("start-maximized")
    options.add_experimental_option("prefs", {"profile.default_content_setting_values.notifications": 1})
    options.add_experimental_option("excludeSwitches", ["enable-automation"])
    options.add_experimental_option('excludeSwitches', ['enable-logging'])
    options.add_experimental_option('useAutomationExtension', False)
    options.add_argument('--disable-blink-features=AutomationControlled')
    srv = Service()
    driver = webdriver.Chrome(service=srv, options=options)
    waitWD = WebDriverWait(driver, 10)
    baseLink = "https://residentialprotection.alberta.ca/public-registry/Property"
    print(f"Working for {baseLink}")
    driver.get(baseLink)
    waitWD.until(EC.presence_of_element_located((By.XPATH, '//input[@aria-owns="Municipality_listbox"]'))).send_keys("Calgary")
    waitWD.until(EC.presence_of_element_located((By.XPATH, '//button[@id="show-results"]'))).click()
    time.sleep(5)
    countElems = driver.find_elements(By.XPATH, '//tbody//tr[@role="row"]')
    print(len(countElems))
    for idx in range(len(countElems)):
        time.sleep(3)
        elems = driver.find_elements(By.XPATH, '//tbody//tr[@role="row"]')
        elems[idx].click()
        time.sleep(3)
        soup = BeautifulSoup(driver.page_source, 'lxml')
        worker = soup.find("label", {"for": "FileNumber"})
        wFileNumber = worker.find_next("td").text.strip()
        print(f"{idx}: {wFileNumber}")
        closeElem = driver.find_elements(By.XPATH, '//a[@aria-label="Close"]')[-1]
        closeElem.click()
    driver.quit()
When you watch the opened Chrome window, the script opens all 10 rows one by one and then parses the page with bs4 to get the file number of each entry. But in the output I always get the same file number 10 times (the file number from the first row):
$ python temp2.py
Checking Browser driver...
Working for https://residentialprotection.alberta.ca/public-registry/Property
10
0: 21RU3557182
1: 21RU3557182
2: 21RU3557182
3: 21RU3557182
4: 21RU3557182
5: 21RU3557182
6: 21RU3557182
7: 21RU3557182
8: 21RU3557182
9: 21RU3557182
Why do I not get the different file numbers that I can see in the Chrome window opened by Selenium?
Solution
The suggestion from @Andrej Kesely is good and quicker than using Selenium.
Anyway, if you want to know what the issue in your code is: on this site, when you click on a row and its detail table opens, that table stays in the DOM even after you close it. The `find` method returns only the first element matching the label `FileNumber`, so it always returns the label from the first opened table.
To fix your code, always take the last matching element, using `find_all` with index `[-1]` - that is always the most recently opened table:
soup = BeautifulSoup(driver.page_source, 'lxml')
worker = soup.find_all("label", {"for": "FileNumber"})[-1]
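The root cause can be demonstrated without Selenium at all. The snippet below is a minimal sketch with hypothetical HTML and made-up file numbers: it simulates a page where three detail tables have accumulated in the DOM, and shows that `find` always returns the first match while `find_all(...)[-1]` returns the last (most recently opened) one. It uses the built-in `html.parser` so no `lxml` install is needed.

```python
from bs4 import BeautifulSoup

# Simulated page source: each opened detail table is left in the DOM,
# so after opening three rows there are three "FileNumber" labels.
# The file numbers here are made up for illustration.
html = """
<div><label for="FileNumber">File Number</label><table><tr><td>21RU0000001</td></tr></table></div>
<div><label for="FileNumber">File Number</label><table><tr><td>21RU0000002</td></tr></table></div>
<div><label for="FileNumber">File Number</label><table><tr><td>21RU0000003</td></tr></table></div>
"""
soup = BeautifulSoup(html, "html.parser")

# find() returns the FIRST matching label, so this is always row 1's number
first = soup.find("label", {"for": "FileNumber"}).find_next("td").text.strip()

# find_all()[-1] returns the LAST matching label - the most recently opened table
last = soup.find_all("label", {"for": "FileNumber"})[-1].find_next("td").text.strip()

print(first)  # 21RU0000001
print(last)   # 21RU0000003
```

This is exactly what happens in the original loop: the page source grows by one table per click, but `find` keeps pointing at the first one.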
Answered By - Yaroslavm