Issue
I have tried many of the approaches I found online for locating this button, but every attempt returns an empty list.
I need to locate the button and click it in order to scrape the remaining pages. The whole page is dynamically loaded, and the contents of the second page aren't loaded until you open it, meaning they are not in the DOM until you move to the second page. The paging is dynamic as well: the URL does not change when you click to a different page.
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from bs4 import BeautifulSoup
import time

# Create a new instance of the Chrome driver
driver = webdriver.Chrome()
wait = WebDriverWait(driver, 10)

# Go to the webpage
driver.get('https://is.muni.cz/predmet/?volby=obory:4382@fakulta:1433@obdobi:podzim%202023,jaro%202024@jazyky:eng')

links = []
driver.implicitly_wait(15)

for i in range(1):
    website = driver.page_source
    soup = BeautifulSoup(website, 'html.parser')
    links += ['https://is.muni.cz' + link['href'] for link in soup.find_all('a', class_='course_link')]
    button = driver.find_elements(By.XPATH, '//a[@class="isi-zobacek-vpravo isi-inline"]')
    button.click()
    time.sleep(5)

print(links)
driver.quit()
This code just raises an error: the click doesn't work because button is an empty list, so there is nothing to click.
Solution
First issue - you use find_elements(), which returns a list, and a list does not have a click() method.
Second issue - your selection: there is no <a> with the class you are trying to find.
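To illustrate the difference (a minimal sketch, using the corrected selector given just below):

# find_elements() returns a list - an empty one if nothing matches, no exception
buttons = driver.find_elements(By.XPATH, '//li[@class=" pagination-next"]/a')
if buttons:
    buttons[0].click()   # click an element from the list, not the list itself

# find_element() returns a single WebElement (or raises NoSuchElementException)
driver.find_element(By.XPATH, '//li[@class=" pagination-next"]/a').click()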
So change your selection to:
driver.find_element(By.XPATH, '//li[@class=" pagination-next"]/a')
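Note the leading space in the class value, which matches the markup as it currently stands. If you prefer to be less sensitive to such whitespace (an assumption about how the markup might change, not something required here), a contains() match also works:

# matches the <li> whose class attribute contains "pagination-next",
# regardless of leading whitespace or additional classes
driver.find_element(By.XPATH, '//li[contains(@class, "pagination-next")]/a')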
Just in case, check out explicit waits, the concept of try / except, and consider a while loop to iterate over the pages:
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.common.exceptions import TimeoutException
from bs4 import BeautifulSoup

# Create a new instance of the Chrome driver
driver = webdriver.Chrome()

# Go to the webpage
driver.get('https://is.muni.cz/predmet/?volby=obory:4382@fakulta:1433@obdobi:podzim%202023,jaro%202024@jazyky:eng')

links = []
while True:
    # collect the course links from the page that is currently rendered
    website = driver.page_source
    soup = BeautifulSoup(website, 'html.parser')
    links.extend(['https://is.muni.cz' + link['href'] for link in soup.find_all('a', class_='course_link')])
    try:
        # wait until the "next page" link is clickable, then click it
        WebDriverWait(driver, 20).until(EC.element_to_be_clickable((By.XPATH, '//li[@class=" pagination-next"]/a'))).click()
    except TimeoutException:
        # no more pages - the link never became clickable
        break

print(links)
driver.quit()
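One possible refinement, sketched under the assumption that the course links are replaced in the DOM when the page changes (and that each page shows at least one such link): after clicking, wait for an element from the old page to go stale, so page_source is not read before the next page has rendered. This reuses the imports and setup from the snippet above.

# remember an element from the current page, click "next", then wait
# until that element is detached from the DOM before scraping again
while True:
    first_link = driver.find_element(By.CSS_SELECTOR, 'a.course_link')
    soup = BeautifulSoup(driver.page_source, 'html.parser')
    links.extend(['https://is.muni.cz' + a['href'] for a in soup.find_all('a', class_='course_link')])
    try:
        WebDriverWait(driver, 20).until(
            EC.element_to_be_clickable((By.XPATH, '//li[@class=" pagination-next"]/a'))
        ).click()
        # the old first link goes stale once the next page replaces it
        WebDriverWait(driver, 20).until(EC.staleness_of(first_link))
    except TimeoutException:
        break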
Answered By - HedgeHog