Issue
I am trying to extract the descriptions of all the links inside the elements with class="publication u-padding-xs-ver js-publication"
on this website: https://www.sciencedirect.com/browse/journals-and-books?accessType=openAccess&accessType=containsOpenAccess
I tried both BeautifulSoup and Selenium, but I can't extract anything. You can see the result I got in the image below.
Here is the code I am using:
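from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By
from webdriver_manager.chrome import ChromeDriverManager

options = Options()
options.add_argument("headless")
driver = webdriver.Chrome(service=Service(ChromeDriverManager().install()), options=options)
driver.get("https://www.sciencedirect.com/browse/journals-and-books?accessType=openAccess&accessType=containsOpenAccess")
ul = driver.find_element(By.ID, "publication-list")
print("Links")
allLi = ul.find_elements(By.TAG_NAME, "li")
for count, li in enumerate(allLi, start=1):
    print("Links " + str(count) + " " + li.text)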
Solution
You are missing waits.
You have to wait for elements to become visible before accessing them.
The best approach is to use explicit waits: WebDriverWait combined with expected_conditions.
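For intuition, an explicit wait is essentially a polling loop: it keeps checking a condition until the condition holds or a timeout expires. The sketch below illustrates that idea with the standard library only. `wait_until` is a hypothetical helper written for illustration, not Selenium's actual implementation, and the default values are assumptions chosen to resemble the 20-second timeout used in this answer.

```python
import time


def wait_until(condition, timeout=20.0, poll_interval=0.5):
    """Hypothetical helper: poll condition() until it is truthy or timeout elapses.

    Returns the first truthy value produced by condition().
    Raises TimeoutError if the deadline passes first.
    """
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout} seconds")
        # Sleep briefly so the loop does not spin at full speed.
        time.sleep(poll_interval)
```

In real scripts, prefer WebDriverWait, which plays the same role and works with the ready-made conditions in expected_conditions.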
The following Selenium code works:
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
options = Options()
options.add_argument("start-maximized")
webdriver_service = Service(r'C:\webdrivers\chromedriver.exe')
driver = webdriver.Chrome(options=options, service=webdriver_service)
wait = WebDriverWait(driver, 20)
url = "https://www.sciencedirect.com/browse/journals-and-books?accessType=openAccess&accessType=containsOpenAccess"
driver.get(url)
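# Wait for the list container to become visible
ul = wait.until(EC.visibility_of_element_located((By.ID, "publication-list")))
# Wait for the <li> elements; this locator matches every <li> on the page, not only those inside ul
allLi = wait.until(EC.presence_of_all_elements_located((By.TAG_NAME, "li")))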
print(len(allLi))
The output is:
167
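The count above only confirms that elements were found, and it covers every `li` on the page. To get at the text and links the question asks about, one option is to search inside the located `ul` and read each item. The sketch below is one way to do that. `collect_entries` is a hypothetical helper, and it assumes that each list item's visible text is the description and that its first `a` element carries the link; the real page structure may differ.

```python
def collect_entries(list_element):
    """Hypothetical helper: return (text, href) pairs for each <li> under list_element.

    list_element is expected to be a Selenium WebElement, such as the `ul`
    located in the answer. Assumes the first <a> inside each item holds the link.
    """
    entries = []
    # "css selector" is the string value of Selenium's By.CSS_SELECTOR.
    for li in list_element.find_elements("css selector", "li"):
        anchors = li.find_elements("css selector", "a")
        href = anchors[0].get_attribute("href") if anchors else None
        entries.append((li.text.strip(), href))
    return entries
```

With the `ul` element from the code above, `for text, href in collect_entries(ul): print(text, href)` would print one line per list item. Scoping the search to `ul` keeps unrelated list items elsewhere on the page out of the result.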
Answered By - Prophet