Issue
I want to get some HTML tags' texts from bonbast which some elements are loaded by ajax (for example tag with "ounce_top" id). I have tried selenium and geckodriver but again I can not crawl these tags and also when robotic firefox (geckodriver) opens, these elements are not shown on the web page! I have no idea why it happens. How can I crawl this website?
Code trials:
from selenium import webdriver
from bs4 import BeautifulSoup
url_news = 'https://bonbast.com/'
driver = webdriver.Firefox()
driver.get(url_news)
html = driver.page_source
soup = BeautifulSoup(html)
a = driver.find_element_by_id(id_="ounce_top")
Solution
The desired element is a dynamic element, so ideally to extract the desired text i.e. 1,817.43 you need to induce WebDriverWait for the visibility_of_element_located() and you can use either of the following Locator Strategies:
Using CSS_SELECTOR:
driver.get("https://bonbast.com/") WebDriverWait(driver, 20).until(EC.element_to_be_clickable((By.CSS_SELECTOR, "button.btn.btn-primary.btn-sm.acceptcookies"))).click() print(WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.CSS_SELECTOR, "span#ounce_top"))).text)
Using XPATH:
driver.get("https://bonbast.com/") WebDriverWait(driver, 20).until(EC.element_to_be_clickable((By.CSS_SELECTOR, "button.btn.btn-primary.btn-sm.acceptcookies"))).click() print(WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.XPATH, "//span[@id='ounce_top']"))).text)
Console Output:
1,817.43
Note : You have to add the following imports :
from selenium.webdriver.support.ui import WebDriverWait from selenium.webdriver.common.by import By from selenium.webdriver.support import expected_conditions as EC
You can find a relevant discussion in How to retrieve the text of a WebElement using Selenium - Python
Answered By - undetected Selenium
0 comments:
Post a Comment
Note: Only a member of this blog may post a comment.