Issue
I am trying to use Python to scrape a website that loads its HTML dynamically: embedded JavaScript files fetch the data and render it into the page. If I use BeautifulSoup alone, I cannot retrieve the data I need, because my program scrapes the page before the JavaScript has loaded it. For this reason I am integrating the selenium library into my code, so that my program waits until a certain element is found before it scrapes the website.
I had originally done this:
element = WebDriverWait(driver,100).until(EC.presence_of_element_located((By.ID, "tabla_evolucion")))
But I want to specify a class instead by doing something like:
element = WebDriverWait(driver,100).until(EC.presence_of_element_located((By.class, "ng-binding ng-scope")))
Here is the rest of my code:
driver_path = 'C:/webDrivers/chromedriver.exe'
driver = webdriver.Chrome(executable_path=driver_path)
driver.header_overrides = {'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/53.0.2785.143 Safari/537.36'}
url = "myurlthatIamscraping.com"
response = driver.get(url)
html = driver.page_source
characters = len(html)
element = WebDriverWait(driver,100).until(EC.presence_of_element_located((By.class, "ng-binding ng-scope")))
print(html)
print(characters)
time.sleep(10)
driver.quit()
It is not working for me, and I cannot find the right syntax anywhere.
Solution
The relevant HTML would have helped us to construct a more canonical answer. However, to start with, your first line of code:
element = WebDriverWait(driver,100).until(EC.presence_of_element_located(
(By.ID, "tabla_evolucion")))
is legitimate, whereas the second line of code:
element = WebDriverWait(driver,100).until(EC.presence_of_element_located(
(By.class, "ng-binding ng-scope")))
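will fail. First, By.class is not valid Python, because class is a reserved keyword; the locator constant is By.CLASS_NAME. Second, even with By.CLASS_NAME you can't pass multiple classes, and the attempt can raise an error such as:
Message: invalid selector: Compound class names not permitted
You can find a detailed discussion in "Invalid selector: Compound class names not permitted using find_element_by_class_name with Webdriver and Python".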
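To make the point about compound class names concrete, here is a minimal sketch of turning a space-separated class attribute into a single CSS selector. The helper name classes_to_css is hypothetical (it is not part of Selenium), and the sketch assumes plain class names that need no CSS escaping.

```python
def classes_to_css(class_attr: str) -> str:
    """Turn a space-separated class attribute into a compound CSS selector.

    Hypothetical helper for illustration; assumes the class names contain
    no characters that would need escaping in CSS.
    """
    names = class_attr.split()
    if not names:
        raise ValueError("expected at least one class name")
    # ".a.b" matches elements that carry both class a and class b.
    return "".join(f".{name}" for name in names)


selector = classes_to_css("ng-binding ng-scope")
print(selector)  # .ng-binding.ng-scope
```

The resulting string can then be passed as (By.CSS_SELECTOR, selector) wherever a locator tuple is expected, instead of handing the raw class attribute to By.CLASS_NAME.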
Solution
You need to take care of a couple of things as follows:
- Without any visibility into your use case: inducing WebDriverWait with the expected condition presence_of_element_located() merely confirms the presence of the element within the DOM tree. Presumably you will next either read attributes such as value or innerText, or interact with the element. So instead of presence_of_element_located() you need to use either visibility_of_element_located() or element_to_be_clickable(). You can find a detailed discussion in "WebDriverWait not working as expected".
- For an optimum result you can combine the ID and CLASS attributes and use either of the following locator strategies:
- Using CSS_SELECTOR:
element = WebDriverWait(driver, 20).until(EC.visibility_of_element_located(
(By.CSS_SELECTOR, ".ng-binding.ng-scope#tabla_evolucion")))
- Using XPATH:
element = WebDriverWait(driver, 20).until(EC.visibility_of_element_located(
(By.XPATH, "//*[@class='ng-binding ng-scope' and @id='tabla_evolucion']")))
Answered By - undetected Selenium