Issue
I am taking my first steps with Selenium in Python and want to extract a certain value from a webpage. The value I need to find on the webpage is the ID (Melde-ID), which is 355460. In the HTML I found the two lines containing my info:
<h3 _ngcontent-wwf-c32="" class="title"> Melde-ID: 355460 </h3><span _ngcontent-wwf-c32="">
<div _ngcontent-wwf-c27="" class="label"> Melde-ID </div><div _ngcontent-wwf-c27="" class="value">
I have been searching websites for about two hours for which command to use, but I don't know what to actually search for in the HTML. The website is an HTML page with .js modules. Opening the URL with Selenium works.
(At first I tried using BeautifulSoup but was not able to open the page because of some restriction. I did verify that the robots.txt does not disallow anything, but the error with BeautifulSoup was "Unfortunately, a problem occurred while forwarding your request to the backend server".)
I would be thankful for any advice and hope I explained my issue clearly. The code I tried in a Jupyter Notebook with Selenium installed is as follows:
from selenium import webdriver
import codecs
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.chrome.options import Options
url = "https://...."
driver = webdriver.Chrome('./chromedriver')
driver.implicitly_wait(0.5)
#maximize browser
driver.maximize_window()
#launch URL
driver.get(url)
#print(driver.page_source)
#Try 2
#print([my_elem.get_attribute("href") for my_elem in WebDriverWait(driver, 20).until(EC.visibility_of_all_elements_located((By.XPATH, "//a[normalize-space()='Melde-ID']")))])
#close browser
driver.quit()
Solution
From the information you shared here we can see that the element containing the desired information doesn't have a class attribute with the value Melde-ID. It has the class title and contains the text Melde-ID.
Also, you should use a WebDriverWait expected condition instead of driver.implicitly_wait(0.5), so that the script waits until the element is actually visible before reading it.
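To illustrate why an explicit wait helps on a page that renders its content with JavaScript, here is a minimal, simplified sketch of the polling idea behind such a wait. It is not Selenium's implementation: wait_until, WaitTimeout and fake_lookup are hypothetical names made up for this illustration, and LookupError stands in for "element not found yet".

```python
import time


class WaitTimeout(Exception):
    """Raised when the condition never becomes truthy (hypothetical name)."""


def wait_until(condition, timeout=20.0, poll_interval=0.5):
    """Call condition() repeatedly until it returns a truthy value or time runs out."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            result = condition()
            if result:
                return result
        except LookupError:
            # Stand-in for "element not found yet": keep polling.
            pass
        if time.monotonic() >= deadline:
            raise WaitTimeout(f"condition not met within {timeout} seconds")
        time.sleep(poll_interval)


# Simulate an element that only "appears" on the third check,
# as content rendered by JavaScript often does.
attempts = {"count": 0}


def fake_lookup():
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise LookupError("element not rendered yet")
    return "Melde-ID: 355460"


text = wait_until(fake_lookup, timeout=5.0, poll_interval=0.1)
print(text)
```

The lookup is simply retried until it succeeds or the timeout expires, so the script continues as soon as the element is there instead of sleeping for a fixed time.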
With these changes your code can be something like this:
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

url = "https://...."
driver = webdriver.Chrome('./chromedriver')
# explicit wait: poll for up to 20 seconds
wait = WebDriverWait(driver, 20)
# maximize browser
driver.maximize_window()
# launch URL
driver.get(url)
# wait until the element with class 'title' that contains 'Melde-ID:' is visible, then read its text
content = wait.until(EC.visibility_of_element_located((By.XPATH, "//*[contains(@class,'title') and contains(.,'Melde-ID:')]"))).text
print(content)
# close browser
driver.quit()
I added .text to extract the text from that web element.
Now content should contain the value Melde-ID: 355460.
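Since the question asks for the ID itself (355460) and not the whole heading text, the number still has to be pulled out of that string. A minimal sketch using a regular expression, assuming the text always has the form "Melde-ID: <digits>"; extract_melde_id is a hypothetical helper name, and content is hard-coded here in place of the value read from the page:

```python
import re


def extract_melde_id(text):
    """Return the number following 'Melde-ID:' as an int, or None if it is absent."""
    match = re.search(r"Melde-ID:\s*(\d+)", text)
    return int(match.group(1)) if match else None


# Stand-in for the text read from the web element.
content = "Melde-ID: 355460"
melde_id = extract_melde_id(content)
print(melde_id)
```

Returning None when nothing matches lets the caller decide how to handle a page where the heading is missing or has a different format.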
Answered By - Prophet