Saturday, August 20, 2022

[FIXED] Get value from a website using selenium in python

August 20, 2022 python, selenium, selenium-webdriver, webdriverwait, xpath

Issue

I am taking my first steps with Selenium in Python and want to extract a certain value from a webpage. The value I need is the ID (Melde-ID), which is 355460. In the HTML I found the two lines containing this information:

<h3 _ngcontent-wwf-c32="" class="title"> Melde-ID: 355460 </h3><span _ngcontent-wwf-c32="">
<div _ngcontent-wwf-c27="" class="label"> Melde-ID </div><div _ngcontent-wwf-c27="" class="value">
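One way to see which tag, class and text a locator could match on is to parse the two quoted lines directly. The sketch below is an editorial illustration that uses only Python's standard-library html.parser, not Selenium. ElementLister and list_elements are hypothetical names, and SNIPPET is simply the truncated HTML quoted above.

```python
from html.parser import HTMLParser

# The two lines quoted in the question, truncated exactly as posted
SNIPPET = (
    '<h3 _ngcontent-wwf-c32="" class="title"> Melde-ID: 355460 </h3><span _ngcontent-wwf-c32="">\n'
    '<div _ngcontent-wwf-c27="" class="label"> Melde-ID </div><div _ngcontent-wwf-c27="" class="value">'
)


class ElementLister(HTMLParser):
    """Collect (tag, class, text) for every element that directly contains text."""

    def __init__(self):
        super().__init__()
        self.stack = []  # currently open (tag, class) pairs
        self.found = []  # (tag, class, stripped text) triples

    def handle_starttag(self, tag, attrs):
        self.stack.append((tag, dict(attrs).get("class") or ""))

    def handle_endtag(self, tag):
        # Pop back to the matching start tag; tolerates unclosed elements
        while self.stack:
            if self.stack.pop()[0] == tag:
                break

    def handle_data(self, data):
        text = data.strip()
        if text and self.stack:
            tag, cls = self.stack[-1]
            self.found.append((tag, cls, text))


def list_elements(html):
    parser = ElementLister()
    parser.feed(html)
    parser.close()
    return parser.found


for tag, cls, text in list_elements(SNIPPET):
    print(f"<{tag} class={cls!r}> {text!r}")
# <h3 class='title'> 'Melde-ID: 355460'
# <div class='label'> 'Melde-ID'
```

The output separates the h3 heading (class title, text "Melde-ID: 355460") from the label div (class label, text "Melde-ID"), which are the two candidates a locator has to tell apart.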

I have spent about two hours searching for which command to use, but I don't know what to actually look for in the HTML. The website is an HTML page built with JavaScript modules. Opening the URL with Selenium works.

(At first I tried BeautifulSoup, but I could not open the page because of some restriction. I verified that robots.txt does not disallow anything, but the error I got when using BeautifulSoup was "Unfortunately, a problem occurred while forwarding your request to the backend server".)

I would be thankful for any advice and hope I have explained my issue clearly. The code I tried in a Jupyter Notebook with Selenium installed is as follows:

from selenium import webdriver
import codecs
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.chrome.options import Options

url = "https://...."
driver = webdriver.Chrome('./chromedriver')
driver.implicitly_wait(0.5)
#maximize browser
driver.maximize_window()
#launch URL
driver.get(url)
#print(driver.page_source)
#Try 2
#print([my_elem.get_attribute("href") for my_elem in WebDriverWait(driver, 20).until(EC.visibility_of_all_elements_located((By.XPATH, "//a[normalize-space()='Melde-ID']")))])
#close browser
driver.quit()

Solution

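From the information you shared, the element containing the desired information does not have a class attribute with the value Melde-ID, and it is not an <a> element.
It is an h3 element whose class is title and whose text contains Melde-ID:.
Also, use an explicit wait (WebDriverWait with an expected condition) instead of driver.implicitly_wait(0.5). An explicit wait polls up to a set timeout for the specific condition you need, here the element becoming visible, and the Selenium documentation advises against mixing implicit and explicit waits.
With these changes, your code can look something like this: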

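from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

url = "https://...."
#Selenium 4 takes the chromedriver path via a Service object
#(with Selenium 4.6+ you can omit it and let Selenium Manager locate the driver)
driver = webdriver.Chrome(service=Service('./chromedriver'))

#explicit wait: poll for up to 20 seconds for a specific condition
wait = WebDriverWait(driver, 20)

try:
    #maximize browser
    driver.maximize_window()
    #launch URL
    driver.get(url)

    #wait until the element with class "title" containing "Melde-ID:" is visible, then read its text
    content = wait.until(EC.visibility_of_element_located((By.XPATH, "//*[contains(@class,'title') and contains(.,'Melde-ID:')]"))).text
    print(content)
finally:
    #close browser even if the wait times out
    driver.quit()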

Appending .text extracts the visible text of that web element.
content should now hold Melde-ID: 355460.
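The question asks for the number itself, while .text returns the whole heading. A minimal sketch, assuming the heading keeps the "Melde-ID: <digits>" format shown in the question, strips the label with a regular expression. extract_melde_id is a hypothetical helper name, not part of Selenium, and the content string below is just sample text standing in for the value read from the page.

```python
import re

# Assumed format: "Melde-ID:" followed by optional whitespace and the digits
MELDE_ID_PATTERN = re.compile(r"Melde-ID:\s*(\d+)")


def extract_melde_id(text):
    """Return the numeric Melde-ID as a string, or None if it is absent."""
    match = MELDE_ID_PATTERN.search(text)
    return match.group(1) if match else None


# Sample text in the form returned by .text in the answer above
content = "Melde-ID: 355460"
print(extract_melde_id(content))  # prints 355460
```

Returning None instead of raising makes it easy to detect a page whose heading format has changed; a simpler content.split(":", 1)[1].strip() also works when the format is guaranteed.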

Answered By - Prophet

This answer was collected from Stack Overflow and tested by PythonFixing community admins. It is licensed under CC BY-SA 2.5, CC BY-SA 3.0 and CC BY-SA 4.0.
