Issue
I want to read data from a web page with Python, but I get the following in the console:
None
Process finished with exit code 0
I expected the HTML of the element with this class to be printed instead.
from bs4 import BeautifulSoup
import requests
html_text = requests.get('https://www.immobilienscout24.de/Suche/de/nordrhein-westfalen/koeln/wohnung-kaufen?enteredFrom=one_step_search').text
soup = BeautifulSoup(html_text, 'lxml')
listing = soup.find('li', class_='result-list__listing result-list__listing--xl-new')
print(listing)
Solution
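This website uses a CAPTCHA to check whether requests come from a real browser and filters out those that do not. A plain requests call therefore most likely receives a page without the listing markup, which would explain why soup.find returns None. You can try Selenium, which drives a real browser, to work around this.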
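Before switching tools, it can help to confirm that the response really lacks the listing markup. The following is only a rough sketch: the helper names contains_class and diagnose are made up for illustration, and the two HTML strings are invented samples, not real responses from the site.

```python
import re


def contains_class(html_text: str, class_name: str) -> bool:
    """Return True if any class attribute in html_text lists class_name."""
    # Inspect every class="..." attribute and compare whitespace-separated tokens.
    for match in re.finditer(r'class\s*=\s*["\']([^"\']*)["\']', html_text):
        if class_name in match.group(1).split():
            return True
    return False


def diagnose(status_code: int, html_text: str, class_name: str) -> str:
    """Give a rough hint about why a scrape may have returned nothing."""
    if status_code != 200:
        return f"blocked or failed (HTTP {status_code})"
    if not contains_class(html_text, class_name):
        return "page returned, but listing markup is missing"
    return "listing markup present"


# Offline demonstration with invented HTML (assumption: not the site's real output).
blocked_page = '<html><body><div class="captcha-box">Are you a robot?</div></body></html>'
normal_page = '<ul><li class="result-list__listing result-list__listing--xl-new">Flat</li></ul>'

print(diagnose(200, blocked_page, "result-list__listing"))
print(diagnose(200, normal_page, "result-list__listing"))
```

With a real request you would pass response.status_code and response.text from requests.get(url) into diagnose. If the check reports missing markup, a browser-driven approach such as the Selenium code that follows is one option.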
from bs4 import BeautifulSoup as bs
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait
from webdriver_manager.chrome import ChromeDriverManager
# Create a Chrome WebDriver instance; webdriver_manager downloads a matching chromedriver
driver = webdriver.Chrome(service=Service(ChromeDriverManager().install()))
# Open the website URL in the browser
url = 'https://www.immobilienscout24.de/Suche/de/nordrhein-westfalen/koeln/wohnung-kaufen?enteredFrom=one_step_search'
driver.get(url)
# Wait up to 30 seconds for the first listing item to be present in the DOM
list_element = WebDriverWait(driver, 30).until(EC.presence_of_element_located((By.CSS_SELECTOR, 'li[class="result-list__listing result-list__listing--xl-new"]')))
# Parse the listing's inner HTML with BeautifulSoup and print it
rows = bs(list_element.get_attribute('innerHTML'), 'lxml')
print(rows)
# Close the browser once the data has been read
driver.quit()
The code above is the Selenium counterpart of your code.
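If you need every listing rather than only the first one, one possible approach is to hand the rendered page to BeautifulSoup and select all matching items. This is a sketch under assumptions: extract_listings is a hypothetical helper, the sample HTML is invented for the demonstration, and the built-in html.parser is used here instead of lxml only to keep the example dependency-light.

```python
from bs4 import BeautifulSoup


def extract_listings(page_html: str) -> list[str]:
    """Return the visible text of every <li> carrying the listing class."""
    soup = BeautifulSoup(page_html, "html.parser")
    # The CSS class selector matches any <li> whose class list contains
    # result-list__listing, regardless of additional modifier classes.
    items = soup.select("li.result-list__listing")
    return [item.get_text(" ", strip=True) for item in items]


# Invented sample markup (assumption: the real page structure may differ).
sample_html = """
<ul>
  <li class="result-list__listing result-list__listing--xl-new"><h2>Flat A</h2><span>3 rooms</span></li>
  <li class="result-list__listing"><h2>Flat B</h2><span>2 rooms</span></li>
  <li class="banner">Ad</li>
</ul>
"""

for text in extract_listings(sample_html):
    print(text)
```

In the Selenium script you could call extract_listings(driver.page_source) after the wait has succeeded and before driver.quit().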
Answered By - Zero