Issue
I am trying to scrape a podcast site. I get the first 10 items, but there is no pagination at the bottom of the page, so I cannot loop over pages; scrolling down loads new items instead. What approach should I use to get all of the roughly 400 items? Thank you in advance.
Here is my code:
import time
import pandas as pd
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import Select
from selenium.webdriver.chrome.options import Options
options = Options()  # Initialize an instance of the Options class
# Headless mode; the options.headless attribute is gone from recent Selenium 4 releases
options.add_argument('--headless=new')
options.add_argument('--window-size=1920,1080')
website = "https://www.osimhistoria.com/osimhistoria"
# No chromedriver path is needed: Selenium 4.6+ locates the driver itself.
# The browser is started once; a second webdriver.Chrome() call would open
# a separate, non-headless window.
driver = webdriver.Chrome(options=options)
driver.get(website)
container = driver.find_element(By.CLASS_NAME, 'VM7gjN')
# Note: an XPath starting with '//' searches the whole document, not just the container
products = container.find_elements(By.XPATH, '//*[@id="comp-kgup2wfz"]')
for product in products:
    print(product.text)
Solution
You should scroll down until there are no more items to display. Here is a function you can use to scroll down. Once it returns, go back to the top of the page and scrape the data.
from selenium import webdriver
import time

def scroll_to_bottom(driver, wait=5):
    """
    Scrolls to the bottom of the webpage.

    Parameters:
        driver (webdriver): The Selenium WebDriver instance.
        wait (int): The number of seconds to wait after reaching the bottom.
    """
    last_height = driver.execute_script("return document.body.scrollHeight")
    while True:
        # Scroll down to the bottom.
        driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
        # Wait to load the page.
        time.sleep(wait)
        # Calculate new scroll height and compare with last scroll height.
        new_height = driver.execute_script("return document.body.scrollHeight")
        if new_height == last_height:
            break
        last_height = new_height
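A possible variant of the function above, offered only as a sketch: instead of comparing page heights, it stops once the number of matching items stops growing, and it caps the number of scroll rounds so the loop cannot run forever. The name collect_item_texts and the parameters by, value and max_rounds are hypothetical, and the locator for a single podcast item is not given in the question, so it has to be read from the page's HTML.

```python
import time


def collect_item_texts(driver, by, value, wait=2.0, max_rounds=100):
    """Scroll until the number of matching elements stops growing.

    driver: a Selenium WebDriver (only execute_script and find_elements are used).
    by, value: locator for a single item (hypothetical; depends on the page).
    wait: seconds to pause after each scroll so new items can load.
    max_rounds: safety cap on the number of scrolls.
    Returns the text of every matching element found at the end.
    """
    last_count = -1
    for _ in range(max_rounds):
        items = driver.find_elements(by, value)
        if len(items) == last_count:
            break  # The last scroll loaded nothing new.
        last_count = len(items)
        # Scroll to the bottom to trigger loading of the next batch.
        driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
        time.sleep(wait)
    # Query once more so the returned texts come from fresh elements.
    return [item.text for item in driver.find_elements(by, value)]
```

With the driver from the question, a call might look like collect_item_texts(driver, By.CSS_SELECTOR, "<item selector>"), where the selector is a placeholder. If the site loads slowly, a larger wait reduces the risk of stopping before the next batch has arrived.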
Answered By - Théo Pantecouteau