Issue
I am new to Selenium and need to scrape a website that contains a list of links structured exactly like:
<a class="unique" href="...">
<i class="something"></i>
"Text - "
<span class="something">Text</span>
</a>
<a class="unique" href="...">
<i class="something"></i>
"Text - "
<span class="something">Text</span>
</a>
...
...
I need to click each link in this list inside a loop and scrape data from the result pages. What I have done up to now is:
lists = browser.find_elements_by_xpath("//a[@class='unique']")
for lis in lists:
    print(lis.text)
    lis.click()
    time.sleep(4)
    # Scrape data from this page (works fine).
    browser.back()
    time.sleep(4)
This works fine for the first iteration, but when the second iteration reaches
print(lis.text)
it throws:
StaleElementReferenceException: Message: stale element reference: element is not attached to the page document
I have tried print(lists)
and it still shows the full list of link elements, so the list itself seems fine. The problem occurs when the browser comes back to the previous page. I have tried extending the sleep time and using browser.get(...)
instead of browser.back()
but the error remains. I don't understand why it will not print lis.text
when lists still contains all the elements. Any help would be greatly appreciated.
Solution
The problem is not the text itself: once the browser navigates away and back, every element reference stored in lists is detached from the new page document, which is why the second iteration raises StaleElementReferenceException.
Clicking each link, scraping the data, and navigating back is also fragile; instead, you can store all the links in a list first, then navigate to each one with the driver.get('some link')
method and scrape the data there. That way you avoid the stale references entirely. Try the modified code below:
import time

# Locate the anchor nodes first and load all the elements into a list
lists = browser.find_elements_by_xpath("//a[@class='unique']")
# Empty list for storing the links
links = []
for lis in lists:
    print(lis.get_attribute('href'))
    # Fetch and store the links
    links.append(lis.get_attribute('href'))

# Loop through all the links and launch them one by one
for link in links:
    browser.get(link)
    # Scrape here
    time.sleep(3)
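If you do want to keep the click-and-back flow instead, another option (a sketch, not from the original answer) is to re-locate the elements after every browser.back(), since navigation is what invalidates the old references. The helper name click_and_scrape below is hypothetical, and browser is assumed to be an already-started Selenium WebDriver:

```python
import time

def click_and_scrape(browser, xpath, delay=0):
    """Click each matching link in turn, re-locating the element list after
    every browser.back() so that no stale reference is ever reused.
    Returns the link texts that were seen."""
    texts = []
    # Count the matches once; the elements themselves are looked up fresh below
    count = len(browser.find_elements_by_xpath(xpath))
    for i in range(count):
        # Fresh lookup each iteration: the previous references died on navigation
        link = browser.find_elements_by_xpath(xpath)[i]
        texts.append(link.text)
        link.click()
        time.sleep(delay)
        # ... scrape the result page here ...
        browser.back()
        time.sleep(delay)
    return texts
```

This trades a few extra find_elements calls for never holding a reference across a page load.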
Or, if you want to keep your original logic, you can use a fluent wait to handle exceptions such as StaleElementReferenceException, like below:
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.common.exceptions import StaleElementReferenceException

wait = WebDriverWait(browser, 10, poll_frequency=1, ignored_exceptions=[StaleElementReferenceException])
element = wait.until(EC.element_to_be_clickable((By.XPATH, "xPath that you want to click")))
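To fold that wait into the click-and-back loop, one sketch is to address each link by its position in the match list, so it can be re-located and waited on after every browser.back(). The helper names indexed_xpath and click_links_with_wait are my own, and browser is assumed to be a live WebDriver:

```python
def indexed_xpath(xpath, i):
    # XPath positions are 1-indexed, so the i-th match (0-based) is "(xpath)[i+1]"
    return "(%s)[%d]" % (xpath, i + 1)

def click_links_with_wait(browser, xpath, timeout=10):
    # Selenium imports are local so the XPath helper above stands alone
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support.ui import WebDriverWait
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.common.exceptions import StaleElementReferenceException

    wait = WebDriverWait(browser, timeout, poll_frequency=1,
                         ignored_exceptions=[StaleElementReferenceException])
    count = len(browser.find_elements_by_xpath(xpath))
    for i in range(count):
        # Wait for the i-th link to be clickable again after browser.back();
        # stale references are retried by the wait instead of being raised
        link = wait.until(EC.element_to_be_clickable(
            (By.XPATH, indexed_xpath(xpath, i))))
        print(link.text)
        link.click()
        # ... scrape the result page here ...
        browser.back()
```

Because the wait re-evaluates the locator on every poll, a reference that went stale during navigation is simply looked up again rather than crashing the loop.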
I hope it helps...
Answered By - Ali