Sunday, November 13, 2022

[FIXED] python selenium how to handle webpage randomly generates popups

November 13, 2022 python, selenium

Issue

I am learning how to scrape web pages, and I ran into this issue with this website: https://www.centris.ca/en/properties~for-sale~brossard?view=Thumbnail

During the script execution, it would randomly give me a popup to subscribe: https://imgur.com/a/tzCVvg4

I already got the code to handle it, but it pops up at completely random intervals.

For example, my current search criteria mean I have to scrape 41 pages. Sometimes the pop-up shows up on page 2, right before I click the next page; sometimes it shows up on page 39, right as I am grabbing the price of a particular listing.

I can't just let the page sit there and wait for it, because I tried that: sometimes it doesn't show up for a solid 10 minutes, and sometimes it shows up at the 5-minute or 2-minute mark (counted from the start of the script).

If I visit the page manually, I get the pop-up far less often. I can click through all the listings without seeing it even once.

I am at a loss as to how to handle this.

import time

from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By
from webdriver_manager.chrome import ChromeDriverManager

url = 'https://www.centris.ca/en/properties~for-sale~brossard?view=Thumbnail'


def scrap_pages(driver):
    listings = driver.find_elements(By.CLASS_NAME, 'description')

    # Drop a trailing empty card, if any (split on a newline '\n', not '/n')
    if listings and listings[-1].text.split('\n')[0] == '':
        del listings[-1]

    for listing in listings:
        # <meta> elements have no visible text; the price is in their content attribute
        price = listing.find_element(By.XPATH, ".//div[@class='price']/meta[@itemprop='price']").get_attribute('content')
        mls = listing.find_element(By.XPATH, ".//div[@id='MlsNumberNoStealth']/p").text
        prop_type = listing.find_element(By.XPATH, ".//div[@class='location-container']/span[@itemprop='category']").text
        addr = listing.find_element(By.XPATH, ".//div[@class='location-container']/span[@class='address']").text
        addr_lines = addr.split('\n')
        city = addr_lines[1]
        sector = addr_lines[2]
        if prop_type in ('Land for sale', 'Lot for sale'):
            bedrooms = 'NA'
            bathrooms = 'NA'
        else:
            bedrooms = listing.find_element(By.XPATH, ".//div[@class='cac']").text
            bathrooms = listing.find_element(By.XPATH, ".//div[@class='sdb']").text

        listing_item = {
            'mls': mls,
            'price': price,
            'Address': addr,
            'property Type': prop_type,
            'city': city,
            'bedrooms': bedrooms,
            'bathrooms': bathrooms,
            'sector': sector
        }

        centris_list.append(listing_item)

if __name__ == '__main__':
    chrome_options = Options()
    chrome_options.add_experimental_option("detach", True)
    #chrome_options.add_argument("headless")

    # Selenium 4 takes the driver executable through a Service object
    driver = webdriver.Chrome(service=Service(ChromeDriverManager().install()), options=chrome_options)

    centris_list = []

    driver.get(url)

    # Total number of pages: the part of the pager text after the '/'
    total_pages = int(driver.find_element(By.CLASS_NAME, 'pager-current').text.split('/')[1].strip())

    for page in range(1, total_pages + 1):

        scrap_pages(driver)
        if page == total_pages:
            break  # last page: nothing left to click (the old range() never scraped it)
        driver.find_element(By.CSS_SELECTOR, 'li.next> a').click()
        time.sleep(3)
        # find_elements() returns an empty list when the pop-up is absent
        close_buttons = driver.find_elements(By.XPATH, ".//div[@class='DialogInsightLightBoxCloseButton']")
        if close_buttons:
            close_buttons[0].click()
            time.sleep(0.6)
            print('found subscription box')

Solution

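There are some ways to disable pop-ups in Chrome, but they rarely work. You can search for how to disable pop-ups with Chrome options, but I doubt anything will help here: Chrome's pop-up settings block new browser windows, while this subscription box is an element inside the page itself.
I can only suggest a more elegant solution: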

scrap_pages(driver)
driver.find_element(By.CSS_SELECTOR,'li.next> a').click()
time.sleep(3)
try:
    # Close the subscription box if it appeared on this page
    driver.find_element(By.CSS_SELECTOR, 'div[class="DialogInsightLightBoxCloseButton"]').click()
    print('pop-up closed')
except (NoSuchElementException, ElementNotInteractableException):
    # No pop-up this time (or it is not clickable): carry on
    pass

For this to work, you need to import the exception classes:

from selenium.common.exceptions import NoSuchElementException, ElementNotInteractableException
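If you would rather not rely on exceptions for the common "no pop-up" case, the check can also be wrapped in a small helper, sketched below. It relies on find_elements() returning an empty list when nothing matches, and it checks is_displayed() before clicking. The name dismiss_popup_if_present is hypothetical, the selector is the one used above, and "css selector" is the string value of By.CSS_SELECTOR, used so the helper needs no Selenium import. Because it is cheap when nothing is found, it could be called at several points (for example at the top of the loop in scrap_pages), not only after clicking the next link.

```python
# Sketch only: dismiss_popup_if_present is a hypothetical helper name.
# The selector is the one from this thread; it may change if the site changes.
POPUP_CLOSE_SELECTOR = 'div[class="DialogInsightLightBoxCloseButton"]'


def dismiss_popup_if_present(driver, selector=POPUP_CLOSE_SELECTOR):
    """Click the subscription pop-up's close button if it is currently shown.

    Returns True when a close button was clicked, False otherwise.
    driver.find_elements() returns an empty list instead of raising
    NoSuchElementException, so the "no pop-up" case needs no try/except.
    """
    # "css selector" is the value of selenium's By.CSS_SELECTOR.
    for button in driver.find_elements("css selector", selector):
        # Skip matches that exist in the DOM but are hidden.
        if button.is_displayed():
            button.click()
            return True
    return False
```

The pop-up can still appear between this check and the next action, so this narrows the window rather than closing it completely.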

Another option is to surround the whole "scrape page, click next" step with a try block. In that case, you need to catch a different error: ElementClickInterceptedException, which Selenium raises when the pop-up covers the link you are trying to click. The code will look like this:

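from selenium.common.exceptions import ElementClickInterceptedException, NoSuchElementException

try:
    scrap_pages(driver)
    driver.find_element(By.CSS_SELECTOR, 'li.next> a').click()
except ElementClickInterceptedException as initial_error:
    # The pop-up covered the "next" link: close it, then repeat both steps
    try:
        driver.find_element(By.CSS_SELECTOR, 'div[class="DialogInsightLightBoxCloseButton"]').click()
        print('pop-up closed')
        scrap_pages(driver)
        driver.find_element(By.CSS_SELECTOR, 'li.next> a').click()
    except NoSuchElementException:
        # Something other than the pop-up blocked the click
        raise initial_error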

As you can see, in that case you need to write the same lines

scrap_pages(driver)
driver.find_element(By.CSS_SELECTOR,'li.next> a').click()

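twice (once in try and once in except). This also scrapes the current page a second time and adds duplicate entries to centris_list, because scrap_pages() had already finished before the click failed. Moreover, the pop-up can appear again right after that second click, which will again get in the way of scraping. So the first option seems better.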
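The duplication can be avoided by guarding each click rather than the whole "scrape page, click next" step, since clicks are what raise ElementClickInterceptedException. Below is a minimal, library-agnostic sketch of that idea: run_with_popup_retry is a hypothetical name, and the exception types to retry on are passed in, so the helper itself does not import Selenium. Each page is then scraped exactly once, and every later click gets the same protection.

```python
def run_with_popup_retry(action, dismiss_popup, retry_on, attempts=2):
    """Call action(); if it raises one of the retry_on exceptions, call
    dismiss_popup() and try again.

    action() is tried at most `attempts` times in total; the exception from
    the final attempt is re-raised. Returns whatever action() returns.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return action()
        except retry_on:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the original error
            dismiss_popup()  # e.g. close the subscription box, then retry


# Possible use in the page loop (assumes the imports and names from the
# question, plus ElementClickInterceptedException):
#
# scrap_pages(driver)
# run_with_popup_retry(
#     lambda: driver.find_element(By.CSS_SELECTOR, 'li.next> a').click(),
#     lambda: driver.find_element(
#         By.CSS_SELECTOR, 'div[class="DialogInsightLightBoxCloseButton"]').click(),
#     retry_on=(ElementClickInterceptedException,),
# )
```

If dismiss_popup itself fails (for example because something other than the pop-up blocked the click), its exception propagates, which plays the same role as re-raising initial_error in the answer's second option.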

Answered By - Eugeny Okulik

This Answer collected from stackoverflow and tested by PythonFixing community admins, is licensed under cc by-sa 2.5 , cc by-sa 3.0 and cc by-sa 4.0
