Issue
I am writing code to scrape the Amazon website for product prices. I am trying to locate the last page of the website that still has products available.
This is what the navigation panel looks like: Navigation Panel
The last page is 8, which is disabled. My program goes through each page and extracts the product information, and it should stop after extracting from the 8th page. But when I try to get the number 8 as text, I get the ... text instead.
I tried using bs4 to get the text, but the HTML code and the other tags are the same for 8 and ...:
<li class="a-disabled" aria-disabled="true">...</li>
<li class="a-disabled" aria-disabled="true">8</li>
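Since the two disabled `<li>` elements only differ in their text, one way to tell them apart in bs4 is to filter on the text itself: the real page numbers are digits, while the ellipsis item is not. A minimal sketch, using a hard-coded snippet as a stand-in for the live pagination markup:

```python
from bs4 import BeautifulSoup

html = """
<ul>
  <li class="a-disabled" aria-disabled="true">...</li>
  <li class="a-normal"><a href="#">7</a></li>
  <li class="a-disabled" aria-disabled="true">8</li>
</ul>
"""

soup = BeautifulSoup(html, "html.parser")
# Among the disabled <li> items, keep only those whose text is numeric;
# the "..." ellipsis item is dropped by str.isdigit().
numbers = [li.get_text(strip=True)
           for li in soup.select("li.a-disabled")
           if li.get_text(strip=True).isdigit()]
max_pages = int(numbers[-1]) if numbers else None
print(max_pages)  # 8
```

On Amazon's real markup the last numeric disabled item is the highest page number, so taking the final match gives the page count.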
So I tried using Selenium's find_element_by_xpath
and converting the result to text to find the maximum number of pages. But I am getting a NoSuchElementException
saying that it is not able to locate the element at that XPath.
This is a part of my code to navigate to the next page and extract the product information:
def navigate_to_next_page():
    try:
        max_pages = driver.find_element_by_xpath("/html/body/div[1]/div[2]/div[1]/div/div[1]/div/span[3]/div[2]/div[20]/span/div/div/ul/li[6]").text
        print(max_pages)
    except NoSuchElementException:
        print("Max Page Number Not Found")
    for i in range(2, 21):
        next_page_url = get_search_product_url(driver, "samsung phones") + "&page=" + str(i)
        driver.get(next_page_url)
        results = extract_webpage_information()
        records = record_product_information(results)
    return records
Please ignore the for i in range(2, 21):
line; it is there for testing purposes.
How can I get the maximum number of pages on a website if neither the bs4 nor the Selenium approach is working?
Solution
I extracted the last page number from the website and appended it to the URL to visit every page one by one. Please find the working code below.
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.wait import WebDriverWait
driver = webdriver.Chrome()
wait = WebDriverWait(driver, 10)
driver.get('https://www.amazon.in')
SearchTextbox = wait.until(EC.visibility_of_element_located((By.XPATH, "//input[@id=\"twotabsearchtextbox\"]")))
SearchTextbox.send_keys("pencil")
SearchTextbox.send_keys(Keys.ENTER)
GetLastPageNumber = wait.until(EC.presence_of_element_located(
    (By.XPATH, "//li/a[text()=\"Next\"]/parent::li/preceding-sibling::li[contains(@aria-disabled,\"true\")][1]")))
print("Last Page Number is : " + GetLastPageNumber.text)
# Pages are numbered from 1, so start the range at 1; range(n + 1)
# would also request the non-existent page 0.
for i in range(1, int(GetLastPageNumber.text) + 1):
    myurl = "https://www.amazon.in/s?k=pencil&page={0}&qid=1618567039&ref=sr_pg_2".format(str(i))
    driver.get(myurl)
print("I'm done")
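To see why this XPath lands on the 8 rather than the ..., you can run the same expression offline with lxml against a simplified stand-in for Amazon's pagination markup:

```python
from lxml import html as lhtml

pagination = """
<ul>
  <li><a>Previous</a></li>
  <li class="a-disabled" aria-disabled="true">...</li>
  <li><a>7</a></li>
  <li class="a-disabled" aria-disabled="true">8</li>
  <li><a>Next</a></li>
</ul>
"""

tree = lhtml.fromstring(pagination)
# Start from the "Next" link, step up to its <li>, then take the nearest
# preceding sibling that is disabled -- the last page number, not the "...".
nodes = tree.xpath('//li/a[text()="Next"]/parent::li'
                   '/preceding-sibling::li[contains(@aria-disabled,"true")][1]')
print(nodes[0].text_content())  # 8
```

The key detail is that on the preceding-sibling axis the position predicate [1] means "nearest preceding", so the disabled item closest to the Next button wins even though the ellipsis item also matches the aria-disabled filter.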
Note - Change the URL as per your country.
Mark as an answer if it solves your problem.
Answered By - Swaroop Humane