Issue
I am writing code to scrape the Amazon website for product prices. I am trying to locate the last page of the website that still has products available.
This is what the navigation panel looks like: Navigation Panel
The last page is 8, which is disabled. My program goes through each page and extracts the product information, and it should stop after extracting from the 8th page. But when I try to get the number 8 as text, I get the ... text instead.
I tried using bs4 to get the text, but the HTML code and the other tags are the same for 8 and ...:
<li class="a-disabled" aria-disabled="true">...</li>
<li class="a-disabled" aria-disabled="true">8</li>
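Since the two disabled `<li>` elements only differ in their text, one way to tell them apart in bs4 is to filter on the text itself: the real page numbers are digits, while the ellipsis item is not. A minimal sketch, using a hard-coded snippet as a stand-in for the live pagination markup:

```python
from bs4 import BeautifulSoup

html = """
<ul>
  <li class="a-disabled" aria-disabled="true">...</li>
  <li class="a-normal"><a href="#">7</a></li>
  <li class="a-disabled" aria-disabled="true">8</li>
</ul>
"""

soup = BeautifulSoup(html, "html.parser")
# Among the disabled <li> items, keep only those whose text is numeric;
# the "..." ellipsis item is dropped by str.isdigit().
numbers = [li.get_text(strip=True)
           for li in soup.select("li.a-disabled")
           if li.get_text(strip=True).isdigit()]
max_pages = int(numbers[-1]) if numbers else None
print(max_pages)  # 8
```

On Amazon's real markup the last numeric disabled item is the highest page number, so taking the final match gives the page count.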
So I tried using Selenium's find_element_by_xpath
and converting the result to text to find the maximum number of pages. But I am getting a NoSuchElementException
saying that it is not able to locate the element at that XPath.
This is a part of my code to navigate to the next page and extract the product information:
def navigate_to_next_page():
    try:
        max_pages = driver.find_element_by_xpath("/html/body/div[1]/div[2]/div[1]/div/div[1]/div/span[3]/div[2]/div[20]/span/div/div/ul/li[6]").text
        print(max_pages)
    except NoSuchElementException:
        print("Max Page Number Not Found")
    for i in range(2, 21):
        next_page_url = get_search_product_url(driver, "samsung phones") + "&page=" + str(i)
        driver.get(next_page_url)
        results = extract_webpage_information()
        records = record_product_information(results)
    return records
Please ignore the for i in range(2, 21):
line; it is there for testing purposes.
How can I get the maximum number of pages on a website if neither the bs4 nor the Selenium approach is working?
Solution
I extracted the last page number from the website and appended it to the URL to visit every page one by one. Please find the working code below.
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.wait import WebDriverWait
driver = webdriver.Chrome()
wait = WebDriverWait(driver, 10)
driver.get('https://www.amazon.in')
SearchTextbox = wait.until(EC.visibility_of_element_located((By.XPATH, "//input[@id=\"twotabsearchtextbox\"]")))
SearchTextbox.send_keys("pencil")
SearchTextbox.send_keys(Keys.ENTER)
GetLastPageNumber = wait.until(EC.presence_of_element_located(
    (By.XPATH, "//li/a[text()=\"Next\"]/parent::li/preceding-sibling::li[contains(@aria-disabled,\"true\")][1]")))
print("Last Page Number is : " + GetLastPageNumber.text)
# Pages are numbered from 1, so start the range at 1; range(n + 1)
# would also request the non-existent page 0.
for i in range(1, int(GetLastPageNumber.text) + 1):
    myurl = "https://www.amazon.in/s?k=pencil&page={0}&qid=1618567039&ref=sr_pg_2".format(str(i))
    driver.get(myurl)
print("I'm done")
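To see why this XPath lands on the 8 rather than the ..., you can run the same expression offline with lxml against a simplified stand-in for Amazon's pagination markup:

```python
from lxml import html as lhtml

pagination = """
<ul>
  <li><a>Previous</a></li>
  <li class="a-disabled" aria-disabled="true">...</li>
  <li><a>7</a></li>
  <li class="a-disabled" aria-disabled="true">8</li>
  <li><a>Next</a></li>
</ul>
"""

tree = lhtml.fromstring(pagination)
# Start from the "Next" link, step up to its <li>, then take the nearest
# preceding sibling that is disabled -- the last page number, not the "...".
nodes = tree.xpath('//li/a[text()="Next"]/parent::li'
                   '/preceding-sibling::li[contains(@aria-disabled,"true")][1]')
print(nodes[0].text_content())  # 8
```

The key detail is that on the preceding-sibling axis the position predicate [1] means "nearest preceding", so the disabled item closest to the Next button wins even though the ellipsis item also matches the aria-disabled filter.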
Note - Change the URL as per your country.
Mark as an answer if it solves your problem.
Answered By - Swaroop Humane