Monday, January 15, 2024

[FIXED] Scraper doesn't scrape prices from Amazon in the same order

Tags: beautifulsoup, python, web-scraping

Issue

I am scraping the prices of the laptops on the first page of Amazon's search results. The script gets all of the prices, but they are not in the same order as they appear on the web page. What could be the problem?

Here is my code where you can also find the link to the page:

import requests
from bs4 import BeautifulSoup

URL = "https://www.amazon.com/s?k=laptop&crid=288NMI7Z5E2WR&sprefix=laptop%2Caps%2C572&ref=nb_sb_noss_1"

header = {
  "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/118.0.0.0 Safari/537.36",
  "Accept-Language": "hr-HR,hr;q=0.9,en-US;q=0.8,en;q=0.7"
}

response = requests.get(URL, headers=header)

web_page = response.text

soup = BeautifulSoup(web_page, 'lxml')

boxs = soup.find_all("div", class_="puisg-col puisg-col-4-of-12 puisg-col-8-of-16 puisg-col-12-of-20 puisg-col-12-of-24 puis-list-col-right")

for box in boxs:
    name = box.find("span", class_="a-size-medium a-color-base a-text-normal").getText()
    price = box.find("span", class_="a-offscreen").getText()
    print(price)

And here is a snapshot of the prices that I get: (image not preserved)

Solution

Websites like Amazon use many algorithms to rank products, based on signals such as previous behavior on the site, location, and search history (dynamic content personalization). When you scrape such a site with a script, each request arrives without the cookies your browser carries, so you are treated as a different "user" every time you send a request.

You can, however, use a session with cookies to simulate more persistent user behavior. Keep in mind that even with a session, if the shop's algorithm decides to shuffle the product listings, the order of the results you scrape may still not match what you see in your browser. A session can make the order more consistent, but most likely not completely consistent.
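To make the difference concrete, here is a minimal offline sketch of what a session changes at the HTTP level. It sends nothing over the network and only inspects the headers that requests would send. The cookie name and value (`example-id`, `abc123`) are made up for illustration and stand in for whatever the server would actually set; the shortened URL is also just an example.

```python
import requests

URL = "https://www.amazon.com/s?k=laptop"

# A standalone request has no cookie jar behind it, so no Cookie header is added.
bare = requests.Request("GET", URL).prepare()

# A session owns a cookie jar and attaches matching cookies to every request.
session = requests.Session()
# Hypothetical cookie, set by hand to stand in for one received from the server.
session.cookies.set("example-id", "abc123", domain="www.amazon.com", path="/")
with_session = session.prepare_request(requests.Request("GET", URL))

print("bare request Cookie header:   ", bare.headers.get("Cookie"))
print("session request Cookie header:", with_session.headers.get("Cookie"))
```

In real use the jar is filled automatically from the `Set-Cookie` headers of earlier responses made through the same session, which is why reusing one `Session` object across requests looks more like a returning visitor.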

You can try it out:

import requests
from bs4 import BeautifulSoup

# Create a session object
s = requests.Session()

# Set headers
s.headers.update({
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/118.0.0.0 Safari/537.36",
    "Accept-Language": "en-US,en;q=0.9"
})

URL = "https://www.amazon.com/s?k=laptop&crid=288NMI7Z5E2WR&sprefix=laptop%2Caps%2C572&ref=nb_sb_noss_1"

# Use the session to get the page content
response = s.get(URL)

web_page = response.text

soup = BeautifulSoup(web_page, 'lxml')

# Define the correct class or tag structure to find the elements you want
# This is just an example and might not match Amazon's current page structure
boxes = soup.find_all("div", {"class": "YOUR-CSS-CLASS-FOR-ITEM-CONTAINER"})

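for box in boxes:
    name = box.find("span", {"class": "a-size-medium a-color-base a-text-normal"})
    price = box.find("span", {"class": "a-offscreen"})
    # find() returns None when a container has no matching element; skip those
    if name is None or price is None:
        continue
    print(name.getText(), price.getText())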
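To judge how much a session actually helps, one option is to collect the prices from two runs (or from one run and a list copied by hand from the browser) and measure how closely the two orders agree. The sketch below uses only the standard library; the helper names `order_similarity` and `first_mismatch` and the sample price lists are hypothetical and not part of the original answer.

```python
from difflib import SequenceMatcher


def order_similarity(first_run, second_run):
    """Return a score between 0 and 1 for how closely two lists agree in order."""
    return SequenceMatcher(None, first_run, second_run, autojunk=False).ratio()


def first_mismatch(first_run, second_run):
    """Return the index of the first position where the lists differ, or None."""
    for index, (a, b) in enumerate(zip(first_run, second_run)):
        if a != b:
            return index
    if len(first_run) != len(second_run):
        # One list is a prefix of the other; they diverge where the shorter ends.
        return min(len(first_run), len(second_run))
    return None


# Hypothetical price lists standing in for two runs of the scraper
run_a = ["$499.99", "$329.00", "$749.00", "$215.50"]
run_b = ["$499.99", "$749.00", "$329.00", "$215.50"]

print("similarity:", order_similarity(run_a, run_b))
print("first differing position:", first_mismatch(run_a, run_b))
```

A score of 1.0 means the two lists are identical; lower values indicate that listings were reordered, added, or dropped between the two runs.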

Answered By - Miles

This answer was collected from Stack Overflow and tested by PythonFixing community admins. It is licensed under CC BY-SA 2.5, CC BY-SA 3.0 and CC BY-SA 4.0.
