Issue
This script worked fine the first two or three times I ran it, but now it constantly gets 503 responses.
I checked my internet connection several times and there is no problem with it.
Here is my code:
from bs4 import BeautifulSoup
import requests, sys, os, json

def get_amazon_search_page(search):
    search = search.strip().replace(" ", "+")
    for i in range(3):  # tries to connect and get request the amazon 3 times
        try:
            print("Searching...")
            response = requests.get("https://www.amazon.in/s?k={}&ref=nb_sb_noss".format(search))  # search string will be manipulated by replacing all spaces with "+" in order to search from the website itself
            print(response.status_code)
            if response.status_code == 200:
                return response.content, search
        except Exception:
            pass
    print("Is the search valid for the site: https://www.amazon.in/s?k={}&ref=nb_sb_noss".format(search))
    sys.exit(1)

def get_items_from_page(page_content):
    print(page_content)
    soup = BeautifulSoup(page_content, "html.parser")  # soup for extracting information
    items = soup.find_all("span", class_="a-size-medium a-color-base a-text-normal")
    prices = soup.find_all("span", class_="a-price-whole")
    item_list = []
    total_price_of_all = 0
    for item, price in zip(items, prices):
        dict = {}
        dict["Name"] = item.text
        dict["Price"] = int(price.text)
        total_price_of_all += int(price.text.replace(",", ""))
        item_list.append(dict)
    average_price = total_price_of_all / len(item_list)
    file = open("items.json", "w")
    json.dump(item_list, file, indent=4)
    print("Your search results are available in the items.json file")
    print("Average prices for the search: {}".format(average_price))
    file.close()

def main():
    os.system("clear")
    print("Note: Sometimes amazon site misbehaves by sending 503 responses, this can be due to heavy traffic on that site, please cooperate\n\n")
    search = input("Enter product name: ").strip()
    page_content = get_amazon_search_page(search)
    get_items_from_page(page_content)

if __name__ == "__main__":
    while True:
        main()
Please help!
Solution
The server is blocking you from scraping it.
If you check Amazon's robots.txt, you can see that the kind of URL you are requesting is disallowed:
Disallow: */s?k=*&rh=n*p_*p_*p_
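You can inspect this yourself with a couple of lines of Python. This is just a minimal sketch, assuming the file is publicly reachable at the usual /robots.txt path:

import requests

# Fetch Amazon India's robots.txt and print the rules that mention "/s?" (search URLs).
robots = requests.get("https://www.amazon.in/robots.txt", timeout=10).text
for line in robots.splitlines():
    if line.startswith("Disallow") and "/s?" in line:
        print(line)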
However, a simple way to get around this block is to change your User-Agent. By default, requests sends something like "python-requests/2.22.0"; switching it to something more browser-like will work, at least temporarily.
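For example, something along these lines (a sketch only; the User-Agent string below is just an illustrative browser-like value, not a required one):

import requests

# Send a browser-like User-Agent header instead of the default "python-requests/x.y.z".
# The exact string is only an example; any realistic browser UA should behave similarly.
headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
                  "(KHTML, like Gecko) Chrome/90.0.4430.93 Safari/537.36"
}
response = requests.get("https://www.amazon.in/s?k=laptop&ref=nb_sb_noss", headers=headers)
print(response.status_code)

In your script, that would mean passing headers=headers to the requests.get call inside get_amazon_search_page.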
Answered By - bogster