Sunday, October 17, 2021

[FIXED] Scraping a website for image paths (not downloading the images, just getting clickable links), but the image URL is not in the scraped text

October 17, 2021 | beautifulsoup, python, python-requests, web-scraping

Issue

I'm trying to scrape this website for image URLs, but the scraped output no longer contains the image URL that Chrome's Inspect Element shows. This is what the scraped HTML looks like:

    <div class="productImage" data-qa-id="productImagePLP_Running Low Top Sneaker Black/Rose Gold ">
      <div class="sc-1xjgu8-0 jRkpWF">
        <div class="sc-1xjgu8-1 gCPKVp">
          <svg fill="none" height="22" viewbox="0 0 22 22" width="22" xmlns="http://www.w3.org/2000/svg">
            <path d="M14.2113 0.741972C13.3401 0.393483 12.3994 0.219238 11.4583 0.219238C10.901 0.219238 10.3433 0.289037 9.78569 0.393483L7.53809 4.26151C8.46153 3.75635 9.48942 3.51231 10.5525 3.52989C11.197 3.52989 11.8244 3.617 12.4343 3.79125L14.2113 0.741972Z" fill="#B2B8CA"></path>
            <path d="M0.708008 11.1439C0.708008 16.7197 5.44726 21.0582 10.9706 21.0582C16.7556 21.0582 21.425 16.3885 21.425 10.7085C21.425 7.38056 19.8222 4.4533 17.435 2.50171L15.6925 5.51608C17.2258 6.82292 18.1146 8.73961 18.1146 10.7607C18.1146 14.6288 14.9084 17.7998 10.9706 17.7998C7.03278 17.7998 3.84441 14.6115 3.84441 10.6736C3.84441 10.6736 3.84441 10.6736 3.84441 10.6561C3.84441 10.1858 3.87906 9.71528 3.96618 9.26209L0.708008 11.1439Z" fill="#B2B8CA"></path>
          </svg>

What Chrome's Inspect Element shows for the same product:

    <img width="100%" height="100%" src="https://z.nooncdn.com/products/tr:n-t_240/v1603717104/N41330370V_2.jpg" alt="Running Low Top Sneaker Black/Rose Gold ">

I'm trying to scrape the src attribute. Is there a way around this? I've tried to build the URL myself from other attributes, but that did not work. I'll add the relevant code and the website link below.

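Code:

    import requests
    from bs4 import BeautifulSoup
    URL = "https://www.noon.com/egypt-en/search?q=shoes"
    header = {"user-agent": "Mozilla/5.0"}  # placeholder: the asker's actual header is not shown
    page = requests.get(URL, headers=header)
    soup = BeautifulSoup(page.content, 'html.parser')
    divs = soup.find_all('div', class_="productContainer")
    print(divs[0])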
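One way to confirm what the asker is seeing is to check whether any <img> tag with a src attribute exists in the HTML that requests actually receives. The sketch below uses only the standard library's html.parser; the two HTML strings are shortened, hypothetical stand-ins for the scraped output and the Inspect Element snippet above, not the real page.

```python
from html.parser import HTMLParser


class ImgSrcCollector(HTMLParser):
    """Collects the src attribute of every <img> tag fed to it."""

    def __init__(self):
        super().__init__()
        self.srcs = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; value may be None
        if tag == "img":
            src = dict(attrs).get("src")
            if src:
                self.srcs.append(src)


def img_srcs(html):
    """Return every <img src> found in an HTML string."""
    collector = ImgSrcCollector()
    collector.feed(html)
    collector.close()
    return collector.srcs


# Hypothetical, trimmed stand-ins for the two snippets in the question
served = '<div class="productImage"><svg width="22"><path d="M0 0Z"></path></svg></div>'
rendered = ('<div class="productImage"><img width="100%" '
            'src="https://z.nooncdn.com/products/tr:n-t_240/v1603717104/N41330370V_2.jpg"></div>')

print(img_srcs(served))    # [] : only a placeholder SVG, no <img> yet
print(img_srcs(rendered))  # the nooncdn URL from Inspect Element
```

If `img_srcs(page.text)` on the real response comes back empty, the images are being inserted later by JavaScript, which matches the explanation in the solution.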

website link: https://www.noon.com/egypt-en/search?q=shoes

Solution

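The page is rendered dynamically with JavaScript, and requests does not execute JavaScript, so the <img> tags are not in the HTML it receives. However, the product data is embedded as JSON in a <script id="__NEXT_DATA__"> tag on the page, which you can extract with the built-in json module.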
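To see the extraction step in isolation, without a network request or BeautifulSoup, here is a standard-library-only sketch. The sample HTML is made up: it only mimics the `props -> pageProps -> catalog -> hits` nesting that the code in this answer reads, and the field values are invented.

```python
import json
from html.parser import HTMLParser


class NextDataParser(HTMLParser):
    """Captures the text inside <script id="__NEXT_DATA__">...</script>."""

    def __init__(self):
        super().__init__()
        self._inside = False
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("id") == "__NEXT_DATA__":
            self._inside = True

    def handle_endtag(self, tag):
        if tag == "script":
            self._inside = False

    def handle_data(self, data):
        # Script text may arrive in several chunks, so collect them all
        if self._inside:
            self.chunks.append(data)


def extract_next_data(html):
    """Parse the JSON payload of the __NEXT_DATA__ script tag."""
    parser = NextDataParser()
    parser.feed(html)
    parser.close()
    if not parser.chunks:
        raise ValueError("no __NEXT_DATA__ script found")
    return json.loads("".join(parser.chunks))


# Made-up page: same nesting as the real data, invented values
sample = (
    '<html><body><div id="__next"></div>'
    '<script id="__NEXT_DATA__" type="application/json">'
    '{"props": {"pageProps": {"catalog": {"hits": '
    '[{"name": "Running Low Top Sneaker", "price": 1247, "sale_price": null}]}}}}'
    '</script></body></html>'
)
data = extract_next_data(sample)
for hit in data["props"]["pageProps"]["catalog"]["hits"]:
    # sale_price is null here, so the regular price is used
    print(hit["name"], hit["sale_price"] or hit["price"])  # prints: Running Low Top Sneaker 1247
```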

    import json
    import requests
    from bs4 import BeautifulSoup


    URL = "https://www.noon.com/egypt-en/search?q=shoes"
    headers = {
        "user-agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/90.0.4430.93 Safari/537.36"
    }

    soup = BeautifulSoup(requests.get(URL, headers=headers).content, "html.parser")
    # The page data is embedded as JSON inside <script id="__NEXT_DATA__">
    json_data = json.loads(soup.find("script", {"id": "__NEXT_DATA__"}).string)

    # Each search result is one entry in the catalog's "hits" list
    for data in json_data["props"]["pageProps"]["catalog"]["hits"]:
        # Prefer the sale price; fall back to the regular price when it is empty
        price = data["sale_price"] or data["price"]
        print(data["name"])
        print(price)
        print("-" * 80)

Output:

    Running Low Top Sneaker Black/Rose Gold 
    1247
    --------------------------------------------------------------------------------
    Asweemove Running Shoes Black/White 
    1076
    --------------------------------------------------------------------------------
    Leather Half Boots Dark Blue 
    250
    --------------------------------------------------------------------------------
    ...
    ...
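The code above prints names and prices, but the question asks for image URLs, and this answer does not say which field of a hit holds the image. A hedged way to find out is to walk a hit's JSON and list every string that looks like an image file. The extension pattern and the sample hit (including its `media`/`url` keys) are assumptions for illustration only; the real field names on noon.com may differ.

```python
import re

# Assumption: image references end in a common image extension,
# optionally followed by a query string
IMAGE_PATTERN = re.compile(r"\.(?:jpe?g|png|webp|gif)(?:\?.*)?$", re.IGNORECASE)


def find_image_strings(obj, path="$"):
    """Yield (json_path, value) for every string value that looks like an image file."""
    if isinstance(obj, dict):
        for key, value in obj.items():
            yield from find_image_strings(value, f"{path}.{key}")
    elif isinstance(obj, list):
        for index, value in enumerate(obj):
            yield from find_image_strings(value, f"{path}[{index}]")
    elif isinstance(obj, str) and IMAGE_PATTERN.search(obj):
        yield path, obj


# Hypothetical hit; the "media" and "url" keys are invented for this demo
hit = {
    "name": "Running Low Top Sneaker Black/Rose Gold ",
    "media": [{"url": "https://z.nooncdn.com/products/tr:n-t_240/v1603717104/N41330370V_2.jpg"}],
}
for json_path, value in find_image_strings(hit):
    print(json_path, value)
```

Running `find_image_strings(json_data["props"]["pageProps"]["catalog"]["hits"][0])` on the real data should show which field, if any, carries an image URL; you can then read that field directly in the loop. If nothing matches, the hits may store only an image identifier, and building a full URL would require the CDN's URL format, which this post does not document.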

Answered By - MendelG

This answer was collected from Stack Overflow and tested by the PythonFixing community admins. It is licensed under CC BY-SA 2.5, CC BY-SA 3.0 and CC BY-SA 4.0.
