Issue
I'm trying to scrape this website for image URLs, but the HTML returned by my request looks like the block below: the image URL that was visible in Chrome's inspect element is no longer there.
<div class="productImage" data-qa-id="productImagePLP_Running Low Top Sneaker Black/Rose Gold "><div class="sc-1xjgu8-0 jRkpWF"><div class="sc-1xjgu8-1 gCPKVp"><svg fill="none" height="22" viewbox="0 0 22 22" width="22" xmlns="http://www.w3.org/2000/svg"><path d="M14.2113 0.741972C13.3401 0.393483 12.3994 0.219238 11.4583 0.219238C10.901 0.219238 10.3433 0.289037 9.78569 0.393483L7.53809 4.26151C8.46153 3.75635 9.48942 3.51231 10.5525 3.52989C11.197 3.52989 11.8244 3.617 12.4343 3.79125L14.2113 0.741972Z" fill="#B2B8CA"></path><path d="M0.708008 11.1439C0.708008 16.7197 5.44726 21.0582 10.9706 21.0582C16.7556 21.0582 21.425 16.3885 21.425 10.7085C21.425 7.38056 19.8222 4.4533 17.435 2.50171L15.6925 5.51608C17.2258 6.82292 18.1146 8.73961 18.1146 10.7607C18.1146 14.6288 14.9084 17.7998 10.9706 17.7998C7.03278 17.7998 3.84441 14.6115 3.84441 10.6736C3.84441 10.6736 3.84441 10.6736 3.84441 10.6561C3.84441 10.1858 3.87906 9.71528
3.96618 9.26209L0.708008 11.1439Z" fill="#B2B8CA"></path></svg>
Chrome's inspect element shows:
<img width="100%" height="100%" src="https://z.nooncdn.com/products/tr:n-t_240/v1603717104/N41330370V_2.jpg" alt="Running Low Top Sneaker Black/Rose Gold ">
I'm trying to scrape the src attribute. Is there a way to get around this? I've tried to build the URL myself from other attributes, but that did not work. I'll add the relevant code and website link below.
code:
page = requests.get(URL, headers=header)
soup = BeautifulSoup(page.content, 'html.parser')
divs = soup.find_all('div', class_="productContainer")
print(divs[0])
website link: https://www.noon.com/egypt-en/search?q=shoes
Solution
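The page is loaded dynamically with JavaScript, and requests does not execute JavaScript, so the img tags never appear in the HTML it returns. However, the data is available in JSON format on the page, inside the script tag with the id __NEXT_DATA__, and you can parse it using the built-in json module.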
import json

import requests
from bs4 import BeautifulSoup

URL = "https://www.noon.com/egypt-en/search?q=shoes"
headers = {
    "user-agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/90.0.4430.93 Safari/537.36"
}

soup = BeautifulSoup(requests.get(URL, headers=headers).content, "html.parser")
json_data = json.loads(soup.find("script", {"id": "__NEXT_DATA__"}).string)

for data in json_data["props"]["pageProps"]["catalog"]["hits"]:
    price = data["sale_price"] or data["price"]
    print(data["name"])
    print(price)
    print("-" * 80)
Output:
Running Low Top Sneaker Black/Rose Gold
1247
--------------------------------------------------------------------------------
Asweemove Running Shoes Black/White
1076
--------------------------------------------------------------------------------
Leather Half Boots Dark Blue
250
--------------------------------------------------------------------------------
...
...
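The snippet above prints names and prices, while the question asks for image URLs. A minimal sketch of how those might be recovered from the same JSON follows. It rests on two assumptions that are not confirmed by the answer: that each hit carries an `image_key` field, and that the URL follows the pattern of the `src` shown in the question. The sample HTML is made up so the sketch runs offline, and it uses the standard library's `html.parser` in place of BeautifulSoup for the same reason; with BeautifulSoup, the `soup.find("script", {"id": "__NEXT_DATA__"})` call from the answer does the same job.

```python
import json
from html.parser import HTMLParser

# ASSUMPTION: URL pattern inferred from the <img src> in the question.
IMAGE_URL_TEMPLATE = "https://z.nooncdn.com/products/tr:n-t_240/{key}.jpg"


class NextDataParser(HTMLParser):
    """Collects the text inside <script id="__NEXT_DATA__">."""

    def __init__(self):
        super().__init__()
        self._inside = False
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("id") == "__NEXT_DATA__":
            self._inside = True

    def handle_endtag(self, tag):
        if tag == "script":
            self._inside = False

    def handle_data(self, data):
        if self._inside:
            self.chunks.append(data)


def extract_image_urls(html):
    """Return image URLs built from the embedded JSON, or [] if it is absent."""
    parser = NextDataParser()
    parser.feed(html)
    parser.close()
    if not parser.chunks:
        return []
    data = json.loads("".join(parser.chunks))
    hits = data["props"]["pageProps"]["catalog"]["hits"]
    # ASSUMPTION: "image_key" is a hypothetical field name; check a real hit's keys.
    return [
        IMAGE_URL_TEMPLATE.format(key=hit["image_key"])
        for hit in hits
        if hit.get("image_key")
    ]


# Made-up sample page: no <img> tag in the markup, but the data is in the JSON.
SAMPLE_HTML = """
<html><body>
<div class="productImage"></div>
<script id="__NEXT_DATA__" type="application/json">
{"props": {"pageProps": {"catalog": {"hits": [
  {"name": "Running Low Top Sneaker Black/Rose Gold",
   "image_key": "v1603717104/N41330370V_2"}
]}}}}
</script>
</body></html>
"""

print(extract_image_urls(SAMPLE_HTML))
```

Against the live page, the same function could be given `requests.get(URL, headers=headers).text`. Printing `hits[0].keys()` first is a quick way to check which field actually holds the image identifier before relying on the assumed name.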
Answered By - MendelG