Issue
Hi I am trying to get the src data from the image on the website, I locate the image using the class since it is unique. With the code below it is able to locate the image but is unable to save the image to mongodb and shows up as null, so want to find the src and save the link instead.
ps. the code works for other classes but not sure how to locate the src and save it into "findImage".
https://myaeon2go.com/products/category/6236298/vegetable
postal code is : 56000
cate_list = [
"https://myaeon2go.com/products/category/1208101/fresh-foods",
"https://myaeon2go.com/products/category/8630656/ready-to-eat",
"https://myaeon2go.com/products/category/6528959/grocery",
"https://myaeon2go.com/products/category/6758871/snacks",
"https://myaeon2go.com/products/category/8124135/chill-&-frozen",
"https://myaeon2go.com/products/category/4995043/beverage",
"https://myaeon2go.com/products/category/3405538/household",
"https://myaeon2go.com/products/category/493239/baby-&-kids",
]
cookies = {
"hideLocationOverlay": "true",
"selectedShippingState": "Kuala Lumpur",
"selectedPostalCode": "56000",
}
for x in range(len(cate_list)):
url = cate_list[x]
# convert soup to readable html
result = requests.get(url, cookies=cookies)
doc = BeautifulSoup(result.text, "html.parser")
# a for loop located here to loop through all the products
# <span class="n_MyDQk4X3P0XRRoTnOe a8H5VCTgYjZnRCen1YkC">myAEON2go Signature Taman Maluri</span>
findImage = j.find("img", {"class": "pgJEkulRiYnxQNzO8njV shown"})
Solution
To extract the value of src
attribute simply call .get('src')
on your element.
Try to change your strategy selecting elements and avoid using classes that are often dynamically - I recommend to use more static identifier as well as HTML structure.
for url in cate_list:
result = requests.get(url, cookies=cookies,headers = {'User-Agent': 'Mozilla/5.0'})
doc = BeautifulSoup(result.text, "html.parser")
for e in doc.select('.g-product-list li'):
print(e.img.get('src'))
Note: Iterating your list
do not need range(len())
construct
Example
import requests
from bs4 import BeautifulSoup
cate_list = [
"https://myaeon2go.com/products/category/1208101/fresh-foods",
"https://myaeon2go.com/products/category/8630656/ready-to-eat",
"https://myaeon2go.com/products/category/6528959/grocery",
"https://myaeon2go.com/products/category/6758871/snacks",
"https://myaeon2go.com/products/category/8124135/chill-&-frozen",
"https://myaeon2go.com/products/category/4995043/beverage",
"https://myaeon2go.com/products/category/3405538/household",
"https://myaeon2go.com/products/category/493239/baby-&-kids",
]
cookies = {
"hideLocationOverlay": "true",
"selectedShippingState": "Kuala Lumpur",
"selectedPostalCode": "56000",
}
for url in cate_list:
result = requests.get(url, cookies=cookies,headers = {'User-Agent': 'Mozilla/5.0'})
doc = BeautifulSoup(result.text, "html.parser")
for e in doc.select('.g-product-list li'):
print(e.img.get('src').split(')/')[-1])
Output
https://assets.myboxed.com.my/1659400060229.jpg
https://assets.myboxed.com.my/1662502067580.jpg
https://assets.myboxed.com.my/1658448744726.jpg
https://assets.myboxed.com.my/1627880003755.jpg
https://assets.myboxed.com.my/1662507451284.jpg
https://assets.myboxed.com.my/1662501936757.jpg
https://assets.myboxed.com.my/1659400602324.jpg
https://assets.myboxed.com.my/1627880346297.jpg
https://assets.myboxed.com.my/1662501743853.jpg
...
Answered By - HedgeHog
0 comments:
Post a Comment
Note: Only a member of this blog may post a comment.