Wednesday, September 7, 2022

[FIXED] How to get src in an image using class?

Labels: beautifulsoup, css, html, python, web-scraping

Issue

Hi, I am trying to get the src of an image on the website below. I locate the image by its class, since it is unique. The code below finds the image, but saving it to MongoDB fails and the value shows up as null, so I want to extract the src and save the link instead.

P.S. The code works for other classes, but I am not sure how to get the src and save it into "findImage".

https://myaeon2go.com/products/category/6236298/vegetable

The postal code is 56000.


cate_list = [
    "https://myaeon2go.com/products/category/1208101/fresh-foods",
    "https://myaeon2go.com/products/category/8630656/ready-to-eat",
    "https://myaeon2go.com/products/category/6528959/grocery",
    "https://myaeon2go.com/products/category/6758871/snacks",
    "https://myaeon2go.com/products/category/8124135/chill-&-frozen",
    "https://myaeon2go.com/products/category/4995043/beverage",
    "https://myaeon2go.com/products/category/3405538/household",
    "https://myaeon2go.com/products/category/493239/baby-&-kids",
]


cookies = {
    "hideLocationOverlay": "true",
    "selectedShippingState": "Kuala Lumpur",
    "selectedPostalCode": "56000",
}

for x in range(len(cate_list)):

    url = cate_list[x]

    # fetch the page and parse the HTML
    result = requests.get(url, cookies=cookies)
    doc = BeautifulSoup(result.text, "html.parser")

# a for loop located here to loop through all the products

                # <span class="n_MyDQk4X3P0XRRoTnOe a8H5VCTgYjZnRCen1YkC">myAEON2go Signature Taman Maluri</span>
                findImage = j.find("img", {"class": "pgJEkulRiYnxQNzO8njV shown"})

Solution

To extract the value of the src attribute, call .get('src') on your element.
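
As a minimal sketch of that call on a self-contained snippet: the HTML below is made up for illustration and only reuses the class from the question, so the real page markup will differ. It also shows why .get('src') is a safe choice when some images might lack the attribute.

```python
from bs4 import BeautifulSoup

# Hypothetical markup for illustration only; the real page will differ.
html = """
<ul>
  <li><img class="pgJEkulRiYnxQNzO8njV shown" src="https://example.com/a.jpg"></li>
  <li><img class="pgJEkulRiYnxQNzO8njV shown"></li>
</ul>
"""

doc = BeautifulSoup(html, "html.parser")

srcs = []
for img in doc.find_all("img", class_="shown"):
    # .get() returns None when the attribute is missing,
    # whereas img["src"] would raise a KeyError.
    srcs.append(img.get("src"))

print(srcs)
```

The value returned by .get('src') is a plain string (or None), which is usually easier to store than the Tag object itself.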

Try to change your strategy for selecting elements and avoid classes that are often generated dynamically. I recommend using more static identifiers together with the HTML structure instead.

for url in cate_list:

    result = requests.get(url, cookies=cookies, headers={'User-Agent': 'Mozilla/5.0'})
    doc = BeautifulSoup(result.text, "html.parser")
    for e in doc.select('.g-product-list li'):
        print(e.img.get('src'))
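
One caveat with the loop above: e.img is None for a list item without an image, so e.img.get('src') would raise an AttributeError there. A possible variation, sketched on made-up markup (only the .g-product-list class comes from the answer; everything else is an assumption), lets the CSS selector do the filtering:

```python
from bs4 import BeautifulSoup

# Hypothetical markup; only the .g-product-list class comes from the answer.
html = """
<ul class="g-product-list">
  <li><img src="https://example.com/1.jpg"><span>Item 1</span></li>
  <li><span>Item without image</span></li>
  <li><img src="https://example.com/3.jpg"><span>Item 3</span></li>
</ul>
"""

doc = BeautifulSoup(html, "html.parser")

# "img[src]" matches only <img> tags that actually carry a src attribute,
# so list items without an image are skipped instead of raising an error.
links = [img["src"] for img in doc.select(".g-product-list li img[src]")]
print(links)
```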

Note: Iterating over a list does not need the range(len()) construct; loop over the items directly.
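
To illustrate the note, here is a small comparison using a shortened copy of cate_list. The variable names by_index, direct and numbered are only for this sketch; enumerate() is shown as an option for when the position is still needed.

```python
cate_list = [
    "https://myaeon2go.com/products/category/1208101/fresh-foods",
    "https://myaeon2go.com/products/category/8630656/ready-to-eat",
]

# Index-based loop, as in the question: works, but is more verbose.
by_index = []
for x in range(len(cate_list)):
    by_index.append(cate_list[x])

# Direct iteration: each item is bound to `url` without indexing.
direct = []
for url in cate_list:
    direct.append(url)

# If the position is needed too, enumerate() yields (index, item) pairs.
numbered = []
for i, url in enumerate(cate_list, start=1):
    numbered.append((i, url))

print(by_index == direct)
```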

Example

import requests
from bs4 import BeautifulSoup

cate_list = [
    "https://myaeon2go.com/products/category/1208101/fresh-foods",
    "https://myaeon2go.com/products/category/8630656/ready-to-eat",
    "https://myaeon2go.com/products/category/6528959/grocery",
    "https://myaeon2go.com/products/category/6758871/snacks",
    "https://myaeon2go.com/products/category/8124135/chill-&-frozen",
    "https://myaeon2go.com/products/category/4995043/beverage",
    "https://myaeon2go.com/products/category/3405538/household",
    "https://myaeon2go.com/products/category/493239/baby-&-kids",
]


cookies = {
    "hideLocationOverlay": "true",
    "selectedShippingState": "Kuala Lumpur",
    "selectedPostalCode": "56000",
}

for url in cate_list:

    result = requests.get(url, cookies=cookies, headers={'User-Agent': 'Mozilla/5.0'})
    doc = BeautifulSoup(result.text, "html.parser")
    for e in doc.select('.g-product-list li'):
        # keep only what follows the last ')/' in src (the direct asset URL)
        print(e.img.get('src').split(')/')[-1])

Output

https://assets.myboxed.com.my/1659400060229.jpg
https://assets.myboxed.com.my/1662502067580.jpg
https://assets.myboxed.com.my/1658448744726.jpg
https://assets.myboxed.com.my/1627880003755.jpg
https://assets.myboxed.com.my/1662507451284.jpg
https://assets.myboxed.com.my/1662501936757.jpg
https://assets.myboxed.com.my/1659400602324.jpg
https://assets.myboxed.com.my/1627880346297.jpg
https://assets.myboxed.com.my/1662501743853.jpg
...
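
Since the goal in the question was to save the link rather than the image, the extracted strings could be collected into plain dictionaries first. This is only a sketch: the helper build_documents and the field names "category" and "image" are hypothetical and not part of the original answer.

```python
def build_documents(category_url, image_links):
    """Pair each image link with the category page it came from."""
    return [{"category": category_url, "image": link} for link in image_links]


# Two links taken from the output above, used here as sample input.
links = [
    "https://assets.myboxed.com.my/1659400060229.jpg",
    "https://assets.myboxed.com.my/1662502067580.jpg",
]

docs = build_documents(
    "https://myaeon2go.com/products/category/1208101/fresh-foods", links
)
print(docs[0])
```

A list of dictionaries like this is the shape that pymongo's insert_many() accepts, so it could be passed to a collection once a connection is set up.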

Answered By - HedgeHog

This answer was collected from Stack Overflow and tested by PythonFixing community admins; it is licensed under CC BY-SA 2.5, CC BY-SA 3.0 and CC BY-SA 4.0.
