Tuesday, February 1, 2022

[FIXED] BeautifulSoup extract multiple elements from a class

Issue

I've been trying to scrape the image links from a site, but I only ever manage to get the first link.

code:

from bs4 import BeautifulSoup
import requests


def getimg(link):
    source = requests.get(link).text
    soup = BeautifulSoup(source, 'lxml')
    imglist = soup.find_all('div', class_='container-chapter-reader')
    for links in imglist:
         imglink = links.find('img').get('src')
         print(imglink)

getimg('https://manganelo.com/chapter/kimetsu_no_yaiba/chapter_1')

output:

https://s9.mkklcdnv6tempv4.com/mangakakalot/k1/kimetsu_no_yaiba/chapter_1_cruelty/1.jpg

a short snippet of the HTML:

<div class="container-chapter-reader">
<img src="https://s9.mkklcdnv6tempv4.com/mangakakalot/k1/kimetsu_no_yaiba/chapter_1_cruelty/1.jpg" alt="Kimetsu no Yaiba Chapter 1 : Cruelty page 1 - MangaNelo.com" title="Kimetsu no Yaiba Chapter 1 : Cruelty page 1 - MangaNelo.com" />
<img src="https://s9.mkklcdnv6tempv4.com/mangakakalot/k1/kimetsu_no_yaiba/chapter_1_cruelty/2.jpg" alt="Kimetsu no Yaiba Chapter 1 : Cruelty page 2 - MangaNelo.com" title="Kimetsu no Yaiba Chapter 1 : Cruelty page 2 - MangaNelo.com" />
<img src="https://s9.mkklcdnv6tempv4.com/mangakakalot/k1/kimetsu_no_yaiba/chapter_1_cruelty/3.jpg" alt="Kimetsu no Yaiba Chapter 1 : Cruelty page 3 - MangaNelo.com" title="Kimetsu no Yaiba Chapter 1 : Cruelty page 3 - MangaNelo.com" />
<img src="https://s9.mkklcdnv6tempv4.com/mangakakalot/k1/kimetsu_no_yaiba/chapter_1_cruelty/4.jpg" alt="Kimetsu no Yaiba Chapter 1 : Cruelty page 4 - MangaNelo.com" title="Kimetsu no Yaiba Chapter 1 : Cruelty page 4 - MangaNelo.com" />
<img src="https://s9.mkklcdnv6tempv4.com/mangakakalot/k1/kimetsu_no_yaiba/chapter_1_cruelty/5.jpg" alt="Kimetsu no Yaiba Chapter 1 : Cruelty page 5 - MangaNelo.com" title="Kimetsu no Yaiba Chapter 1 : Cruelty page 5 - MangaNelo.com" />

This pattern repeats for every image on the page.

Solution

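The original code prints only the first link because find_all('div', ...) matches the single container div, and find('img') returns only the first img inside it. To get every link, select all img elements inside the container and return their src values as a list: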

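import requests
from bs4 import BeautifulSoup


def getimg(link):
    # Download the page and parse it (the "lxml" parser requires the lxml package).
    source = requests.get(link).text
    soup = BeautifulSoup(source, "lxml")

    # Collect the src of every <img> that is a direct child of the reader container.
    rv = []
    for img in soup.select(".container-chapter-reader > img"):
        rv.append(img["src"])

    return rv


images = getimg("https://manganelo.com/chapter/kimetsu_no_yaiba/chapter_1")
# Print one link per line.
print(*images, sep="\n")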

Prints:

https://s9.mkklcdnv6tempv4.com/mangakakalot/k1/kimetsu_no_yaiba/chapter_1_cruelty/1.jpg
https://s9.mkklcdnv6tempv4.com/mangakakalot/k1/kimetsu_no_yaiba/chapter_1_cruelty/2.jpg
https://s9.mkklcdnv6tempv4.com/mangakakalot/k1/kimetsu_no_yaiba/chapter_1_cruelty/3.jpg
https://s9.mkklcdnv6tempv4.com/mangakakalot/k1/kimetsu_no_yaiba/chapter_1_cruelty/4.jpg
https://s9.mkklcdnv6tempv4.com/mangakakalot/k1/kimetsu_no_yaiba/chapter_1_cruelty/5.jpg
https://s9.mkklcdnv6tempv4.com/mangakakalot/k1/kimetsu_no_yaiba/chapter_1_cruelty/6.jpg
https://s9.mkklcdnv6tempv4.com/mangakakalot/k1/kimetsu_no_yaiba/chapter_1_cruelty/7.jpg
https://s9.mkklcdnv6tempv4.com/mangakakalot/k1/kimetsu_no_yaiba/chapter_1_cruelty/8.jpg
https://s9.mkklcdnv6tempv4.com/mangakakalot/k1/kimetsu_no_yaiba/chapter_1_cruelty/9.jpg

...and so on.
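
To see why the question's code stopped at the first link, the following is a minimal offline sketch, not part of the original answer. It assumes a small inline HTML string modelled on the snippet from the question (the example.com URLs and the SAMPLE_HTML name are hypothetical), and it uses the standard-library "html.parser" instead of "lxml" so it runs without extra dependencies. It compares find(), find_all(), and the select() call used above.

```python
from bs4 import BeautifulSoup

# Inline sample modelled on the HTML snippet from the question.
# The example.com URLs are hypothetical placeholders, not the real image hosts.
SAMPLE_HTML = """
<div class="container-chapter-reader">
<img src="https://example.com/chapter_1/1.jpg" alt="page 1" />
<img src="https://example.com/chapter_1/2.jpg" alt="page 2" />
<img src="https://example.com/chapter_1/3.jpg" alt="page 3" />
</div>
"""

# Assumption: html.parser (stdlib) is used here only to avoid the lxml dependency.
soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# Only one div carries this class, so find_all returns a single-element list.
containers = soup.find_all("div", class_="container-chapter-reader")

# find() stops at the first match, so each container yields exactly one link.
first_only = [div.find("img").get("src") for div in containers]

# find_all() returns every <img> inside each container.
all_links = [img.get("src") for div in containers for img in div.find_all("img")]

# The CSS selector from the answer collects the same links in one step.
selected = [img["src"] for img in soup.select(".container-chapter-reader > img")]

print(len(containers))
print(first_only)
print(all_links)
```

Here the loop over containers runs only once, so first_only holds a single link, while all_links and selected both hold all three. Replacing find('img') with find_all('img') in the question's code would therefore also work; the select() version is simply shorter.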

Answered By - Andrej Kesely

This answer was collected from Stack Overflow and tested by PythonFixing community admins. It is licensed under CC BY-SA 2.5, CC BY-SA 3.0, and CC BY-SA 4.0.
