Issue
I've been trying to scrape the image links from a site, but I only ever get the first link.
Code:
from bs4 import BeautifulSoup
import requests

def getimg(link):
    source = requests.get(link).text
    soup = BeautifulSoup(source, 'lxml')
    imglist = soup.find_all('div', class_='container-chapter-reader')
    for links in imglist:
        imglink = links.find('img').get('src')
        print(imglink)

getimg('https://manganelo.com/chapter/kimetsu_no_yaiba/chapter_1')
Output:
https://s9.mkklcdnv6tempv4.com/mangakakalot/k1/kimetsu_no_yaiba/chapter_1_cruelty/1.jpg
A short snippet of the HTML:
<div class="container-chapter-reader">
<img src="https://s9.mkklcdnv6tempv4.com/mangakakalot/k1/kimetsu_no_yaiba/chapter_1_cruelty/1.jpg" alt="Kimetsu no Yaiba Chapter 1 : Cruelty page 1 - MangaNelo.com" title="Kimetsu no Yaiba Chapter 1 : Cruelty page 1 - MangaNelo.com" />
<img src="https://s9.mkklcdnv6tempv4.com/mangakakalot/k1/kimetsu_no_yaiba/chapter_1_cruelty/2.jpg" alt="Kimetsu no Yaiba Chapter 1 : Cruelty page 2 - MangaNelo.com" title="Kimetsu no Yaiba Chapter 1 : Cruelty page 2 - MangaNelo.com" />
<img src="https://s9.mkklcdnv6tempv4.com/mangakakalot/k1/kimetsu_no_yaiba/chapter_1_cruelty/3.jpg" alt="Kimetsu no Yaiba Chapter 1 : Cruelty page 3 - MangaNelo.com" title="Kimetsu no Yaiba Chapter 1 : Cruelty page 3 - MangaNelo.com" />
<img src="https://s9.mkklcdnv6tempv4.com/mangakakalot/k1/kimetsu_no_yaiba/chapter_1_cruelty/4.jpg" alt="Kimetsu no Yaiba Chapter 1 : Cruelty page 4 - MangaNelo.com" title="Kimetsu no Yaiba Chapter 1 : Cruelty page 4 - MangaNelo.com" />
<img src="https://s9.mkklcdnv6tempv4.com/mangakakalot/k1/kimetsu_no_yaiba/chapter_1_cruelty/5.jpg" alt="Kimetsu no Yaiba Chapter 1 : Cruelty page 5 - MangaNelo.com" title="Kimetsu no Yaiba Chapter 1 : Cruelty page 5 - MangaNelo.com" />
This pattern repeats for every image in the chapter.
Solution
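The loop in the question visits the container-chapter-reader div, and find('img') returns only the first img tag inside it, so only one link is printed. Select every img inside the container instead. This example returns a list of the image links from the page: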
import requests
from bs4 import BeautifulSoup

def getimg(link):
    source = requests.get(link).text
    soup = BeautifulSoup(source, "lxml")
    rv = []
    # Match every <img> that is a direct child of the reader container.
    for img in soup.select(".container-chapter-reader > img"):
        rv.append(img["src"])
    return rv

images = getimg("https://manganelo.com/chapter/kimetsu_no_yaiba/chapter_1")
print(*images, sep="\n")
Prints:
https://s9.mkklcdnv6tempv4.com/mangakakalot/k1/kimetsu_no_yaiba/chapter_1_cruelty/1.jpg
https://s9.mkklcdnv6tempv4.com/mangakakalot/k1/kimetsu_no_yaiba/chapter_1_cruelty/2.jpg
https://s9.mkklcdnv6tempv4.com/mangakakalot/k1/kimetsu_no_yaiba/chapter_1_cruelty/3.jpg
https://s9.mkklcdnv6tempv4.com/mangakakalot/k1/kimetsu_no_yaiba/chapter_1_cruelty/4.jpg
https://s9.mkklcdnv6tempv4.com/mangakakalot/k1/kimetsu_no_yaiba/chapter_1_cruelty/5.jpg
https://s9.mkklcdnv6tempv4.com/mangakakalot/k1/kimetsu_no_yaiba/chapter_1_cruelty/6.jpg
https://s9.mkklcdnv6tempv4.com/mangakakalot/k1/kimetsu_no_yaiba/chapter_1_cruelty/7.jpg
https://s9.mkklcdnv6tempv4.com/mangakakalot/k1/kimetsu_no_yaiba/chapter_1_cruelty/8.jpg
https://s9.mkklcdnv6tempv4.com/mangakakalot/k1/kimetsu_no_yaiba/chapter_1_cruelty/9.jpg
...and so on.
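To see the difference without hitting the network, here is a minimal sketch that runs both approaches on an inline HTML string. The markup is modeled on the snippet from the question, but the example.com URLs and the variable names are placeholders, and it uses the built-in "html.parser" rather than "lxml" only so that it runs without an extra dependency.

```python
from bs4 import BeautifulSoup

# Inline HTML modeled on the question's snippet; the URLs are placeholders.
HTML = """
<div class="container-chapter-reader">
  <img src="https://example.com/1.jpg" alt="page 1" />
  <img src="https://example.com/2.jpg" alt="page 2" />
  <img src="https://example.com/3.jpg" alt="page 3" />
</div>
"""

soup = BeautifulSoup(HTML, "html.parser")
containers = soup.find_all("div", class_="container-chapter-reader")

# Approach from the question: find() stops at the first <img> in each container.
first_only = [div.find("img").get("src") for div in containers]

# Same structure, but find_all() collects every <img> in each container.
all_srcs = [img.get("src") for div in containers for img in div.find_all("img")]

# CSS selector from the answer: direct <img> children of the container.
selected = [img["src"] for img in soup.select(".container-chapter-reader > img")]

print(first_only)
print(all_srcs)
print(selected)
```

With this markup, first_only holds a single link, while all_srcs and selected both hold all three. Either find_all('img') or select() works; the selector version is just shorter.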
Answered By - Andrej Kesely