Issue
I'm trying to get the src link of each img tag on a web page and print it to the console:
from bs4 import BeautifulSoup
import requests

r = requests.get("https://welovemanga.one/2777/92578/")
soup = BeautifulSoup(r.content, "html.parser")
thumbnail_elements = soup.find_all("img", class_="chapter-img")
for element in thumbnail_elements:
    print(element['src'])
The images on this website have the class "chapter-img", and each has its own src link, which is what I want.
But when I run the code, it prints the lazy_loading.gif placeholder for each image on the page.
How can I get the real src of the img tags instead of lazy_loading.gif?
Solution
The actual image link is in the data-original attribute, but it has to be accessed via a proxy. The site sets src to the loading GIF (the value you kept getting) until it has fetched the image and can update src. You can build the link to the actual page image yourself by changing your code to:
from bs4 import BeautifulSoup
import requests

r = requests.get("https://welovemanga.one/2777/92578/")
soup = BeautifulSoup(r.content, "html.parser")
thumbnail_elements = soup.find_all("img", class_="chapter-img")
# Prefix the site uses to fetch the real image through its proxy
proxy_root = 'https://welovekai.com/proxy.php?link='
for element in thumbnail_elements:
    # Remove all whitespace from data-original before appending it
    print(proxy_root + ''.join(element['data-original'].split()))
[There is some whitespace breaking up the link, so splitting and then joining cleans it up.]
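The cleanup step can be tried on its own, without hitting the site. The following is a minimal sketch, not part of the original answer: the helper names strip_whitespace and image_url are made up for illustration, the sample attribute values are hypothetical, and falling back to src when data-original is missing is an assumption about what you would want rather than something the site guarantees.

```python
from typing import Mapping, Optional

# Proxy prefix copied from the answer above; it may stop working if the site changes.
PROXY_ROOT = "https://welovekai.com/proxy.php?link="


def strip_whitespace(text: str) -> str:
    """Remove every whitespace character (spaces, tabs, newlines) from text."""
    # str.split() with no arguments splits on any run of whitespace,
    # so joining the pieces with "" drops all of it.
    return "".join(text.split())


def image_url(attrs: Mapping[str, str], proxy_root: str = PROXY_ROOT) -> Optional[str]:
    """Build the proxied link from data-original, else fall back to src (assumption)."""
    raw = attrs.get("data-original")
    if raw:
        return proxy_root + strip_whitespace(raw)
    return attrs.get("src")


# Hypothetical attribute values shaped like a lazy-loaded img tag.
sample = {
    "src": "lazy_loading.gif",
    "data-original": " https://example.com/pages/ 001.jpg\n",
}
print(image_url(sample))
```

With BeautifulSoup, each element's attributes are available as a dictionary through element.attrs, so image_url(element.attrs) should slot into the loop above.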
Answered By - Driftr95