Issue
I want to download all the pictures from this site in high resolution, not the preview pictures:
https://www.booklooker.de/B%C3%BCcher/Donna-W-Cross+Die-P%C3%A4pstin/id/A02A8f9001ZZl
The link (https://xxxxx.de) to each image I want to download is stored in this part of the HTML code: link to the picture
The code I have tried so far is:
from bs4 import BeautifulSoup
import requests
page = requests.get("https://www.booklooker.de/B%C3%BCcher/Donna-W-Cross+Die-P%C3%A4pstin/id/A02A8f9001ZZl")
souped = BeautifulSoup(page.content, "html.parser")
for pic in souped.find_all(class_="preview hasXXL"):
    print(pic['href'])
With that I reach the right part of the HTML, but I don't understand how to extract the image link from the href attribute. When I try to scrape it, I get this result:
/app/detail.php?id=A02A8f9001ZZl&picNo=1" id="preview_1
But I expect this:
https://images.booklooker.de/x/02Sh07/Donna-W-Cross+Die-P%C3%A4pstin.jpg
What did I do wrong?
Thanks a lot for your help!!
Solution
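If you want the image URLs (e.g. https://images.booklooker.de/t/02Sh07/Donna-W-Cross+Die-P%C3%A4pstin.jpg), you need to select the img elements with the class "previewImage" rather than the elements with the class "preview hasXXL", and read the "src" attribute of each img element. The href on the "preview hasXXL" elements only points to the detail page (/app/detail.php?...), which is why the original code did not print an image URL. The "src" value is the thumbnail URL (path segment /t/), so the code below replaces that segment with /s/ to get the larger image.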
from bs4 import BeautifulSoup
import requests
url = "https://www.booklooker.de/B%C3%BCcher/Donna-W-Cross+Die-P%C3%A4pstin/id/A02A8f9001ZZl"
page = requests.get(url)
souped = BeautifulSoup(page.content, "html.parser")
for pic in souped.find_all("img", class_="previewImage"):
    src = pic['src']
    # Next change thumbnail URL to full res image
    src = src.replace("/t/", "/s/")
    print(src)
Output:
https://images.booklooker.de/s/02Sh07/Donna-W-Cross+Die-P%C3%A4pstin.jpg
...
https://images.booklooker.de/s/02Sh0S/Donna-W-Cross+Die-P%C3%A4pstin.jpg
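Since the original goal was to download the pictures rather than just print the URLs, here is a minimal sketch of how the printed URLs could be saved to disk. The helper names (to_full_res, local_name, download) and the "images" folder are hypothetical and not part of the answer above. It uses only the standard library (urllib) for the download; requests.get would work just as well. The size segment is a parameter because the answer uses /s/ while the URL expected in the question uses /x/; which of the two the site serves as its largest image is not confirmed here.

```python
from pathlib import Path
from urllib.parse import unquote, urlsplit
from urllib.request import urlopen


def to_full_res(src: str, size: str = "s") -> str:
    """Swap the thumbnail segment /t/ for another size segment (assumed: "s" or "x")."""
    return src.replace("/t/", f"/{size}/", 1)


def local_name(url: str) -> str:
    """Build a file name from the image ID segment and the decoded base name.

    The URLs in the output above share the same base name and differ only in
    the ID segment (02Sh07, 02Sh0S), so the ID is included to avoid
    overwriting one image with another.
    """
    parts = urlsplit(url).path.strip("/").split("/")
    image_id, base = parts[-2], unquote(parts[-1])
    return f"{image_id}_{base}"


def download(url: str, folder: str = "images") -> Path:
    """Fetch one image and write it into the given folder (hypothetical default)."""
    target = Path(folder) / local_name(url)
    target.parent.mkdir(parents=True, exist_ok=True)
    with urlopen(url, timeout=30) as response:
        target.write_bytes(response.read())
    return target
```

Inside the loop from the answer, this would be used as download(to_full_res(pic['src'])) in place of the replace and print lines.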
Answered By - CodeMonkey