Issue
I've tried many things to scrape the data from this page: https://www.hebban.nl/rank. For some reason, nothing is returned, even after many attempts.
Can someone point me in the right direction?
Below is a quick example that should, in theory, get the author and title of each book on the page.
import requests
from bs4 import BeautifulSoup

# Send a GET request to the URL
url = "https://www.hebban.nl/rank"
response = requests.get(url)
# Parse the HTML content using Beautiful Soup
soup = BeautifulSoup(response.content, 'html.parser')
# Find the book titles, authors, and image url links
books = soup.find_all('div', class_='row-fluid')
for book in books:
    title = book.find('a', class_='neutral').text.strip()
    author = book.find('span', class_='author').text.strip()
    print(title + ' by ' + author)
    # Note: img_url is never assigned in this snippet
    print('Image URL: ' + img_url)
Solution
First of all, always take a look at your soup to check that all the expected ingredients are in place: simply print your response or soup.
You have to set a user-agent header on your request to avoid being blocked by the server, so that BeautifulSoup can find what you are looking for:
response = requests.get(url,headers={'user-agent':'some agent'})
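To make the "look at your soup first" step concrete, here is a minimal sketch of a helper that summarises what came back before any selectors are written. The name `diagnose`, the sample HTML strings, and the 403 status are all hypothetical and only illustrate the idea; the real page's markup and status codes may differ.

```python
from bs4 import BeautifulSoup


def diagnose(status_code, html, tag="div", css_class="item"):
    """Summarise what the server actually sent back (hypothetical helper)."""
    soup = BeautifulSoup(html, "html.parser")
    return {
        "status_code": status_code,
        "html_length": len(html),
        # A block page often gives itself away through its <title>
        "page_title": soup.title.get_text(strip=True) if soup.title else None,
        # How many elements the intended selection would match
        "matches": len(soup.find_all(tag, class_=css_class)),
    }


# Hypothetical stand-ins for a blocked and a successful response
blocked_html = "<html><head><title>Access denied</title></head><body></body></html>"
ok_html = (
    "<html><head><title>Rank</title></head><body>"
    '<div class="item"><h3>Some title</h3>'
    '<span class="author">Some author</span></div>'
    "</body></html>"
)

print(diagnose(403, blocked_html))
print(diagnose(200, ok_html))
```

With a live request this would be called as `diagnose(response.status_code, response.text)`. If `matches` is 0, the problem is either the response itself or the selection, and the title and length help tell which.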
Also take a closer look at your selections, and note that you have to select the #1 book separately, because it does not fit the same selection.
Example
import requests
from bs4 import BeautifulSoup

# Send a GET request to the URL, with a user-agent header set
url = "https://www.hebban.nl/rank"
response = requests.get(url, headers={'user-agent': 'some agent'})
# Parse the HTML content using Beautiful Soup
soup = BeautifulSoup(response.content, 'html.parser')
# Find the book titles, authors, and image url links
books = soup.find_all('div', class_='item')
for book in books:
    title = book.h3.text.strip()
    author = book.find('span', class_='author').text.strip()
    img_url = book.img.get('data-src')
    print(title + ' by ' + author)
    print('Image URL: ' + img_url)
Output
2 by Raoul de Jong
Image URL: https://static.hebban.nl/covers/00001122/thumb/DEF%20omslag%20-%20Boto%20Banja.png
3 by Freida McFadden
Image URL: https://static.hebban.nl/covers/00001063/thumb/9789032520267.jpg
4 by Helen Fields
Image URL: https://static.hebban.nl/covers/00001062/thumb/9789026360787.jpg
5 by Thomas Olde Heuvelt
Image URL: https://static.hebban.nl/covers/00001032/thumb/9789022591116.jpeg
...
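Since `find()` returns `None` when a tag is missing, an element that does not fit the selection would make the loop fail with an `AttributeError`. Here is a minimal sketch of a more forgiving extraction, assuming the same tag and class names as the example. The helper name `extract_book` and the sample markup are hypothetical and not taken from the real page.

```python
from bs4 import BeautifulSoup


def extract_book(item):
    """Return (title, author, img_url), or None if title or author is missing."""
    heading = item.find("h3")
    author = item.find("span", class_="author")
    img = item.find("img")
    if heading is None or author is None:
        return None
    # The image is treated as optional; a missing one yields None
    img_url = img.get("data-src") if img is not None else None
    return heading.get_text(strip=True), author.get_text(strip=True), img_url


# Hypothetical markup: the second item has no author span
sample = """
<div class="item"><h3>Title A</h3><span class="author">Author A</span>
<img data-src="https://example.com/a.jpg"></div>
<div class="item"><h3>Title B</h3></div>
"""

soup = BeautifulSoup(sample, "html.parser")
rows = [extract_book(item) for item in soup.find_all("div", class_="item")]
books = [row for row in rows if row is not None]
print(books)
```

Items that come back as `None` are the ones that need their own selection, which is one way to spot entries such as the #1 book.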
Answered By - HedgeHog