Issue
I am making a web scraper for Book Depository and I came across a problem with the HTML elements of the site. The page for a book has a section called Product Details, and I need to extract each element from that list. However, some of the elements (not all of them), such as Language, have the structure shown in a sample image (not reproduced here). How can I get the value of such an element?
Here is my work in progress. Thanks a lot in advance!
import bs4
from urllib.request import urlopen
book_isbn = "9781399703994"
book_urls = "https://www.bookdepository.com/Enid-Blytons-Christmas-Tales-Enid-Blyton/" + book_isbn
source = urlopen(book_urls).read()
soup = bs4.BeautifulSoup(source,'lxml')
book_description = soup.find('div', class_='item-excerpt trunc')
book_title = soup.find('h1').text
book_info = soup.find('ul', class_='biblio-info')
book_pages = book_info.find('span', itemprop='numberOfPages').text
book_ibsn = book_info.find('span', itemprop='isbn').text
book_publication_date = book_info.find('span', itemprop='datePublished').text
book_publisher = book_info.find('span', itemprop='name').text
book_author = soup.find('span', itemprop="author").text
book_cover = soup.find('div', class_='item-img-content').img
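book_language = book_info.find_next(string='Language')  # matches the label text node itself, not its value
book_format = book_info.find_all(string='Format')  # list of text nodes equal to 'Format', i.e. labels only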
print('Number of Pages: ' + book_pages.strip())
print('ISBN Number: ' + book_ibsn)
print('Publication Date: ' + book_publication_date)
print('Publisher Name: ' + book_publisher.strip())
print('Author: '+ book_author.strip())
print(book_cover)
print(book_language)
print(book_format)
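For illustration, the last two lookups can be reproduced on a small piece of hypothetical markup. The HTML below is an assumption inferred from the selectors used in the answer (a <label> followed by a <span> inside each li of ul.biblio-info), not a copy of the real page, and it is parsed with the built-in "html.parser" so lxml is not needed. It shows that a string= search matches the label's text node, so these calls return label strings rather than values.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: an assumption modelled on the answer's selectors,
# not copied from bookdepository.com.
HTML = """
<ul class="biblio-info">
  <li><label>Format</label>
    <span>
      Hardback |
      304 pages
    </span>
  </li>
  <li><label>Language</label>
    <span>
      English
    </span>
  </li>
</ul>
"""

soup = BeautifulSoup(HTML, "html.parser")
book_info = soup.find("ul", class_="biblio-info")

# string= matches text nodes, so the result is the label string itself
book_language = book_info.find_next(string="Language")
# find_all(string=...) collects every text node equal to "Format": labels again
book_format = book_info.find_all(string="Format")

print(book_language)                  # the label, not the value
print(type(book_language).__name__)   # a NavigableString, whose parent is <label>
print([str(s) for s in book_format])
```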
Solution
To get the <span> that corresponds to your label, you could go with:
book_info.find_next(string='Language').find_next('span').get_text(strip=True)
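The one-liner above raises AttributeError when a label is missing from a page, because find_next() then returns None. A minimal sketch of a wrapper that returns None instead; detail_value is a hypothetical helper name and the markup is assumed, not taken from the site. Note that find_next() walks the rest of the document, so the search is not limited to the list it is called on.

```python
from typing import Optional

from bs4 import BeautifulSoup

# Hypothetical markup (assumed structure: <label> followed by <span>).
HTML = """
<ul class="biblio-info">
  <li><label>Format</label>
    <span>
      Hardback |
      304 pages
    </span>
  </li>
  <li><label>Language</label>
    <span>
      English
    </span>
  </li>
</ul>
"""


def detail_value(container, label: str) -> Optional[str]:
    """Return the stripped text of the first <span> after `label`, or None.

    Hypothetical helper wrapping the one-liner. find_next() keeps searching
    past `container`, up to the end of the document.
    """
    label_node = container.find_next(string=label)
    if label_node is None:
        return None
    span = label_node.find_next("span")
    return span.get_text(strip=True) if span is not None else None


soup = BeautifulSoup(HTML, "html.parser")
book_info = soup.find("ul", class_="biblio-info")
print(detail_value(book_info, "Language"))
print(detail_value(book_info, "Binding"))  # label not present
```

With the assumed Format row, the helper returns "Hardback |" followed by a newline and indentation, because strip=True only trims the ends of each text node; collapsing whitespace, as the answer's re.sub call does, handles that.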
A more generic approach to get all these product details could be:
import bs4
import re
from urllib.request import urlopen
book_isbn = "9781399703994"
book_urls = "https://www.bookdepository.com/Enid-Blytons-Christmas-Tales-Enid-Blyton/" + book_isbn
source = urlopen(book_urls).read()
soup = bs4.BeautifulSoup(source,'lxml')
book = {
'description':soup.find('div', class_='item-excerpt trunc').get_text(strip=True),
'title':soup.find('h1').text
}
book.update({e.label.text.strip(): re.sub(r'\s+', ' ', e.span.text).strip() for e in soup.select('.biblio-info li')})
print(book)
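Output (note that the page returned for ISBN 9781399703994 is Finding Me, not the Enid Blyton title in the URL slug):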
{'description': "'A breathtaking memoir...I was so moved by this book.' Oprah'It is startlingly honest and, at times, a jaw-dropping read, charting her rise from poverty and abuse to becoming the first African-American to win the triple crown of an Oscar, Emmy and Tony for acting.' BBC NewsTHE DEEPLY PERSONAL, BRUTALLY HONEST ACCOUNT OF VIOLA'S INSPIRING LIFEIn my book, you will meet a little girl named Viola who ran from her past until she made a life changing decision to stop running forever.This is my story, from a crumbling apartment in Central Falls, Rhode Island, to the stage in New York City, and beyond. This is the path I took to finding my purpose and my strength, but also to finding my voice in a world that didn't always see me.As I wrote Finding Me, my eyes were open to the truth of how our stories are often not given close examination. They are bogarted, reinvented to fit into a crazy, competitive, judgmental world. So I wrote this for anyone who is searching for a way to understand and overcome a complicated past, let go of shame, and find acceptance. For anyone who needs reminding that a life worth living can only be born from radical honesty and the courage to shed facades and be...you.Finding Me is a deep reflection on my past and a promise for my future. My hope is that my story will inspire you to light up your own life with creative expression and rediscover who you were before the world put a label on you.show more",
'title': 'Finding Me : A Memoir - THE INSTANT SUNDAY TIMES BESTSELLER',
'Format': 'Hardback | 304 pages',
'Dimensions': '160 x 238 x 38mm | 520g',
'Publication date': '26 Apr 2022',
'Publisher': 'Hodder & Stoughton',
'Imprint': 'Coronet Books',
'Publication City/Country': 'London, United Kingdom',
'Language': 'English',
'ISBN10': '1399703994',
'ISBN13': '9781399703994',
'Bestsellers rank': '31'}
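Two details of that output could be tightened. The description runs sentences together ("Oprah'It is ...") because get_text(strip=True) joins text nodes with an empty string; passing a separator such as " " avoids that. Also, e.label or e.span is None for any li lacking one of them, which would raise AttributeError. A minimal sketch on hypothetical markup (both the HTML and the helper name scrape_details are assumptions, not from the site); it uses a raw string for the regex, since '\s+' in a normal string literal is an invalid escape sequence that recent Python versions warn about (SyntaxWarning since 3.12). It parses a string rather than fetching the page, since the site itself may no longer be reachable (Book Depository was closed in 2023).

```python
import re

from bs4 import BeautifulSoup

# Hypothetical markup: the structure is an assumption, not copied from the site.
HTML = """
<div class="item-excerpt trunc"><p>'A breathtaking memoir...' Oprah</p><p>'It is startlingly honest...' BBC News</p></div>
<h1>Finding Me</h1>
<ul class="biblio-info">
  <li><label>Format</label>
    <span>
      Hardback |
      304 pages
    </span>
  </li>
  <li><label>Language</label> <span>English</span></li>
  <li>A row without a label</li>
</ul>
"""


def scrape_details(soup):
    """Hypothetical helper: map each product-detail label to its cleaned value."""
    details = {}
    for li in soup.select(".biblio-info li"):
        # Skip rows that lack a <label> or <span> instead of raising AttributeError
        if li.label is None or li.span is None:
            continue
        key = li.label.get_text(strip=True)
        # Collapse newlines and indentation inside the value to single spaces
        details[key] = re.sub(r"\s+", " ", li.span.get_text()).strip()
    return details


soup = BeautifulSoup(HTML, "html.parser")
book = {
    # separator " " keeps text from adjacent elements from running together
    "description": soup.find("div", class_="item-excerpt trunc").get_text(" ", strip=True),
    "title": soup.find("h1").get_text(strip=True),
}
book.update(scrape_details(soup))
print(book)
```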
Answered By - HedgeHog