Issue
I'm practicing web scraping by extracting some information from https://www.kerastase.com.au/. As an example, I'm focusing on the Best Seller items (7 items). I have been able to extract the name, description and price using the following code.
import requests
from bs4 import BeautifulSoup
url='https://www.kerastase.com.au/'
response = requests.get(url)
soup = BeautifulSoup(response.content, "lxml")
prod_names = soup.find_all("h3", class_="c-product-tile__name")
prod_names = [prod.get_text() for prod in prod_names]
prices = soup.find_all("span", class_="c-product-price__value")
prices = [float(price.get_text()[2:]) for price in prices if (len(price) > 0)]
prod_descs = soup.find_all("p", class_="c-product-tile__description")
prod_descs = [desc.get_text() for desc in prod_descs]
However, extracting the rating and number of reviews seems to be more complicated, because they sit inside nested divs. I have been able to extract the caption of the first item using the following command, but the result is a mess and I don't know what to do after this step:
soup.findAll('figcaption', class_="c-product-tile__caption")[0]
Here is an example of full caption of one item I get:
<figcaption class="c-product-tile__caption"> <div class="c-product-tile__caption-inner"> <div class="c-product-tile__wishlist"> <button aria-label="Add to Wishlist Elixir Ultime Pride Edition Hair Oil" aria-pressed="" class="c-add-to-wishlist" data-analytics='{"products":[{"pid":"3474637116088","title":"Elixir Ultime Pride Edition Hair Oil","description":"","url":"https://www.kerastase.com.au/collections/elixir-ultime/elixir-ultime-pride-edition-hair-oil/3474637116088.html","imgUrl":"https://www.kerastase.com.au/on/demandware.static/-/Sites-kerastase-master-catalog/default/dw377882d1/2022/Elixir%20Ultime/Pride/1.%20Product.jpg","currency":"AUD","price":65,"name":"Elixir Ultime Pride Edition Hair Oil","subname":"Iconic nourishing hair oil for all hair types. Kérastase will be donating to Minus18, subsidising LGBTQIA+ Inclusion Workshops for schools across Australia.","id":"elixir-pride","salePrice":65,"brand":"Kérastase","category":"others/collections/elixir ultime","productTopCategory":"products","variant":"100 ml","size":"100 ml","color":"","fragrance":"","stock":"in stock","autoReplenishmentInterval":"not present","upc":"3474637116088","regularPrice":null,"isProductSet":false,"isProductGroup":false,"isBundle":false,"bundleID":"","rating":5,"numberReviews":2,"vtoState":"not present","collection":["Elixir Ultime"],"customizations":{"engraving":"not present"},"badges":"none","remainingStock":null}],"label":"elixir ultime pride edition hair oil::3474637116088","category":"{{dataLayer.page.category}}"}' data-component="product/AddToWishlist" data-component-options='{"pid":"3474637116088","url":{"add":"https://www.kerastase.com.au/on/demandware.store/Sites-kerastase-au-ng-Site/en_AU/Wishlist-AddToWishList","remove":"https://www.kerastase.com.au/on/demandware.store/Sites-kerastase-au-ng-Site/en_AU/Wishlist-RemoveFromWishList"},"text":{"title":{"add":"Add to Wishlist","remove":"Remove from Wishlist"},"accessibility":{"addAriaLabel":"Add to Wishlist Elixir Ultime Pride 
Edition Hair Oil","removeAriaLabel":"Remove from Wishlist Elixir Ultime Pride Edition Hair Oil"}},"isLabel":false}' title="Add to Wishlist"> <span class="h-show-for-sr" data-js-wishlist-text="">Wishlist</span> </button> </div> <h3 class="c-product-tile__name"><a data-js-product-name="" data-lora-datalayer='{"products":{"3474637116088":{"name":"Elixir Ultime Pride Edition Hair Oil"}}}' href="/collections/elixir-ultime/elixir-ultime-pride-edition-hair-oil/elixir-pride.html"> Elixir Ultime Pride Edition Hair Oil </a></h3><p class="c-product-tile__description"> Iconic nourishing hair oil for all hair types. Kérastase will be donating to Minus18, subsidising LGBTQIA+ Inclusion Workshops for schools across Australia. </p> <div class="c-product-tile__info m-multiple-items"> <div class="c-product-tile__info-item c-product-tile__rating"> <div data-bv-productid="elixir-pride" data-bv-redirect-url="/collections/elixir-ultime/elixir-ultime-pride-edition-hair-oil/elixir-pride.html" data-bv-seo="false" data-bv-show="inline_rating" data-component="product/BazaarvoiceService"> </div> </div> <div class="c-product-tile__info-item c-product-tile__price"> <div class="c-product-price" data-component="product/ProductPrice" data-component-options='{"pid":"3474637116088","reloadData":{"configid":null},"dataModelId":"productprice"}'> <span class="c-product-price__label h-hidden" data-js-pricelabel="">Old price</span> <span class="c-product-price__value m-old h-hidden" data-js-standardprice=""></span> <span class="c-product-price__label h-hidden" data-js-pricelabel="">New price</span> <span class="c-product-price__value" data-js-saleprice="">A$65.00</span> </div> </div> </div> <div class="c-product-tile__variations-group"> <div class="c-product-tile__swatch-group"> </div> <div class="c-product-tile__variations"> <div class="c-product-tile__variations-label">One size available</div> <div class="c-product-tile__variations-single-text"> <span data-js-pid="">100 ml</span> </div> </div> </div> 
</div> <div class="c-product-tile__actions m-add-bag-enabled" data-js-producttile-actions=""> <div data-component="global/ComponentPlaceholder" data-component-options='{"_lazyload":true,"reloadData":{"id":"productmainaction","section":"product","configid":"producttile","reloadUrl":"https://www.kerastase.com.au/on/demandware.store/Sites-kerastase-au-ng-Site/en_AU/CDSLazyload-product_productmainaction?configid=producttile&data=3474637116088&id=productmainaction&pageId=homepage&section=product"}}'> <button class="c-button m-expand-for-medium-down c-product-add-bag__button m-loading"> <span>Loading ...</span> </button> </div> </div> </figcaption>
How can I get each product's rating and number of reviews from this? Example: "rating":5,"numberReviews":2
(It is probably possible to get all the product info from the above, but I don't know what the best method is.)
Solution
The product details are stored in the data-analytics attribute of the button tag inside each product tile. That attribute contains JSON-formatted data, so we can parse it and pick out the relevant information:
import json

# Each product tile contains a wishlist <button> whose data-analytics attribute holds JSON.
main_tag = soup.find_all("div", class_="c-product-tile__figure")
dict1 = {}
for i, tile in enumerate(main_tag):
    button = tile.find("button", attrs={"data-analytics": True})
    if button is None:
        continue  # skip tiles that carry no analytics data
    details = json.loads(button["data-analytics"])
    product = details["products"][0]
    dict1[i] = {
        "name": product["title"],
        "price": product["price"],
        "rating": product["rating"],
        "reviews": product["numberReviews"],
    }
Output:
{0: {'name': 'Elixir Ultime Pride Edition Hair Oil',
'price': 65,
'rating': 5,
'reviews': 2},
1: {'name': 'Nutritive 8HR Magic Night Hair Serum',
'price': 67,
'rating': 4.5701,
'reviews': 749},
....
}
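If some tiles are missing a rating or review count, indexing the keys directly would raise a KeyError. The sketch below is one possible way to guard against that. It uses only the standard json module, and the helper name parse_analytics is hypothetical (it is not part of the original answer). The sample string is a trimmed copy of the data-analytics value from the caption in the question; in practice the string would come from button["data-analytics"].

```python
import json


def parse_analytics(raw):
    """Return name, price, rating and review count from a data-analytics JSON string.

    Hypothetical helper: returns None when the payload lists no products,
    and None for any individual field that is absent.
    """
    details = json.loads(raw)
    products = details.get("products") or []
    if not products:
        return None
    product = products[0]
    return {
        "name": product.get("title"),
        "price": product.get("price"),
        "rating": product.get("rating"),
        "reviews": product.get("numberReviews"),
    }


# Trimmed copy of the attribute value from the caption shown in the question.
sample = (
    '{"products":[{"pid":"3474637116088",'
    '"title":"Elixir Ultime Pride Edition Hair Oil",'
    '"price":65,"rating":5,"numberReviews":2}],'
    '"label":"elixir ultime pride edition hair oil::3474637116088"}'
)

print(parse_analytics(sample))
```

Using dict.get means a product without reviews shows up with None values instead of stopping the whole loop, which is usually easier to handle when collecting many tiles.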
Answered By - Bhavya Parikh