Issue
I want to extract all the review details (reviewer name, date, review text, etc.) for the Blueair product on this page: https://www.costco.com/blueair-healthprotect-7410i-hepasilent-ultra-air-purifier-with-germshield.product.100750915.html. It looks like the reviews are hidden and rendered with JavaScript.
import requests
from bs4 import BeautifulSoup

headers = {'User-Agent': 'Mozilla/5.0 (X11; Linux x86_64) '
                         'AppleWebKit/537.36 (KHTML, like Gecko) '
                         'Chrome/75.0.3770.80 Safari/537.36'}
URL = 'https://www.costco.com/blueair-healthprotect-7410i-hepasilent-ultra-air-purifier-with-germshield.product.100750915.html'
response = requests.get(URL, headers=headers)
# print(response.text)
soup = BeautifulSoup(response.content, 'html.parser')
# Look for the review date stamps in the static HTML
for data in soup.find_all('span', class_='bv-content-datetime-stamp'):
    print(data)
Solution
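The reviews are not part of the page's static HTML. They are loaded with JavaScript from the Bazaarvoice API (api.bazaarvoice.com). You can call that API directly and use the limit parameter to control how many reviews come back per request: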
import json
import requests

limit = 100  # reviews per request (limit.q0)
offset = 0   # reviews to skip (offset.q0); 0 starts at the first review
r = requests.get(f'https://api.bazaarvoice.com/data/batch.json?passkey=bai25xto36hkl5erybga10t99&apiversion=5.5&displaycode=2070_2_0-en_us&resource.q0=reviews&filter.q0=isratingsonly%3Aeq%3Afalse&filter.q0=productid%3Aeq%3A100750915&filter.q0=contentlocale%3Aeq%3Aen_CA%2Cen_US&sort.q0=relevancy%3Aa1&stats.q0=reviews&filteredstats.q0=reviews&include.q0=authors%2Cproducts%2Ccomments&filter_reviews.q0=contentlocale%3Aeq%3Aen_CA%2Cen_US&filter_reviewcomments.q0=contentlocale%3Aeq%3Aen_CA%2Cen_US&filter_comments.q0=contentlocale%3Aeq%3Aen_CA%2Cen_US&limit.q0={limit}&offset.q0={offset}&limit_comments.q0=3&callback=bv_351_53884')
r.raise_for_status()
# The body is JSONP, bv_351_53884({...}); parse only the JSON between the parentheses
body = r.text
reviews = json.loads(body[body.index('(') + 1:body.rindex(')')])['BatchedResults']['q0']['Results']
print(reviews[0]['ReviewText'])
Sample output:
This was a great purchase, beautifully packed, easy set-up, great app, sleek design, and very quiet. The color bar shows the air quality being processed by the unit.
While I signed up for the auto-ship subscription service based on the Bluair statement that the app will analyze the filter condition and send a new filter right when it's needed. However, after speaking with two Bluair employees, it seems that rather sending a new filter when needed, Bluair just sends a new filter every six months regardless of filter condition and use -- certainly not high tech!
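A single request returns at most limit.q0 reviews. As far as I know, Bazaarvoice caps this at 100. Getting every review therefore means paging with offset.q0. The sketch below makes a few assumptions:

- `unwrap_jsonp`, `iter_reviews` and `summarize` are hypothetical helpers.
- `fetch_page(limit, offset)` stands in for whatever performs the request above with those two values substituted.
- The response fields `TotalResults`, `UserNickname`, `SubmissionTime`, `Rating` and `Title` are assumed from the Bazaarvoice response format. Only `ReviewText` is confirmed by the code above.

```python
import json
from typing import Callable, Dict, Iterator, Optional


def unwrap_jsonp(text: str) -> dict:
    """Parse a JSONP body like 'callback({...})', or plain JSON, into a dict."""
    text = text.strip()
    if text.startswith("{"):
        return json.loads(text)
    # Keep only what sits between the first "(" and the last ")"
    return json.loads(text[text.index("(") + 1:text.rindex(")")])


def iter_reviews(fetch_page: Callable[[int, int], dict],
                 page_size: int = 100) -> Iterator[dict]:
    """Yield every review, advancing offset.q0 one page at a time.

    fetch_page(limit, offset) is a hypothetical callback that sends the
    batch.json request with limit.q0=limit and offset.q0=offset and
    returns the parsed response.
    """
    offset = 0
    while True:
        q0 = fetch_page(page_size, offset)["BatchedResults"]["q0"]
        results = q0["Results"]
        yield from results
        offset += len(results)
        total: Optional[int] = q0.get("TotalResults")  # assumed field
        # Stop on an empty page or once TotalResults has been reached
        if not results or (total is not None and offset >= total):
            break
        # Without a total, a short page is taken to be the last one
        if total is None and len(results) < page_size:
            break


def summarize(review: dict) -> Dict[str, object]:
    """Pick the fields the question asks for (field names are assumed)."""
    return {
        "name": review.get("UserNickname"),
        "date": review.get("SubmissionTime"),
        "rating": review.get("Rating"),
        "title": review.get("Title"),
        "text": review.get("ReviewText"),
    }


if __name__ == "__main__":
    # Offline demo: a fake fetcher serving 5 made-up reviews, 2 per page
    fake = [{"UserNickname": f"user{i}", "ReviewText": f"review {i}"} for i in range(5)]

    def fake_fetch(limit: int, offset: int) -> dict:
        return {"BatchedResults": {"q0": {"TotalResults": len(fake),
                                          "Results": fake[offset:offset + limit]}}}

    for row in map(summarize, iter_reviews(fake_fetch, page_size=2)):
        print(row["name"], row["text"])
```

For real data, `fetch_page` would wrap the `requests.get` call from the solution, with `{limit}` and `{offset}` filled in and `unwrap_jsonp(r.text)` as its return value. Keeping the HTTP call behind a callback makes the paging logic testable offline.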
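Decoded, these are the query parameters you can tune. The values below come from a different request, so limit.q0, offset.q0 and callback differ from the code above. limit.q0 sets how many reviews are returned and offset.q0 sets how many to skip. filter.q0 appears once per filter.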
passkey: bai25xto36hkl5erybga10t99
apiversion: 5.5
displaycode: 2070_2_0-en_us
resource.q0: reviews
filter.q0: isratingsonly:eq:false
filter.q0: productid:eq:100750915
filter.q0: contentlocale:eq:en_CA,en_US
sort.q0: relevancy:a1
stats.q0: reviews
filteredstats.q0: reviews
include.q0: authors,products,comments
filter_reviews.q0: contentlocale:eq:en_CA,en_US
filter_reviewcomments.q0: contentlocale:eq:en_CA,en_US
filter_comments.q0: contentlocale:eq:en_CA,en_US
limit.q0: 30
offset.q0: 38
limit_comments.q0: 3
callback: bv_351_54703
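Because filter.q0 repeats, a dict cannot hold these parameters, but a list of (name, value) pairs can. Both `urllib.parse.urlencode` and the `params=` argument of `requests.get` accept such a list. The sketch below rebuilds the query with limit.q0 and offset.q0 as arguments, under a few assumptions:

- `build_params` is a hypothetical helper.
- The passkey and displaycode are copied from the list above and may stop working if the site changes them.
- Leaving out callback assumes the API then returns plain JSON rather than JSONP.

```python
from urllib.parse import urlencode

BASE_URL = "https://api.bazaarvoice.com/data/batch.json"


def build_params(product_id: str, limit: int = 100, offset: int = 0) -> list:
    """Return the query as (name, value) pairs so repeated keys survive."""
    locales = "contentlocale:eq:en_CA,en_US"
    return [
        ("passkey", "bai25xto36hkl5erybga10t99"),
        ("apiversion", "5.5"),
        ("displaycode", "2070_2_0-en_us"),
        ("resource.q0", "reviews"),
        ("filter.q0", "isratingsonly:eq:false"),
        ("filter.q0", f"productid:eq:{product_id}"),
        ("filter.q0", locales),
        ("sort.q0", "relevancy:a1"),
        ("stats.q0", "reviews"),
        ("filteredstats.q0", "reviews"),
        ("include.q0", "authors,products,comments"),
        ("filter_reviews.q0", locales),
        ("filter_reviewcomments.q0", locales),
        ("filter_comments.q0", locales),
        ("limit.q0", str(limit)),
        ("offset.q0", str(offset)),
        ("limit_comments.q0", "3"),
        # callback is omitted on purpose (assumption: plain JSON comes back)
    ]


# urlencode percent-encodes ':' and ',' just like the URL in the solution
print(BASE_URL + "?" + urlencode(build_params("100750915", limit=100, offset=0)))
```

With requests, the same list can be passed directly, e.g. `requests.get(BASE_URL, params=build_params("100750915", offset=100))`. If the response still turns out to be JSONP, strip the callback wrapper before parsing.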
Answered By - Epsi95