Issue
I am new to Python and to coding in general, and I am trying to create a script to pull customer reviews from Trustpilot. I think I have something that works, and I tested it in Google Bard. I can get Bard to return results, but when I run the same script on my Mac in PyCharm CE, it creates a .csv file with the right headers but no data.
I am sure I am missing something obvious. Why can Google Bard run the script and return results, while on my machine I get just the headers in the CSV file?
Any help would be much appreciated. I am getting no errors when I run it locally. I have Python 3.12 installed and all the required modules.
Thanks.....Justin
from selenium import webdriver
from selenium.webdriver.common.by import By
from bs4 import BeautifulSoup
import csv
import datetime
# Create a new Selenium webdriver instance
driver = webdriver.Chrome()
# Navigate to the given page
driver.get("https://uk.trustpilot.com/review/www.whsmith.co.uk")
# Wait for the page to load
driver.implicitly_wait(10)
# Get the HTML source code of the page
html = driver.page_source
# Create a BeautifulSoup object from the HTML source code
soup = BeautifulSoup(html, "html.parser")
# Extract all of the reviews from the page
reviews = soup.findAll("div", class_="review")
# Create a new CSV file to store the reviews
with open("whsmith_reviews.csv", "w", newline="") as f:
    writer = csv.writer(f)
    # Write the header row
    writer.writerow(["Review Title", "Review Text", "Rating", "Review Date"])
    # Iterate over the reviews and write them to the CSV file
    for review in reviews:
        title = review.find("h2", class_="review-title").text
        text = review.find("p", class_="review-text").text
        rating = review.find("span", class_="review-rating").text
        date_str = review.find("span", class_="review-date").text
        date = datetime.datetime.strptime(date_str, "%d %b %Y")
        # Add the review to the CSV file
        writer.writerow([title, text, rating, date])
# Close the Selenium webdriver instance
driver.quit()
Solution
The main issue is your selection of the reviews: there is no div with class review on the page, so findAll() returns an empty list, the loop never runs, and only the header row is written. You may focus on the article elements instead:
soup.select('article')
In newer code, avoid the old findAll() syntax; use find_all() or select() with CSS selectors instead. For more, take a minute to check the docs.
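To see why the script finishes without errors yet leaves only headers in the file, here is a minimal sketch that uses only the standard library. The empty list stands in for a selector that matches nothing; write_reviews and no_matches are hypothetical names used for illustration, not part of the original script.

```python
import csv
import io

HEADER = ["Review Title", "Review Text", "Rating", "Review Date"]


def write_reviews(rows, out):
    """Write the header plus one CSV row per review; return the number of rows written."""
    writer = csv.writer(out)
    writer.writerow(HEADER)
    count = 0
    for row in rows:  # the body is never entered when rows is empty
        writer.writerow(row)
        count += 1
    return count


# Stand-in for the result of a selector that matches no elements
no_matches = []

buffer = io.StringIO()
written = write_reviews(no_matches, buffer)

print(written)
print(buffer.getvalue())

# A simple guard makes the silent failure visible
if written == 0:
    print("Selector matched nothing; check the page markup.")
```

Checking the length of the result list right after the selection step is a cheap way to catch a selector that no longer matches the page.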
There is also no need for Selenium in this case; just take a look:
from bs4 import BeautifulSoup
import requests, csv

data = []
from_page = 1
to_page = 5

for i in range(from_page, to_page + 1):
    # Assumption: the site paginates via the ?page= query parameter
    response = requests.get(f"https://uk.trustpilot.com/review/www.whsmith.co.uk?page={i}")
    web_page = response.text
    soup = BeautifulSoup(web_page, "html.parser")
    # Each review sits in its own <article> element
    for e in soup.select('article'):
        data.append({
            'review_title': e.h2.text,
            'review_date_original': e.select_one('[data-service-review-date-of-experience-typography]').text.split(': ')[-1],
            'review_rating': e.select_one('[data-service-review-rating] img').get('alt'),
            'review_text': e.select_one('[data-service-review-text-typography]').text if e.select_one('[data-service-review-text-typography]') else None,
            'page_number': i
        })

# Write all collected reviews to a CSV file
with open('zzz_my_result.csv', 'w', newline='') as output_file:
    dict_writer = csv.DictWriter(output_file, data[0].keys())
    dict_writer.writeheader()
    dict_writer.writerows(data)
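The review_date_original value is kept as text. If you want a real date, as the question's script attempted with strptime, something like the following sketch may help. It assumes the label reads like "Date of experience: 12 January 2024" (an assumption about the page's wording, so adjust the format string if it differs) and that month names are in English; parse_experience_date is a hypothetical helper name.

```python
from datetime import date, datetime


def parse_experience_date(label):
    """Return a date from text like 'Date of experience: 12 January 2024', or None if it does not parse."""
    # Keep only the part after the last ': ', as the scraping code does
    raw = label.split(": ")[-1].strip()
    try:
        return datetime.strptime(raw, "%d %B %Y").date()
    except ValueError:
        # Unexpected wording or format: leave the decision to the caller
        return None


parsed = parse_experience_date("Date of experience: 12 January 2024")
print(parsed.isoformat())
```

Returning None instead of raising keeps one oddly formatted review from stopping the whole run.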
Answered By - HedgeHog