Issue
I have this code to scrape review text from IMDb. I want to retrieve the full text of each review, but the result stops at the first line break. For example, given this review:
Saw an early screening tonight in Denver.
I don't know where to begin. So I will start at the weakest link. The acting. Still great, but any passable actor could have been given any of the major roles and done a great job.
the code retrieves only:
Saw an early screening tonight in Denver.
Here is my code:
import numpy as np
import pandas as pd
from parsel import Selector  # assumed import; scrapy's Selector behaves the same way
from selenium.webdriver.common.by import By
from tqdm import tqdm

# `driver` (a Selenium WebDriver) and `url` are assumed to be defined earlier.
reviews = driver.find_elements(By.CSS_SELECTOR, 'div.review-container')
first_review = reviews[0]
sel2 = Selector(text=first_review.get_attribute('innerHTML'))

rating_list = []
review_date_list = []
review_title_list = []
author_list = []
review_list = []
error_url_list = []
error_msg_list = []

for d in tqdm(reviews):
    try:
        sel2 = Selector(text=d.get_attribute('innerHTML'))
        try:
            rating = sel2.css('.rating-other-user-rating span::text').extract_first()
        except:
            rating = np.nan
        try:
            # .get() returns only the first matching text node
            review = sel2.css('.text.show-more__control::text').get()
        except:
            review = np.nan
        try:
            review_date = sel2.css('.review-date::text').extract_first()
        except:
            review_date = np.nan
        try:
            author = sel2.css('.display-name-link a::text').extract_first()
        except:
            author = np.nan
        try:
            review_title = sel2.css('a.title::text').extract_first()
        except:
            review_title = np.nan
        rating_list.append(rating)
        review_date_list.append(review_date)
        review_title_list.append(review_title)
        author_list.append(author)
        review_list.append(review)
    except Exception as e:
        error_url_list.append(url)
        error_msg_list.append(e)

review_df = pd.DataFrame({
    'review_date': review_date_list,
    'author': author_list,
    'rating': rating_list,
    'review_title': review_title_list,
    'review': review_list
})
Solution
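Use .extract() (or its newer alias .getall()) instead of .get(). The ::text selector matches every text node inside the element, and a review with line breaks (typically <br> tags) is split into several text nodes. .get() returns only the first of them, whereas .extract() returns all of them as a list. You can then use .join() to concatenate the pieces into a single string: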
review = sel2.css('.text.show-more__control::text').extract()
review = ' '.join(review)
Output:
'For a teenager today, Dunkirk must seem even more distant than the Boer War did to my generation growing up just after WW2. For some, Christopher Nolan's film may be the most they will know about the event. But it's enough in some ways because even if it doesn't show everything that happened, maybe it goes as close as a film could to letting you know how it felt. "Dunkirk" focuses on a number of characters who are inside the event, living it ....'
Answered By - JayPeerachai