Issue
I am having trouble extracting numerical values from span tags that share the same class.
This is what the HTML looks like:
<ul class="sds-definition-list review-breakdown--list">
<li>
<span class="sds-definition-list__display-name">Comfort</span>
<span class="sds-definition-list__value">5.0</span>
</li>
<li>
<span class="sds-definition-list__display-name">Interior design</span>
<span class="sds-definition-list__value">4.0</span>
</li>
<li>
<span class="sds-definition-list__display-name">Performance</span>
<span class="sds-definition-list__value">5.0</span>
</li>
<li>
<span class="sds-definition-list__display-name">Value for the money</span>
<span class="sds-definition-list__value">5.0</span>
</li>
<li>
<span class="sds-definition-list__display-name">Exterior styling</span>
<span class="sds-definition-list__value">5.0</span>
</li>
<li>
<span class="sds-definition-list__display-name">Reliability</span>
<span class="sds-definition-list__value">5.0</span>
</li>
</ul>
I basically want to take all the numerical values and put them in separate columns. Here is the code I am using:
import requests
from bs4 import BeautifulSoup
from fake_useragent import UserAgent

ua = UserAgent()
header = {'User-Agent': str(ua.safari)}
url = 'https://www.cars.com/research/nissan-leaf-2011/consumer-reviews/?page=1'
response = requests.get(url, headers=header)
print(response)
html_soup = BeautifulSoup(response.text, 'lxml')
content_list = html_soup.find_all('div', attrs={'class': 'consumer-review-container'})
data = []
# Build one dict per review container
for e in content_list:
    data.append({
        'review_title': e.h3.text,
        'review_content': e.select_one('p.review-body').text,
        'overall_rating': e.select_one('span.sds-rating__count').text,
        'reviewer_name': e.select_one("div.review-byline div:nth-of-type(2)").text,
        'review_date': e.find("div", {"class": "review-byline"}).div.text,
    })
To each entry in the list data I would like to add the ratings for Comfort, Interior design, Performance, Value for the money, Exterior styling, and Reliability. These values should come from the HTML shown above.
Solution
To get the result, you could iterate over the <li> elements and extract the contents of each with .stripped_strings, passing the resulting name/value pairs to dict() in a generator expression. Then update your existing dict with the result and append it to data. Each item then becomes a separate column when you create a DataFrame from data:
for e in content_list:
    d = {
        'review_title': e.h3.text,
        'review_content': e.select_one('p.review-body').text,
        'overall_rating': e.select_one('span.sds-rating__count').text,
        'reviewer_name': e.select_one("div.review-byline div:nth-of-type(2)").text,
        'review_date': e.find("div", {"class": "review-byline"}).div.text,
    }
    d.update(dict(s.stripped_strings for s in e.select('ul.sds-definition-list li')))
    data.append(d)
data
Output:
[{'review_title': 'Great Electric Car!',
'review_content': 'This is the perfect electric car for driving around town, doing errands or even for a short daily commuter. It is very comfy and very quick. The only issue was the first gen battery. The 2011-2014 battery degraded quickly and if the owner did not have Nissan replace it, all those cars are now junk and can only go 20 miles or so on a charge. We had Nissan replace our battery with the 2nd gen battery and it is good as new!',
'overall_rating': '4.7',
'reviewer_name': 'By EVs are the future from Tucson, AZ',
'review_date': 'February 24, 2020',
'Comfort': '5.0',
'Interior design': '5.0',
'Performance': '5.0',
'Value for the money': '5.0',
'Exterior styling': '3.0',
'Reliability': '5.0'},...]
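The one-liner above works because dict() accepts an iterable of two-item iterables, and each <li> here yields exactly two strings (the name and the value). If an <li> ever contained more or fewer strings, dict() would raise a ValueError. Below is a minimal offline sketch of the same idea, run against a shortened copy of the HTML from the question instead of the live page. The helper name parse_breakdown, the length check, and the conversion to float are my own assumptions, not part of the original answer, and html.parser is used only to avoid needing lxml.

```python
from bs4 import BeautifulSoup

# Sample markup copied from the question (shortened to three items).
SAMPLE_HTML = """
<ul class="sds-definition-list review-breakdown--list">
  <li>
    <span class="sds-definition-list__display-name">Comfort</span>
    <span class="sds-definition-list__value">5.0</span>
  </li>
  <li>
    <span class="sds-definition-list__display-name">Interior design</span>
    <span class="sds-definition-list__value">4.0</span>
  </li>
  <li>
    <span class="sds-definition-list__display-name">Performance</span>
    <span class="sds-definition-list__value">5.0</span>
  </li>
</ul>
"""


def parse_breakdown(html):
    """Hypothetical helper: map each category name to its rating as a float."""
    soup = BeautifulSoup(html, "html.parser")
    ratings = {}
    for li in soup.select("ul.sds-definition-list li"):
        # stripped_strings yields the non-empty text fragments of the <li>,
        # here the category name followed by its value.
        parts = list(li.stripped_strings)
        if len(parts) == 2:  # skip items that do not match the expected shape
            name, value = parts
            ratings[name] = float(value)
    return ratings


breakdown = parse_breakdown(SAMPLE_HTML)
print(breakdown)
```

Storing the ratings as floats rather than strings is optional, but it makes later numeric work (averages, sorting) easier once the dicts are loaded into a DataFrame.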
Answered By - HedgeHog