Issue
I don't have much experience with web scraping, just a few weeks of trying to get my code to work. I'm trying to get the text behind an onclick attribute on a TripAdvisor restaurant page, and it has been difficult.
This is the HTML code from the page
and this is my code:
with requests.Session() as s:
    for offset in range (1,2):
        url = f'https://www.tripadvisor.fr/Restaurant_Review-g187147-d17452512-Reviews-or{offset}-Madame_Pop_s-Paris_Ile_de_France.html'
        r = s.get(url)
        soup = bs(r.content, 'lxml')
        if not offset:
            inf_rest_name = soup.select_one('.heading').text.replace("\n","").strip()
            rest_eclf = soup.select_one('.header_links a').text.strip()
        for review in soup.select('.reviewSelector'):
            name_client = review.select_one('.info_text > div:first-child').text.strip()
            date_rev_cl = review.select_one('.ratingDate')['title'].strip()
            titre_rev_cl = review.select_one('.noQuotes').text.replace(",","").strip()
            opinion_cl= review.select_one('.partial_entry').text.replace("\n","").strip()
            for opplus in opinion_cl:
                secondtag = opplus.select_one('span', {'onclick':'widgetEvCall('handlers.clickExpand',event,this);'})
            row = [f"{inf_rest_name}", f"{rest_eclf}", f"{name_client}", f"{date_rev_cl}", f"{titre_rev_cl}", f"{opinion_cl}"]
            w.writerow(row)
In the last part, adding the for opplus... loop gives me an error. I also tried adding '.onclick' next to '.partial_entry' on line 13, but it doesn't work. Can you tell me what I have to change? How can I get the full text with Python? I would appreciate your suggestions.
Solution
I went to the TripAdvisor site and saw that when you click "Plus", the page sends a POST request to TripAdvisor. Basically, what you need to do is open the Network tab in your browser's developer tools and work out how the site behaves.
Since I had some spare time, I decided to help you out.
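As a side note on the error itself: in the question, opinion_cl is the result of .text...strip(), so it is a plain string, and looping over it yields single characters, which have no select_one method. The small sketch below reproduces that with a made-up review text (the sample string is an assumption, not taken from the page).

```python
# opinion_cl stands in for what .text.replace(...).strip() returns: a plain str.
# The review text itself is a made-up sample.
opinion_cl = "Very good food"

error = None
try:
    for opplus in opinion_cl:
        # opplus is a one-character str here, not a BeautifulSoup tag
        opplus.select_one("span")
except AttributeError as exc:
    error = str(exc)

print(error)  # 'str' object has no attribute 'select_one'
```

So even with a valid selector, the loop would have to run on the tag returned by select_one('.partial_entry'), not on its text.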
import requests
from bs4 import BeautifulSoup as bs

with requests.Session() as s:
    for offset in range(1, 2):
        url = f'https://www.tripadvisor.fr/Restaurant_Review-g187147-d17452512-Reviews-or{offset}-Madame_Pop_s-Paris_Ile_de_France.html'
        r = s.get(url)
        soup = bs(r.content, 'lxml')
        # The trick: an AJAX call sends a POST request to
        # https://www.tripadvisor.fr/OverlayWidgetAjax?Mode=EXPANDED_HOTEL_REVIEWS_RESP&metaReferer=
        # The data it sends contains the review ids, and the headers must include a Referer
        # First get the list of ids
        reviews = soup.select('.reviewSelector')
        ids = [review.get('data-reviewid') for review in reviews]
        # Now send the request
        req = s.post(
            'https://www.tripadvisor.fr/OverlayWidgetAjax?Mode=EXPANDED_HOTEL_REVIEWS_RESP&metaReferer=',
            data={'reviews': ','.join(ids), 'contextChoice': 'DETAIL'},
            headers={'Referer': r.url}  # the Referer is the review page fetched above
        )
        # And now you can follow the logic that you had
        soup = bs(req.content, 'lxml')
        if not offset:
            ....
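If you want to check the id extraction and the POST body without hitting the site, here is a minimal offline sketch. It uses only the standard library's html.parser instead of BeautifulSoup, so it runs without extra packages. The sample HTML, the ReviewIdCollector class and the build_expand_payload helper are made up for illustration; only the reviewSelector class, the data-reviewid attribute and the payload fields come from the code above.

```python
from html.parser import HTMLParser


class ReviewIdCollector(HTMLParser):
    """Collect data-reviewid values from tags whose class list includes 'reviewSelector'."""

    def __init__(self):
        super().__init__()
        self.ids = []

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        classes = (attr.get("class") or "").split()
        review_id = attr.get("data-reviewid")
        if "reviewSelector" in classes and review_id:
            self.ids.append(review_id)


def build_expand_payload(ids):
    """Build the form data for the expand request, mirroring the POST body used above."""
    return {"reviews": ",".join(ids), "contextChoice": "DETAIL"}


# Hypothetical sample markup, only loosely modelled on the review page
sample = """
<div class="reviewSelector" data-reviewid="111"><p class="partial_entry">Short...</p></div>
<div class="reviewSelector" data-reviewid="222"><p class="partial_entry">Also short...</p></div>
<div class="other" data-reviewid="333"></div>
"""

collector = ReviewIdCollector()
collector.feed(sample)
payload = build_expand_payload(collector.ids)
print(payload)  # {'reviews': '111,222', 'contextChoice': 'DETAIL'}
```

The payload printed here is what would go into the data argument of s.post; the real page markup may differ from this sample, so treat it as a way to test your logic rather than a description of the site.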
Answered By - puchal