Issue
I am trying to run this code to scrape reviews from the Google Play Store, but I keep getting the following error:
DevTools listening on ws://127.0.0.1:53044/devtools/browser/9de3e58b-6384-4809-bf01-31d47a57879f
Traceback (most recent call last):
  File "c:/Users/Emil/Documents/Guatrain_Reviews/guatrain_reviews.py", line 20, in <module>
    Ptitle = driver.find_element_by_class_name('id-app-title').text.replace(' ','')
  File "C:\Users\Emil\Miniconda3\envs\data_analysis\lib\site-packages\selenium\webdriver\remote\webdriver.py", line 564, in find_element_by_class_name
    return self.find_element(by=By.CLASS_NAME, value=name)
  File "C:\Users\Emil\Miniconda3\envs\data_analysis\lib\site-packages\selenium\webdriver\remote\webdriver.py", line 978, in find_element
    'value': value})['value']
  File "C:\Users\Emil\Miniconda3\envs\data_analysis\lib\site-packages\selenium\webdriver\remote\webdriver.py", line 321, in execute
    self.error_handler.check_response(response)
  File "C:\Users\Emil\Miniconda3\envs\data_analysis\lib\site-packages\selenium\webdriver\remote\errorhandler.py", line 242, in check_response
    raise exception_class(message, screen, stacktrace)
selenium.common.exceptions.NoSuchElementException: Message: no such element: Unable to locate element: {"method":"class name","selector":"id-app-title"}
  (Session info: chrome=71.0.3578.98)
  (Driver info: chromedriver=2.46.628402 (536cd7adbad73a3783fdc2cab92ab2ba7ec361e1),platform=Windows NT 10.0.17134 x86_64)
I suspect it has something to do with the 'id-app-title' class name in
Ptitle = driver.find_element_by_class_name('id-app-title').text.replace(' ','')
Could someone point out where I would find that ID for the app I am interested in, or help me identify where I am going wrong?
Thanks
EDIT
The final result should contain the ratings and reviews extracted for whichever app URL I insert.
Thanks
Solution
That code is from 2016, so I'm assuming Google has changed the page structure since then, which is why there is no 'id-app-title' or anything else from the original code. That's just my assumption.
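One way to test that assumption is to ask for all matching elements instead of a single one: Selenium's find_elements returns an empty list when nothing matches, rather than raising NoSuchElementException. The sketch below is only an illustration. FakeDriver and FakeElement are made-up stand-ins so it runs without a browser, and 'new-title-class' is a hypothetical class name, not one taken from the real page.

```python
class FakeElement:
    """Stand-in for a Selenium WebElement (illustration only)."""
    def __init__(self, text):
        self.text = text


class FakeDriver:
    """Stand-in for a Selenium driver so the sketch runs without a browser."""
    def __init__(self, elements_by_class):
        self._elements = elements_by_class

    def find_elements(self, by, value):
        # Mirrors Selenium's find_elements: an empty list when nothing matches.
        return self._elements.get(value, []) if by == "class name" else []


def text_or_none(driver, class_name):
    """Return the first matching element's text, or None if the class is absent."""
    # "class name" is the string behind selenium's By.CLASS_NAME.
    matches = driver.find_elements("class name", class_name)
    return matches[0].text if matches else None


# 'new-title-class' is a hypothetical class name used only for this demo.
driver = FakeDriver({"new-title-class": [FakeElement("My O2")]})
old_title = text_or_none(driver, "id-app-title")
new_title = text_or_none(driver, "new-title-class")
print(old_title, new_title)
```

With a real driver, a None result for 'id-app-title' would suggest the class is simply no longer on the page.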
There's still a lot of work to be done on this code, such as swapping the time.sleep calls for Selenium's implicit waits and, quite frankly, making it more robust, as I was only looking at this particular app's reviews (EDIT: see below). The page is really complex HTML with tons of nested div and span tags whose attributes and classes carry no specific meaning, so I had trouble pulling out each user review element.
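As a rough idea of what replacing a fixed time.sleep with a wait looks like, here is a minimal polling sketch in plain Python. It is not Selenium's implementation (Selenium ships its own WebDriverWait for this); wait_until and button_is_ready are hypothetical names, and the "button" is simulated so the example runs without a browser.

```python
import time


def wait_until(condition, timeout=10.0, poll=0.5):
    """Call condition() until it returns a truthy value or timeout seconds pass."""
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError("condition not met within %.1f s" % timeout)
        time.sleep(poll)


# Simulated page: the button only "appears" on the third check.
attempts = {"count": 0}


def button_is_ready():
    attempts["count"] += 1
    return "Show More" if attempts["count"] >= 3 else None


found = wait_until(button_is_ready, timeout=2.0, poll=0.01)
print(found, attempts["count"])
```

The point of the design is that the script continues as soon as the condition holds, instead of always pausing for the full two seconds.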
Essentially, I open the page in the browser, keep scrolling down until "Show More" can be clicked, and repeat that up to x times.
Once that is done, the code iterates over the span tags. I worked out that every 10 span tags relate to a single user. However, if the app owner responds to a review, that offsets the following tags by 2, so I had to account for that.
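The stride-of-10 idea with a 2-span offset for owner replies can be sketched on plain strings. This is a simplified model, not the code used on the real page: the span texts are invented, the 'Rated N stars' label format is an assumption, and group_reviews, parse_rating and fake_block are hypothetical helper names.

```python
import re

SPANS_PER_REVIEW = 10  # per the description above: ten span tags per review
REPLY_SPANS = 2        # an owner reply adds two more


def parse_rating(text):
    """Return the star count from text like 'Rated 4 stars ...', or None."""
    match = re.search(r"Rated (\d) stars", text)
    return int(match.group(1)) if match else None


def group_reviews(span_texts):
    """Walk a flat list of span texts and group it into reviews."""
    reviews = []
    pos = 0
    while pos + SPANS_PER_REVIEW <= len(span_texts):
        rating = parse_rating(span_texts[pos + 1])
        if rating is None:
            # Not the start of a review: assume reply spans and skip them.
            pos += REPLY_SPANS
            continue
        reviews.append({
            "name": span_texts[pos],
            "rating": rating,
            "date": span_texts[pos + 2],
            "review": span_texts[pos + 8],
        })
        pos += SPANS_PER_REVIEW
    return reviews


def fake_block(name, stars, date, text):
    """Build ten invented span texts standing in for one review."""
    label = "Rated %d stars out of five stars" % stars
    return [name, label, date, "", "", "", "", "", text, ""]


spans = (
    fake_block("Ann", 5, "1 February 2019", "Great")
    + ["Owner reply", "Thanks!"]  # two extra spans from an owner reply
    + fake_block("Bob", 1, "2 February 2019", "Broken")
)
reviews = group_reviews(spans)
print(reviews)
```

Using a moving cursor rather than a fixed multiple of 10 keeps the offset handling in one place.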
I'm fairly new to programming, so I apologize for the messy code and inefficiency. I'm sure an expert could provide a better solution, but this can hopefully get you started or give you something to play around with:
# Load the webdriver and supporting libraries
import time

import bs4
import pandas as pd
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.common.keys import Keys

# Change this number to get more or fewer reviews
# Current setting of x=100 yielded 11,312 reviews
x = 100
link = "https://play.google.com/store/apps/details?id=uk.co.o2.android.myo2&hl=en_GB"
# Path to the local chromedriver executable; adjust for your machine
driver = webdriver.Chrome('C:/chromedriver_win32/chromedriver.exe')
driver.get(link + '&showAllReviews=true')

num_clicks = 0
num_scrolls = 0
# Click "Show More" when it can be found; otherwise scroll to the bottom to load more reviews
while num_clicks <= x and num_scrolls <= x*5:
    try:
        show_more = driver.find_element(By.XPATH, '//*[@id="fcxH9b"]/div[4]/c-wiz/div/div[2]/div/div[1]/div/div/div[1]/div[2]/div[2]/div/content/span')
        show_more.click()
        num_clicks += 1
    except Exception:
        html = driver.find_element(By.TAG_NAME, 'html')
        html.send_keys(Keys.END)
        num_scrolls += 1
        time.sleep(2)

soup = bs4.BeautifulSoup(driver.page_source, 'html.parser')
h2 = soup.find_all('h2')
rows = []  # one [date, rating, name, review] entry per review
for ele in h2:
    if ele.text == 'Reviews':
        c_wiz = ele.parent.parent.find_all('c-wiz')
        for sibling in c_wiz[0].next_siblings:
            try:
                comment_shift = 0
                spans = sibling.find_all('span')
                for user_block in range(0, len(spans)):
                    # Every 10 span tags belong to one user
                    i = user_block * 10
                    name = spans[i+0+comment_shift].text
                    try:
                        rating = spans[i+1+comment_shift].div.next_element['aria-label']
                        rating = str(''.join(filter(str.isdigit, rating)))
                    except Exception:
                        # An owner reply shifts the following spans by 2
                        comment_shift += 2
                        continue
                    date = spans[i+2+comment_shift].text
                    review = spans[i+8+comment_shift].text
                    print('Name: %s\nRating: %s\nDate: %s\nReview: %s\n' % (name, rating, date, review))
                    rows.append([date, rating, name, review])
            except Exception:
                # Running past the end of spans lands here; move on to the next sibling
                continue

# Build the DataFrame once from the collected rows
results_df = pd.DataFrame(rows, columns=['Date', 'Rating', 'User', 'Review'])
results_df.to_csv('C:/reviews.csv', index=False)
driver.close()
Output:
print (results_df)
Date ... Review
0 31 January 2019 ... Was broken for pay as you go customers. Has no...
1 2 February 2019 ... o2 just won't be happy until their customer se...
2 1 February 2019 ... Excellent quality piece of kit
3 6 February 2019 ... Gud 😁
4 23 December 2018 ... Can't get into the app using correct log in de...
5 16 December 2018 ... The update is rubbish. I can't use MyO2 anymor...
6 6 December 2018 ... Stop logging me out with every update, they ad...
7 25 December 2018 ... cant use this app anymore. shame i use to use ...
8 16 December 2018 ... Started receiving texts from 02 immediately af...
9 10 January 2019 ... havent been with the network long nor have i u...
10 22 December 2018 ... update has killed this app. why do I have to p...
11 9 January 2019 ... This app is now unusable for pay as you go cus...
12 26 January 2019 ... Wouldn't it be nice to find an app that the de...
13 19 December 2018 ... wont let me log in now since the latest update...
14 13 January 2019 ... it was ok for a while wen u needed to put in y...
15 6 January 2019 ... from last update I can't login anymore. not ev...
16 24 January 2019 ... I'm having 2 change review again coz I can't g...
17 5 January 2019 ... Changed my rating for this down from five to o...
18 22 December 2018 ... no longer works for me. shame as it was useful...
19 31 January 2019 ... total waste of time since update. not able to ...
20 23 January 2019 ... Despite what the description states the curren...
21 24 December 2018 ... When it finally lets you log in it then says t...
22 17 January 2019 ... Update breaks it, can't log in, log in on webs...
23 5 January 2019 ... 02 what have you done to app cant log in chang...
24 30 November 2018 ... Simple easy to use and all info available of m...
25 30 November 2018 ... No longer works for pay and go customers so co...
26 8 December 2018 ... Will not log me in after downloading the lates...
27 15 January 2019 ... Unable to log on to the app since the update. ...
28 1 January 2019 ... Very easy to use. Keeps me up to date.
29 1 December 2018 ... Good app maybe it should be as colourful as th...
... ... ...
11282 12 February 2017 ... Just re installed this a on my new device. Ha...
11283 18 December 2016 ... Since updating this app on my Samsung S3 mini ...
11284 19 January 2017 ... Lately the app gives intermittent server error...
11285 7 December 2016 ... New update
11286 12 December 2016 ... O2 needs to put right fast
11287 12 February 2017 ... Although unlimited minutes/texts I would still...
11288 30 December 2016 ... Never works
11289 13 August 2017 ... I have a Samsung galaxy 7 and the o2 app is no...
11290 6 December 2016 ... Doesn't work anymore
11291 4 December 2016 ... Since the last update this app does not work f...
11292 3 December 2016 ... O2
11293 5 December 2016 ... Good app (when it opens)
11294 11 January 2017 ... Stopped working and when it does work...
11295 1 December 2016 ... Nothing but a blue screen. Not happy.
11296 2 December 2016 ... Worst app ever
11297 18 January 2017 ... It's easier than trying to keep track of my ac...
11298 16 February 2017 ... The new update only shows blue screen before t...
11299 15 January 2017 ... Mr Dimitrov
11300 8 February 2017 ... Code 4 error frequently
11301 4 January 2017 ... Won't work at all
11302 27 January 2017 ... O2 GURU , EXCELLENT, ESQISET , PHANOMAL, SE...
11303 15 February 2017 ... Works well enough.
11304 1 December 2016 ... Great app keeps you up to.date
11305 28 December 2016 ... My 02
11306 16 December 2016 ... This is a "APPY APP""
11307 22 November 2016 ... Doesn't work for business account. Only shows ...
11308 25 November 2016 ... Doesn't work anymore
11309 11 November 2016 ... The ap won't open its just a blue screen I've ...
11310 24 November 2016 ... Doesn't work
11311 12 November 2016 ... My 02
[11312 rows x 4 columns]
Edit:
I tried it with a couple of different links:
link = "https://play.google.com/store/apps/details?id=com.outfit7.mytalkingtom2"
link = "https://play.google.com/store/apps/details?id=com.ingeniooz.hercule"
and it appeared to work:
Output:
print (results_df)
Date ... Review
0 February 5, 2019 ... after update it is not workin before it was ev...
1 February 4, 2019 ... no word to describe simply 😍
2 February 6, 2019 ... I loved this game
3 February 6, 2019 ... it is very funny game and very nice game also
4 February 6, 2019 ... 😎
5 February 6, 2019 ... relaxing effect
6 February 6, 2019 ... this is a cool game
7 February 6, 2019 ... Good game
8 February 6, 2019 ... Beast
9 February 1, 2019 ... Love this game, it is so much better then the ...
10 February 1, 2019 ... The recent updates are epic. The blender and d...
11 February 1, 2019 ... i like this funny game because tom is jumping ...
12 February 2, 2019 ... tom 2 is a great game
13 February 3, 2019 ... Very very nice game
14 February 3, 2019 ... I like it very much
15 February 5, 2019 ... Nice and superb game.
16 February 2, 2019 ... Tom is a cutipie
17 February 2, 2019 ... it is so...... cute
18 February 2, 2019 ... tr ty0
19 February 2, 2019 ... so good
20 February 2, 2019 ... nice game
21 February 1, 2019 ... Nice game
22 February 3, 2019 ... i love this game
23 February 6, 2019 ... l love this game as it is fun and enjoyable to...
24 February 2, 2019 ... love it
25 February 5, 2019 ... it is so awesome 👍😍😊
26 February 2, 2019 ... Amazing
27 February 3, 2019 ... nice
28 February 6, 2019 ... good
29 January 30, 2019 ... Anish Biswa 3 to be a bit. I'm not a good idea...
... ... ...
1770 February 2, 2019 ... fun
1771 February 5, 2019 ... ect,
1772 February 6, 2019 ... tom. is so cute
1773 February 6, 2019 ... nice
1774 January 5, 2019 ... urguuhtr
1775 January 14, 2019 ... Very interesting game 👌😀😀
1776 January 10, 2019 ... It s very very very nice
1777 January 21, 2019 ... supab game😘😘😘😘
1778 January 16, 2019 ... it's too funny 🐹🐹🐹🐰🐰🐰
1779 January 20, 2019 ... wow Best game
1780 January 27, 2019 ... It's damn good
1781 January 28, 2019 ... this a good and supper game. very nice game. ,...
1782 February 4, 2019 ... i love this game very very very much
1783 January 5, 2019 ... super
1784 January 12, 2019 ... It's fun Lol
1785 January 16, 2019 ... it ,s so good
1786 January 23, 2019 ... fun game for kids....loved it
1787 January 27, 2019 ... It's so nice
1788 February 1, 2019 ... Nice The Baby games i like 😀😀😀😀
1789 January 29, 2019 ... it's funny and it's fun to play
1790 January 10, 2019 ... best game... so cute
1791 January 10, 2019 ... So Cute!
1792 January 24, 2019 ... i lv this game very nice game .....
1793 January 25, 2019 ... Its superb... I love this game... 😘
1794 January 27, 2019 ... It is best game ever played😀😀😀😁😁😁
1795 January 19, 2019 ... I love it!
1796 January 20, 2019 ... good game!
1797 January 16, 2019 ... i love this game 😍.
1798 January 25, 2019 ... It is a good game for kids.....
1799 January 31, 2019 ... my talking tom is fun😊😊😊
[1800 rows x 4 columns]
And
print (results_df)
Date ... Review
0 December 2, 2018 ... It's a very well-thought-out an all rounded ap...
1 January 1, 2019 ... L'application est superbe et hyper complète! B...
2 December 6, 2017 ... Great workout diary with statistics. Easy to u...
3 June 13, 2017 ... I love this app! I've tried so many others, bu...
4 March 28, 2017 ... Works great at what it does. You can add exerc...
5 March 21, 2017 ... Great
6 December 8, 2016 ... Has all I need to build & adjust my workouts
7 October 23, 2016 ... Goodish
8 September 23, 2016 ... Great app
9 July 18, 2016 ... Excellent
10 March 9, 2016 ... great app.
11 July 10, 2015 ... Amazing and easy to use
12 June 5, 2015 ... I dreamt of this app, Hercule made it. Best ap...
13 March 18, 2015 ... Really good, but...
[14 rows x 4 columns]
Answered By - chitown88