Issue
I want to scrape all guest reviews. Some reviews have long text that is truncated, so you have to click a 'read more' button to see the full text. The problem is that the number of buttons depends on the reviews. I can load and get all the reviews, and that works fine, but I have no idea how to handle the 'read more' buttons. How can I click every 'read more' button in the modal?
Link URL: https://th.airbnb.com/rooms/27194960/reviews?source_impression_id=p3_1600195106_a%2FYGw9bddHf%2BMfUE
The code below is a function that gets the HTML from a URL. It has two branches. The one to focus on is the second, which fetches the reviews: see the if review: branch.
import time
from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.webdriver.common.keys import Keys

def get_pageswithSelenium(roomid, review, page_send):
    #session = requests.Session()
    #ua = UserAgent()
    #headers = {'User-Agent':ua.random}
    if not review:
        url = "https://th.airbnb.com/rooms/{}?source_impression_id=p3_1600195106_a%2FYGw9bddHf%2BMfUE".format(roomid)
    else:
        url = "https://th.airbnb.com/rooms/{}/reviews?source_impression_id=p3_1600195106_a%2FYGw9bddHf%2BMfUE".format(roomid)
    print("selenium url: " + url)
    browser = webdriver.Chrome(executable_path=r"C:\chromedriver_win32\chromedriver.exe")
    browser.get(url)
    if review:
        browser.implicitly_wait(20)
        element_inside_popup = browser.find_element_by_xpath('//div[@class="_yzu7qn"]//a')
        for j in range(page_send):
            # Scroll the modal to the bottom so that more reviews are loaded
            element_inside_popup.send_keys(Keys.END)
            time.sleep(5)
            print(str(j))
        #find all 'read more' button and click (code here)
    else:
        browser.implicitly_wait(12)
    html = browser.page_source
    bsObj_bd = BeautifulSoup(html, 'html')
    return bsObj_bd
I use Selenium with the Chrome driver. Thanks in advance to everyone who helps.
Solution
Check whether the 'read more' buttons exist by using an explicit wait together with try-except. See the Selenium documentation on waits for more info.
#find all 'read more' button and click (code here)
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.common.exceptions import TimeoutException

try:
    # Wait up to 10 seconds for the 'read more' buttons in the modal to become visible.
    # browser is the webdriver instance created in the question's function.
    buttons = WebDriverWait(browser, 10).until(
        EC.visibility_of_all_elements_located((By.XPATH, "//div[@class='_yzu7qn']//button[@class='_ejra3kg']")))
except TimeoutException:
    # No button appeared within the timeout, so there is nothing to expand
    print("no read more")
else:
    for button in buttons:
        button.click()
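If a plain click() is blocked by another element overlapping the button, or a button goes stale while the loop is running, one possible variation is to click through JavaScript and skip any button that fails. The sketch below is not part of the original answer. click_all is a hypothetical helper name, and it only assumes a driver object that offers find_elements and execute_script, as a Selenium WebDriver does. The string "xpath" is the value of By.XPATH, which keeps the snippet free of imports.

```python
def click_all(driver, xpath):
    """Click every element matching xpath via JavaScript; return the click count."""
    clicked = 0
    # find_elements returns an empty list when nothing matches, so no try-except
    # is needed for the lookup itself.
    for button in driver.find_elements("xpath", xpath):
        try:
            # A JavaScript click is not blocked by elements that overlap the button.
            driver.execute_script("arguments[0].click();", button)
            clicked += 1
        except Exception as exc:
            # Assumption: a failure here (for example a stale element) is safe to skip.
            print("skipped a button:", exc)
    return clicked
```

With the question's function, this could be called at the marked comment as click_all(browser, "//div[@class='_yzu7qn']//button[@class='_ejra3kg']"), assuming the class names from the answer still match the page.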
Answered By - Peter Quan