Tuesday, June 14, 2022

[FIXED] No output while scraping Google search page

Labels: beautifulsoup, python

Issue

I am trying to scrape from Google search results the blue highlighted portion as shown below:

When I use inspect element, it shows: span class="YhemCb". I have tried various soup.find and soup.find_all calls, but nothing I have tried has produced any output so far. What command should I use to scrape this part?

Solution

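Google uses JavaScript to render most of its page elements, so the HTML that requests downloads often does not contain the element you see in the browser, and BeautifulSoup has nothing to find.

Instead, use Selenium! It essentially allows you to control a real browser from code, so the page's JavaScript runs before you read the result.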

First, you will need to navigate to the Google page you wish to scrape:

google_search = 'https://www.google.com/search?q=courtyard+by+marriott+fayetteville+fort+bragg'
driver.get(google_search)
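If you would rather start from a plain search phrase than a hand-typed URL, the query string can be built with the standard library. This is only a minimal sketch; the helper name build_google_search_url is made up for illustration and is not part of Selenium.

```python
from urllib.parse import urlencode


def build_google_search_url(query: str) -> str:
    """Return a Google search URL for a plain-text query (hypothetical helper)."""
    # urlencode escapes special characters and turns spaces into '+'
    return 'https://www.google.com/search?' + urlencode({'q': query})


google_search = build_google_search_url('courtyard by marriott fayetteville fort bragg')
```

The resulting google_search string can be passed to driver.get exactly like the hard-coded URL above.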

Then, you have to wait until the review info has loaded in the browser.

This is done using WebDriverWait: you specify an element that needs to appear on the page, and the wait returns that element once it is present. The [data-attrid="kc:/local:one line summary"] span CSS selector selects the review info about the hotel.

timeout = 10
expectation = EC.presence_of_element_located((By.CSS_SELECTOR, '[data-attrid="kc:/local:one line summary"] span'))
review_element = WebDriverWait(driver, timeout).until(expectation)
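To make it clearer what the wait is doing, here is a rough standard-library sketch of the same polling idea. It is not Selenium's actual code: wait_until and element_appears are hypothetical names, and the real WebDriverWait raises Selenium's own TimeoutException and, by default, ignores NoSuchElementException while it polls.

```python
import time


def wait_until(condition, timeout, poll_interval=0.5):
    """Call condition() repeatedly until it returns a truthy value or time runs out."""
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError(f'condition not met within {timeout} seconds')
        time.sleep(poll_interval)


# Stand-in for an element that only "appears" on the third check
attempts = []


def element_appears():
    attempts.append(1)
    return 'element' if len(attempts) >= 3 else None


found = wait_until(element_appears, timeout=2, poll_interval=0.01)
```

With Selenium itself, the practical consequence is the same: if the selector never matches within timeout seconds, the until call raises instead of returning an element, so it is worth wrapping in try/except when scraping many pages.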

Finally, print the rating:

print(review_element.get_attribute('innerHTML'))
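Note that innerHTML is raw markup, so if the element contains nested tags they will be printed too. Selenium's review_element.text property is usually the simplest way to get just the visible text. If you already have the innerHTML string, a standard-library sketch like the one below can reduce it to plain text. The helper names and the sample markup are made up for illustration; this is not what Google actually returns.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect only the text nodes of an HTML fragment."""

    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)


def inner_html_to_text(inner_html: str) -> str:
    parser = TextExtractor()
    parser.feed(inner_html)
    parser.close()  # flush any buffered text
    # join the text nodes, then collapse runs of whitespace
    return ' '.join(''.join(parser.parts).split())


# Made-up markup, only to show the behaviour
sample = '<span aria-hidden="true">4.1</span> <span>(1,234 reviews)</span>'
text = inner_html_to_text(sample)
```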

Here's the full code in case you want to play around with it:

import chromedriver_autoinstaller
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait

# setup selenium (I am using chrome here, so chrome has to be installed on your system)
chromedriver_autoinstaller.install()
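options = Options()
# run Chrome without a visible window (the options.headless attribute was removed in newer Selenium 4 releases)
options.add_argument('--headless')
driver = webdriver.Chrome(options=options)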

# navigate to google
google_search = 'https://www.google.com/search?q=courtyard+by+marriott+fayetteville+fort+bragg'
driver.get(google_search)

# wait until the page loads
timeout = 10
expectation = EC.presence_of_element_located((By.CSS_SELECTOR, '[data-attrid="kc:/local:one line summary"] span'))
review_element = WebDriverWait(driver, timeout).until(expectation)

# print the rating, then close the browser
print(review_element.get_attribute('innerHTML'))
driver.quit()

Note: Google is notoriously defensive against scraping. Your first few attempts may succeed, but eventually you will run into Google's CAPTCHA.
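One way to notice that you have been blocked is to look at the URL the browser ended up on. The sketch below assumes that Google's CAPTCHA interstitial is served from a path starting with /sorry on a google.com host; that matches commonly observed behaviour but is an assumption, not a documented contract, and the function name is hypothetical.

```python
from urllib.parse import urlparse


def looks_like_captcha_redirect(current_url: str) -> bool:
    """Heuristic: True if the URL looks like Google's block page (assumed /sorry path)."""
    parsed = urlparse(current_url)
    host = parsed.hostname or ''
    on_google = host == 'google.com' or host.endswith('.google.com')
    return on_google and parsed.path.startswith('/sorry')
```

With Selenium you could call looks_like_captcha_redirect(driver.current_url) right after driver.get and stop or slow down when it returns True, rather than waiting for the element to time out.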

To work around that, I would suggest using a search engine scraper; something like the quickstart guide can get you started!

Disclaimer: I work at Oxylabs.io

Answered By - Zyy

This answer was collected from Stack Overflow and tested by PythonFixing community admins; it is licensed under CC BY-SA 2.5, CC BY-SA 3.0 and CC BY-SA 4.0.
