Wednesday, December 13, 2023

[FIXED] Why is BeautifulSoup returning None when scraping google search results?

Labels: beautifulsoup, python, web-scraping

Issue

I'm trying to use BeautifulSoup to find the birth years of different authors. I'm working in VS Code, if that's relevant. This is my first attempt at web scraping, so please explain things as clearly as possible.

For authors with Wikipedia pages, I can successfully find birth years using the following code:

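import requests
from bs4 import BeautifulSoup

def get_birth_year(url):
    # Fetch the page and parse it with Python's built-in HTML parser
    source_code = requests.get(url)
    plain_text = source_code.text
    soup = BeautifulSoup(plain_text, features="html.parser")
    # Look for the <span class="bday"> element that holds the birth date
    finder = soup.find("span", {"class": "bday"})
    if finder is not None:
        return finder.string[0:4]  # first four characters are the year
    return None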

However, when I try the same thing with a Google search for authors with no (English) Wikipedia page, I just get None.

After reading this question https://stackoverflow.com/questions/62466340/cant-scrape-google-search-results-with-beautifulsoup I added a User-Agent request header to requests.get (I'm using Chrome Version 114.0.5735.134 (Official Build) (64-bit) on Windows 11 Home), but all it did was print None instead of giving me AttributeError: 'NoneType' object has no attribute 'string', which is what I was getting before adding the header.

This is my code:

import requests
from bs4 import BeautifulSoup

headers = {"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/114.0.5735.134 Safari/537.36"}
source_code = requests.get("https://www.google.com/search?q=Guillermo+Saccomanno", headers=headers)
plain_text = source_code.text
soup = BeautifulSoup(plain_text, features="html.parser")
google_finder = soup.find("span", {"class": "LrzXr kno-fv wHYlTd z8gr9e"})
print(google_finder.string)

The result is just None - no error message, but no text.

I also tried setting the Chrome version in the header to Chrome/114.0.0.0, which is what I found online. It still gives None.

I'm not sure where I'm going wrong, as the syntax is identical and I copied the class name from the page source. For this particular author, I would expect google_finder.string to be "9 June 1948 (age 75 years)".

Solution

If you want to parse the birth date, I'd choose a different strategy: find a <span> tag containing the text "Born:" and then take its next sibling. Also add the hl=en parameter to the URL to get English results:

import requests
from bs4 import BeautifulSoup

url = 'https://www.google.com/search?q=Guillermo+Saccomanno&hl=en'
headers = {'User-Agent': 'Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:109.0) Gecko/20100101 Firefox/114.0'}

soup = BeautifulSoup(requests.get(url, headers=headers).content, 'html.parser')

born = soup.select_one('span:-soup-contains("Born:") + span')
print(born.text)

Prints:

June 9, 1948 (age 75 years), Buenos Aires, Argentina
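The question asked for the birth year only, while the output above is the full "Born:" text. As a minimal sketch, assuming the text contains a four-digit year as in the output above, a small helper could pull the year out with a regular expression. The name extract_birth_year and the year range it accepts are illustrative assumptions, not part of the original answer. It also returns None when there is no text, which may be useful because select_one returns None if Google serves a page without the expected "Born:" span.

```python
import re
from typing import Optional

# Hypothetical helper (not from the original answer): matches a standalone
# four-digit year between 1000 and 2099.
YEAR_PATTERN = re.compile(r"\b(1[0-9]{3}|20[0-9]{2})\b")

def extract_birth_year(text: Optional[str]) -> Optional[str]:
    """Return the first four-digit year found in text, or None."""
    if not text:
        # Covers both a missing element (None) and an empty string
        return None
    match = YEAR_PATTERN.search(text)
    return match.group(1) if match else None

print(extract_birth_year("June 9, 1948 (age 75 years), Buenos Aires, Argentina"))
print(extract_birth_year(None))
```

With the code above, this could be called as extract_birth_year(born.text if born is not None else None), so a missing element gives None rather than an AttributeError.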

Answered By - Andrej Kesely

This answer was collected from Stack Overflow and tested by PythonFixing community admins, and is licensed under CC BY-SA 2.5, CC BY-SA 3.0 and CC BY-SA 4.0.
