Issue
I am writing a web scraper to extract the number of results of searching in a google search which appears on the top left of the page of search results. I have written the code below but I do not understand why phrase_extract is None. I want to extract the phrase "About 12,010,000,000 results". which part I am making a mistake? may be parsing the HTML incorrectly?
import requests
from bs4 import BeautifulSoup
def pyGoogleSearch(word):
address='http://www.google.com/#q='
newword=address+word
#webbrowser.open(newword)
page=requests.get(newword)
soup = BeautifulSoup(page.content, 'html.parser')
phrase_extract=soup.find(id="resultStats")
print(phrase_extract)
pyGoogleSearch('world')
Solution
You're actually using the wrong url to query google's search engine. You should be using http://www.google.com/search?q=<query>
.
So it'd look like this:
def pyGoogleSearch(word):
address = 'http://www.google.com/search?q='
newword = address + word
page = requests.get(newword)
soup = BeautifulSoup(page.content, 'html.parser')
phrase_extract = soup.find(id="resultStats")
print(phrase_extract)
You also probably just want the text of that element, not the element itself, so you can do something like
phrase_text = phrase_extract.text
or to get the actual value as an integer:
val = int(phrase_extract.text.split(' ')[1].replace(',',''))
Answered By - wpercy
0 comments:
Post a Comment
Note: Only a member of this blog may post a comment.