Issue
I'm new to Python and I'm learning from Automate the Boring Stuff with Python.
I'm currently on the web scraping chapter of the book, and I just want to scrape the titles of the search results.
Here is my code:
import requests
from bs4 import BeautifulSoup
import webbrowser
term = 'python'
req = requests.get('https://www.google.com/search?q=' + term)
req.raise_for_status()
soup = BeautifulSoup(req.text, 'lxml')
title = soup.find('div', class_ = 'r')
print(title)
The problem is that this always returns None. I even attached a screenshot of the inspect element tool so that you can see the div and class name I'm using.
Any help is appreciated. Thanks.
Solution
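Without a browser-like User-Agent, Google returns different HTML to the default requests client, and that HTML may not contain the div with class r, so soup.find returns None.
To get the correct response from the server, specify a User-Agent HTTP header: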
import requests
from bs4 import BeautifulSoup
headers = {'User-Agent': 'Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:79.0) Gecko/20100101 Firefox/79.0'}
term = 'python'
req = requests.get('https://www.google.com/search?q=' + term, headers=headers)
req.raise_for_status()
soup = BeautifulSoup(req.content, 'lxml')
title = soup.find('div', class_ = 'r')
print(title.get_text(strip=True, separator=' '))
Prints:
Welcome to Python.org www.python.org www.python.org ...
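If you want every result title rather than just the first, and want to avoid an AttributeError when nothing matches, something like the following might help. This is only a sketch: the function name extract_titles and the sample_html string are hypothetical, the class name r is taken from the question and may no longer match Google's current markup, and the built-in html.parser is used here so the example runs without lxml.

```python
from bs4 import BeautifulSoup


def extract_titles(html, class_name="r"):
    """Return the text of every <div> with the given class (empty list if none)."""
    # 'html.parser' ships with Python; 'lxml' works the same way if installed.
    soup = BeautifulSoup(html, "html.parser")
    # find_all returns an empty list when nothing matches, so no None check is needed.
    return [
        div.get_text(strip=True, separator=" ")
        for div in soup.find_all("div", class_=class_name)
    ]


# Hypothetical sample markup standing in for a real response body (req.content).
sample_html = """
<div class="r"><a href="https://www.python.org/"><h3>Welcome to Python.org</h3></a></div>
<div class="r"><a href="https://docs.python.org/"><h3>Python Docs</h3></a></div>
<div class="other">ignored</div>
"""

titles = extract_titles(sample_html)
print(titles)
```

With a live request you would pass req.content instead of sample_html. An empty list then means the class name did not appear in the HTML the server actually sent, which is worth checking by printing part of req.text.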
Answered By - Andrej Kesely