Issue
I've been trying to use BeautifulSoup to find the text of each search result on Google. Using the developer tools, I can see that this is represented by an <h3> with the class "LC20lb DKV0Md". However, I can't seem to find this using BeautifulSoup. What am I doing wrong?
import requests
from bs4 import BeautifulSoup
res = requests.get('http://google.com/search?q=world+news')
soup = BeautifulSoup(res.content, 'html.parser')
soup.find_all('h3', class_= 'LC201b DKV0Md')
Solution
You do not have to search by class. You can simply select every <div> nested inside an <h3> and call get_text() on each:
import requests
from bs4 import BeautifulSoup
res = requests.get('http://google.com/search?q=world+news')
soup = BeautifulSoup(res.content, 'html.parser')
[x.get_text() for x in soup.select('h3 div')]
Output:
['World - BBC News',
'BBC News World',
'Latest news from around the world | The Guardian',
'World - breaking news, videos and headlines - CNN',
'CNN International - Breaking News, US News, World News and Video',
'Welt-Nachrichten',
'BBC World News (Fernsehsender)',
'World News - Breaking international news and headlines | Sky News',
'International News | Latest World News, Videos & Photos -ABC',
'World News Headlines | Reuters',
'World News - Hindustan Times',
'World News | International Headlines - Breaking World - Global News']
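To see why the selector works where the class lookup does not, here is a minimal offline sketch. The markup in SAMPLE_HTML is hypothetical: it only imitates the shape the answer relies on (an <h3> wrapping a <div>) and assumes, without confirmation from the source, that the HTML returned to requests lacks the class names visible in the browser's developer tools. Note also that the question's code spells the class 'LC201b' (digit one) while the text says "LC20lb" (letter l); either way, a class lookup only matches classes that are actually present in the fetched HTML.

```python
from bs4 import BeautifulSoup

# Hypothetical markup for illustration only; real result pages will differ.
SAMPLE_HTML = """
<div><a href="/url?q=https://example.com/a"><h3><div>First headline</div></h3></a></div>
<div><a href="/url?q=https://example.com/b"><h3><div>Second headline</div></h3></a></div>
<h3>Heading without a nested div</h3>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# Class-based lookup: returns nothing because no <h3> here carries these classes.
by_class = soup.find_all("h3", class_="LC20lb DKV0Md")

# Descendant selector: matches every <div> nested anywhere inside an <h3>.
titles = [div.get_text(strip=True) for div in soup.select("h3 div")]

print(by_class)  # []
print(titles)    # ['First headline', 'Second headline']
```

The third <h3> is skipped because it has no nested <div>. If those headings are wanted as well, soup.select('h3') with get_text() on each would include them.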
Answered By - HedgeHog