Issue
I want to scrape email addresses from the results of a Google search query. But when I try to access a class with the CSS selector method "select" and print the result, it always shows an empty list. How can I access the .r class or elements with class="g"?
import requests
from bs4 import BeautifulSoup
url = "https://www.google.com/search?sxsrf=ACYBGNQA4leQETe0psVZPu7daLWbdsc9Ow%3A1579194494737&ei=fpggXpvRLMakwQKkqpSICg&q=%22computer+science+%22%22usa%22+%22%40yahoo.com%22&oq=%22computer+science+%22%22usa%22+%22%40yahoo.com%22&gs_l=psy-ab.12...0.0..7407...0.0..0.0.0.......0......gws-wiz.82okhpdJLYg&ved=0ahUKEwibiI_3zYjnAhVGUlAKHSQVBaEQ4dUDCAs"
responce = requests.get(url)
soup = BeautifulSoup(responce.text, "html.parser")
test = soup.select('.r')
print(test)
Solution
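Your program is correct, but Google serves different HTML to clients that do not send a browser-like User-Agent, so the .r elements are missing from the response. To get the markup a browser would see, specify a User-Agent header: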
import requests
from bs4 import BeautifulSoup
url = "https://www.google.com/search?sxsrf=ACYBGNQA4leQETe0psVZPu7daLWbdsc9Ow%3A1579194494737&ei=fpggXpvRLMakwQKkqpSICg&q=%22computer+science+%22%22usa%22+%22%40yahoo.com%22&oq=%22computer+science+%22%22usa%22+%22%40yahoo.com%22&gs_l=psy-ab.12...0.0..7407...0.0..0.0.0.......0......gws-wiz.82okhpdJLYg&ved=0ahUKEwibiI_3zYjnAhVGUlAKHSQVBaEQ4dUDCAs"
headers = {'User-Agent':'Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:72.0) Gecko/20100101 Firefox/72.0'}
responce = requests.get(url, headers=headers) # <-- specify custom header
soup = BeautifulSoup(responce.text, "html.parser")
test = soup.select('.r')
print(test)
Prints:
[<div class="r"><a href="https://www.yahoo.com/news/11-course-complete-computer-science-171322233.html" onmousedown="return rwt(this,'','','','1','AOvVaw2wM4TUxc_4V7s9GjeWTNAG','','2ahUKEwjt17Kk-YjnAhW2R0EAHcnsC3QQFjAAegQIAxAB','','',event)"><div class="TbwUpd"><img alt="https://...
...
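Since the goal in the question is to collect email addresses, one possible next step is to run a regular expression over the text of the page. The following is only a minimal sketch: the function name extract_emails, the sample string, and the pattern itself are assumptions made for illustration, and the pattern is a simplification that will not match every valid address.

```python
import re

# Simplified pattern for illustration only; it does not cover every valid address.
EMAIL_PATTERN = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def extract_emails(text):
    """Return the unique email-like strings in text, in order of first appearance."""
    matches = EMAIL_PATTERN.findall(text)
    # dict.fromkeys removes duplicates while keeping the original order
    return list(dict.fromkeys(matches))


# Hypothetical sample text standing in for the page content
sample = "Contact john.doe@yahoo.com or jane_smith@yahoo.com. Again: john.doe@yahoo.com"
print(extract_emails(sample))
```

With the code from the answer, the same function could be applied to the page text, for example extract_emails(soup.get_text()).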
Answered By - Andrej Kesely