Issue
I wrote this code to scrape news on a specific topic from CNN:
from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
serch_term = input('What News are you looking for today? ')
service = Service(executable_path='chromedriver.exe')
driver = webdriver.Chrome(service=service)
driver.get(f'https://edition.cnn.com/search?q={serch_term}')
soup = BeautifulSoup(driver.page_source,'html.parser' )
soup.select('h3.cnn-search__result-headline')
But it's not working. I'm getting this error after Chrome opens the CNN site:
DevTools listening on ws://127.0.0.1:65095/devtools/browser/05c3af16-cb5a-423c-af0b-c6cc96af980d
[11496:15920:0314/183947.010:ERROR:ssl_client_socket_impl.cc(995)] handshake failed; returned -1, SSL error code 1, net_error -200
PS C:\Users\user\Desktop\Informatik\Praktik\Projekte\Python\stiil_working_on\news_automation> [3408:22012:0314/183950.356:ERROR:device_event_log_impl.cc(214)] [18:39:50.360] USB: usb_device_handle_win.cc:1049 Failed to read descriptor from node connection: Ein an das System angeschlossenes Gerät funktioniert nicht. (0x1F)
[3408:22012:0314/183950.356:ERROR:device_event_log_impl.cc(214)] [18:39:50.362] USB: usb_device_handle_win.cc:1049 Failed to read descriptor from node connection: Ein an das System angeschlossenes Gerät funktioniert nicht. (0x1F)
[11496:15920:0314/183953.096:ERROR:ssl_client_socket_impl.cc(995)] handshake failed; returned -1, SSL error code 1, net_error -200
[15208:11512:0314/184146.206:ERROR:gpu_init.cc(440)] Passthrough is not supported, GL is disabled, ANGLE is
Solution
With the input() function the search did not find any results for me and raised an error, but the search itself works with a fixed search term. Just run the code:
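from bs4 import BeautifulSoup
import time
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from webdriver_manager.chrome import ChromeDriverManager

serch_term = 'News'
url = f'https://edition.cnn.com/search?q={serch_term}'
print(url)

# Selenium 4: pass the driver path through a Service object
driver = webdriver.Chrome(service=Service(ChromeDriverManager().install()))
driver.maximize_window()
driver.get(url)
# Wait for the search results to load before reading the page source
time.sleep(4)
soup = BeautifulSoup(driver.page_source, 'html.parser')
#driver.close()

# The hrefs are protocol-relative ('//www.cnn.com/...'), so prepend the scheme
for a in soup.select('h3.cnn-search__result-headline > a'):
    title = a.text
    href = a.get('href')
    abs_url = 'https:' + href
    print(abs_url)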
Output:
https://www.cnn.com/europe/live-news/ukraine-russia-putin-news-03-14-22/index.html
https://www.cnn.com/2022/03/14/energy/india-russia-oil/index.html
https://www.cnn.com/2022/03/14/us/new-york-city-washington-dc-homeless-shootings/index.html
https://www.cnn.com/2022/03/14/politics/breonna-taylor-mother-federal-charges-officers/index.html
https://www.cnn.com/2022/03/14/politics/biden-possible-european-trip/index.html
https://www.cnn.com/2022/03/07/world/what-we-know-brittney-griner-arrest-russia/index.html
https://www.cnn.com/2022/03/14/middleeast/mideast-summary-03-14-2022-intl/index.html
https://www.cnn.com/2022/03/14/energy/oil-prices/index.html
https://www.cnn.com/2022/03/14/tech/pete-davidson-blue-origin-launch-scn/index.html
https://www.cnn.com/2022/03/14/politics/donald-trump-south-carolina-speech/index.html
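If you want to go back to reading the search term with input(), a term containing spaces or special characters should be encoded before it goes into the URL. Here is a minimal sketch using only the standard library. The helper names build_search_url and absolutize are my own and not part of any library, and I am assuming the search page accepts a normally encoded q parameter. The second helper resolves the scraped href against the page URL instead of concatenating 'https:' by hand, which also covers hrefs that are already absolute.

```python
from urllib.parse import urlencode, urljoin

# Base URL of the search page, taken from the code above
SEARCH_PAGE = "https://edition.cnn.com/search"


def build_search_url(term: str) -> str:
    """Return the search URL with the term encoded as the q parameter."""
    # urlencode escapes spaces and special characters in the term
    return f"{SEARCH_PAGE}?{urlencode({'q': term.strip()})}"


def absolutize(href: str) -> str:
    """Resolve a scraped href (protocol-relative, relative or absolute)."""
    return urljoin(SEARCH_PAGE, href)


if __name__ == "__main__":
    print(build_search_url("ukraine war"))
    print(absolutize("//www.cnn.com/2022/03/14/energy/oil-prices/index.html"))
```

In the scraper you would then call driver.get(build_search_url(serch_term)) and print absolutize(href) inside the loop.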
Answered By - F.Hoque