Wednesday, April 6, 2022

[FIXED] Error when webscraping news from cnn using selenium and bs4 to get links and titles from articles

beautifulsoup, python, python-3.x, selenium, web-scraping

Issue

I wrote this code to scrape news on a specific topic from CNN:

from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.webdriver.chrome.service import Service

serch_term = input('What News are you looking for today? ')

service = Service(executable_path='chromedriver.exe')
driver = webdriver.Chrome(service=service)
driver.get(f'https://edition.cnn.com/search?q={serch_term}')

soup = BeautifulSoup(driver.page_source,'html.parser' )
soup.select('h3.cnn-search__result-headline')

But it's not working. I'm getting this error after Chrome pops up with the CNN site:

DevTools listening on ws://127.0.0.1:65095/devtools/browser/05c3af16-cb5a-423c-af0b-c6cc96af980d
[11496:15920:0314/183947.010:ERROR:ssl_client_socket_impl.cc(995)] handshake failed; returned -1, SSL error code 1, net_error -200
PS C:\Users\user\Desktop\Informatik\Praktik\Projekte\Python\stiil_working_on\news_automation> [3408:22012:0314/183950.356:ERROR:device_event_log_impl.cc(214)] [18:39:50.360] USB: usb_device_handle_win.cc:1049 Failed to read descriptor from node connection: Ein an das System angeschlossenes Gerät funktioniert nicht. (0x1F)
[3408:22012:0314/183950.356:ERROR:device_event_log_impl.cc(214)] [18:39:50.362] USB: usb_device_handle_win.cc:1049 Failed to read descriptor from node connection: Ein an das System angeschlossenes Gerät funktioniert nicht. (0x1F)
[11496:15920:0314/183953.096:ERROR:ssl_client_socket_impl.cc(995)] handshake failed; returned -1, SSL error code 1, net_error -200
[15208:11512:0314/184146.206:ERROR:gpu_init.cc(440)] Passthrough is not supported, GL is disabled, ANGLE is

Solution

With the term read through the input function, no search results are found and an error is raised, but a search with a fixed term works. Please just run the code.

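from bs4 import BeautifulSoup
import time
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from webdriver_manager.chrome import ChromeDriverManager

serch_term = 'News'

url = f'https://edition.cnn.com/search?q={serch_term}'
print(url)

# Selenium 4 takes the driver path through a Service object
driver = webdriver.Chrome(service=Service(ChromeDriverManager().install()))
driver.maximize_window()

driver.get(url)
time.sleep(4)  # give the search results time to load

soup = BeautifulSoup(driver.page_source, 'html.parser')
#driver.close()

# each result headline is an <a> directly inside an h3 with this class
for a in soup.select('h3.cnn-search__result-headline > a'):
    title = a.text          # headline text (not printed here)
    href = a.get('href')    # separate name, so the search url is not overwritten
    abs_url = 'https:' + href
    print(abs_url)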

Output:

https://www.cnn.com/europe/live-news/ukraine-russia-putin-news-03-14-22/index.html
https://www.cnn.com/2022/03/14/energy/india-russia-oil/index.html
https://www.cnn.com/2022/03/14/us/new-york-city-washington-dc-homeless-shootings/index.html
https://www.cnn.com/2022/03/14/politics/breonna-taylor-mother-federal-charges-officers/index.html
https://www.cnn.com/2022/03/14/politics/biden-possible-european-trip/index.html
https://www.cnn.com/2022/03/07/world/what-we-know-brittney-griner-arrest-russia/index.html
https://www.cnn.com/2022/03/14/middleeast/mideast-summary-03-14-2022-intl/index.html
https://www.cnn.com/2022/03/14/energy/oil-prices/index.html
https://www.cnn.com/2022/03/14/tech/pete-davidson-blue-origin-launch-scn/index.html
https://www.cnn.com/2022/03/14/politics/donald-trump-south-carolina-speech/index.html
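
The code above builds the search URL with an f-string and prepends 'https:' to every href. A slightly more defensive sketch of those two steps, using only the standard library, could look like the following. The helper names build_search_url and to_absolute are hypothetical and not part of the original answer. It is also an assumption that the site accepts a form-encoded query (spaces sent as '+').

```python
from urllib.parse import quote_plus, urljoin

# Assumed base: the search page used in the answer above.
SEARCH_PAGE = 'https://edition.cnn.com/search'


def build_search_url(term):
    """Return the search URL with the term percent-encoded (hypothetical helper)."""
    return f'{SEARCH_PAGE}?q={quote_plus(term)}'


def to_absolute(href, base=SEARCH_PAGE):
    """Resolve an href against the page URL (hypothetical helper).

    urljoin handles protocol-relative ('//host/path'), root-relative
    ('/path') and already absolute hrefs alike.
    """
    return urljoin(base, href)


if __name__ == '__main__':
    print(build_search_url('ukraine news'))
    print(to_absolute('//www.cnn.com/2022/03/14/energy/oil-prices/index.html'))
```

In the loop, abs_url = to_absolute(a.get('href')) would then replace the string concatenation, and a search term typed by the user with spaces or special characters would be encoded before it reaches the browser.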

Answered By - F.Hoque

This answer was collected from Stack Overflow and tested by PythonFixing community admins. It is licensed under CC BY-SA 2.5, CC BY-SA 3.0 and CC BY-SA 4.0.
