Saturday, September 24, 2022

[FIXED] Generating URL for Yahoo news and Bing news with Python and BeautifulSoup

Labels: beautifulsoup, python, web-scraping

Issue

I want to scrape data from the Yahoo News and Bing News pages. The data I want are the headlines and/or the text below the headlines (whatever can be scraped), along with the date or time when each item was posted.

I have written some code, but it does not return anything. The problem is with my URL, since I am getting a 404 response.

Can you please help me with it?

This is the code for 'Bing'

from bs4 import BeautifulSoup
import requests

term = 'usa'
url = 'http://www.bing.com/news/q?s={}'.format(term)

response = requests.get(url)
print(response)

soup = BeautifulSoup(response.text, 'html.parser')
print(soup)

And this is for Yahoo:

term = 'usa'

url = 'http://news.search.yahoo.com/q?s={}'.format(term)

response = requests.get(url)
print(response)

soup = BeautifulSoup(response.text, 'html.parser')
print(soup)

Please help me generate these URLs. What is the logic behind them? I'm still a noob :)

Solution

Your URLs are simply wrong. The URLs to use are the same ones that appear in the address bar when you run the search in a regular browser. Most search engines and aggregators use a q parameter for the search term. Most of the other parameters are usually optional, although some are occasionally needed (e.g. to specify the result page number).
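To make the URL logic concrete, here is a minimal sketch that builds both search URLs from a term. The base URLs are the ones used in this answer; the helper name build_search_url and the NEWS_SEARCH_URLS table are hypothetical, introduced only for illustration. Using urlencode is a small addition of mine so that terms containing spaces or special characters stay valid in the query string.

```python
from urllib.parse import urlencode

# Base search URLs as used in this answer. The dictionary and the helper
# below are hypothetical names, not part of any library.
NEWS_SEARCH_URLS = {
    "bing": "https://www.bing.com/news/search",
    "yahoo": "https://news.search.yahoo.com/search",
}


def build_search_url(engine, term):
    """Return the search URL for `term`, percent-encoding it into the q parameter."""
    return "{}?{}".format(NEWS_SEARCH_URLS[engine], urlencode({"q": term}))


print(build_search_url("bing", "usa"))
# https://www.bing.com/news/search?q=usa
print(build_search_url("yahoo", "new york"))
# https://news.search.yahoo.com/search?q=new+york
```

With requests, the same effect can be had by passing params={'q': term} to requests.get, which encodes the query string for you.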

Bing

from bs4 import BeautifulSoup
import requests
import re

term = 'usa'
# Same URL shape as in the browser address bar: /news/search with a q parameter
url = 'https://www.bing.com/news/search?q={}'.format(term)
response = requests.get(url)
soup = BeautifulSoup(response.text, 'html.parser')

# Each result sits in a div with class "news-card-body"
for news_card in soup.find_all('div', class_="news-card-body"):
    title = news_card.find('a', class_="title").text
    # The age of the item is in a span whose aria-label ends with "ago"
    time = news_card.find(
        'span',
        attrs={'aria-label': re.compile(".*ago$")}
    ).text
    print("{} ({})".format(title, time))

Output

Jason Mohammed blitzkrieg sinks USA (17h)
USA Swimming held not liable by California jury in sexual abuse case (1d)
United States 4-1 Canada: USA secure payback in Nations League (1d)
USA always plays the Dalai Lama card in dealing with China, says Chinese Professor (1d)
...

Yahoo

from bs4 import BeautifulSoup
import requests

term = 'usa'
# Same URL shape as in the browser address bar: /search with a q parameter
url = 'https://news.search.yahoo.com/search?q={}'.format(term)
response = requests.get(url)
soup = BeautifulSoup(response.text, 'html.parser')

# Each result sits in a div with class "NewsArticle"
for news_item in soup.find_all('div', class_='NewsArticle'):
    title = news_item.find('h4').text
    time = news_item.find('span', class_='fc-2nd').text
    # Clean time text: drop the leading '·' separator and surrounding whitespace
    time = time.replace('·', '').strip()
    print("{} ({})".format(title, time))

Output

USA Baseball will return to Arizona for second Olympic qualifying chance (52 minutes ago)
Prized White Sox prospect Andrew Vaughn wraps up stint with USA Baseball (28 minutes ago)
Mexico defeats USA in extras for Olympic berth (13 hours ago)
...
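Both sites return relative ages ("17h", "1d", "52 minutes ago") rather than absolute dates. If you need an approximate posting time, one option is to convert that text into a timedelta and subtract it from the time of the request. The sketch below is only an illustration: parse_age is a hypothetical helper, and the accepted formats are assumed from the sample output above, so other formats the sites may return are not handled and yield None.

```python
import re
from datetime import datetime, timedelta

# Map the first letter of the matched unit to a timedelta keyword argument.
_UNITS = {"m": "minutes", "h": "hours", "d": "days"}

# Assumed formats, based only on the sample output: "17h", "1d", "52 minutes ago".
_AGE_PATTERN = re.compile(
    r"(\d+)\s*(minutes?|mins?|m|hours?|h|days?|d)(?:\s+ago)?",
    re.IGNORECASE,
)


def parse_age(text):
    """Return the age of a news item as a timedelta, or None if not recognised."""
    match = _AGE_PATTERN.fullmatch(text.strip())
    if match is None:
        return None
    amount = int(match.group(1))
    unit = match.group(2)[0].lower()
    return timedelta(**{_UNITS[unit]: amount})


for sample in ("17h", "1d", "52 minutes ago"):
    age = parse_age(sample)
    # Approximate posting time, relative to when the page was fetched
    posted_at = datetime.now() - age
    print("{} -> posted around {:%Y-%m-%d %H:%M}".format(sample, posted_at))
```

The result is only as precise as the site's own rounding, so "1d" could mean anything from roughly 24 to 48 hours ago.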

Answered By - Bitto Bennichan

This answer, collected from Stack Overflow and tested by PythonFixing community admins, is licensed under CC BY-SA 2.5, CC BY-SA 3.0 and CC BY-SA 4.0.
