Issue
I created a DataFrame by scraping data with Beautiful Soup. However, there are two problems:
- Why does the for loop run twice?
- How do I remove the brackets from the values in the DataFrame?
import urllib.request as req
from bs4 import BeautifulSoup
import bs4
import requests
import pandas as pd
url = "https://finance.yahoo.com/quote/BF-B/profile?p=BF-B"
root = requests.get(url)
soup = BeautifulSoup(root.text, 'html.parser')
records = []
for result in soup:
    name = soup.find_all('h1', attrs={'D(ib) Fz(18px)'})
    website = soup.find_all('a')[44]
    sector = soup.find_all('span')[35]
    industry = soup.find_all('span')[37]
    records.append((name, website, sector, industry))
df = pd.DataFrame(records, columns=['name', 'website', 'sector', 'industry'])
df.head()
And the result looks like this:
Solution
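To get information about the company, you don't have to loop over the soup; just extract the necessary information directly. The loop runs twice because iterating over a BeautifulSoup object walks its top-level children (for this page, most likely the doctype declaration and the <html> element), and since the loop body never uses result, every pass appends the same row. The brackets appear because find_all() returns a list and the other columns hold whole Tag objects. To get rid of the [...] brackets, select single elements and use the .text property: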
import requests
import pandas as pd
from bs4 import BeautifulSoup
url = 'https://finance.yahoo.com/quote/BF-B/profile?p=BF-B'
soup = BeautifulSoup(requests.get(url).content, 'html.parser')
all_data = []
# Select each field directly; .text and ['href'] return plain strings, not lists
all_data.append({
    'Name': soup.h1.text,
    'Website': soup.select_one('.asset-profile-container a[href^="http"]')['href'],
    # :-soup-contains is the current name for soupsieve's deprecated :contains
    'Sector': soup.select_one('span:-soup-contains("Sector(s)") + span').text,
    'Industry': soup.select_one('span:-soup-contains("Industry") + span').text
})
df = pd.DataFrame(all_data)
print(df)
Prints:
                              Name                      Website              Sector                           Industry
0  Brown-Forman Corporation (BF-B)  http://www.brown-forman.com  Consumer Defensive  Beverages—Wineries & Distilleries
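Both behaviours from the question can be reproduced offline. The sketch below uses a tiny made-up HTML string (a hypothetical stand-in, not Yahoo's real markup) so it runs without network access; it shows what iterating over the soup yields and why find_all() output prints with brackets.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in page, not Yahoo's real HTML
html = "<!DOCTYPE html><html><body><h1>Example Corp</h1></body></html>"
soup = BeautifulSoup(html, "html.parser")

# Iterating over the soup walks its top-level children, not "the page once"
top_level = list(soup)
print(len(top_level))  # 2: the doctype and the <html> element
print([type(node).__name__ for node in top_level])

# find_all() returns a ResultSet (a list subclass), so it prints with brackets
headings = soup.find_all("h1")
print(headings)  # [<h1>Example Corp</h1>]

# .text on a single Tag returns only its string content
print(soup.h1.text)  # Example Corp
```

On the live page the number of top-level children depends on its markup, and because the loop body in the question re-runs the same find_all() calls on the whole soup, each iteration appends an identical row.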
Answered By - Andrej Kesely
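If the page markup changes, select_one() returns None and the .text or ['href'] access raises an exception. Below is a minimal sketch of a more defensive variant, assuming a hypothetical helper named text_or_none and again a made-up HTML snippet in place of the real page.

```python
from bs4 import BeautifulSoup
import pandas as pd


def text_or_none(soup, selector):
    """Hypothetical helper: stripped text of the first match, or None if nothing matches."""
    tag = soup.select_one(selector)
    return tag.get_text(strip=True) if tag is not None else None


# Hypothetical snippet loosely shaped like a profile section
html = """
<div class="asset-profile-container">
  <h1>Example Corp</h1>
  <a href="http://www.example.com">http://www.example.com</a>
  <p><span>Sector(s)</span>: <span>Technology</span></p>
</div>
"""
soup = BeautifulSoup(html, "html.parser")

link = soup.select_one('.asset-profile-container a[href^="http"]')
record = {
    "Name": text_or_none(soup, "h1"),
    "Website": link["href"] if link is not None else None,
    # "+ span" picks the next <span> element; the ": " text node is skipped
    "Sector": text_or_none(soup, 'span:-soup-contains("Sector(s)") + span'),
    # No "Industry" label in this snippet, so this stays empty instead of raising
    "Industry": text_or_none(soup, 'span:-soup-contains("Industry") + span'),
}
df = pd.DataFrame([record])
print(df)
```

Missing fields then show up as empty values in the DataFrame instead of stopping the script. The :-soup-contains selector requires soupsieve 2.1 or newer.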