Issue
So far I have fetched the page in my notebook and parsed it using Beautiful Soup:
import requests
from bs4 import BeautifulSoup
html_data = requests.get('https://www.macrotrends.net/stocks/charts/TSLA/tesla/revenue')
soup = BeautifulSoup(html_data.text, 'lxml')
Then I tried to build a table containing only the revenue from the "Tesla Quarterly Revenue" table (trying to omit NaN values):
import pandas as pd
tesla_revenue = pd.DataFrame(columns=["Date", "Revenue"])
table = soup.find('table', attrs={'class': 'historical_data_table table'})
for result in table:
    if table.find('th').getText().startswith("Tesla Quarterly Revenue"):
        for row in result.find_all('tbody').find_all("tr"):
            col = row.find("td")
            if len(col) != 2: continue
            Date = col[0].text
            Revenue = col[1].text
            tesla_revenue = tesla_revenue.append({"Date":Date, "Revenue":Revenue}, ignore_index=True)
tesla_revenue = tesla_revenue.apply(pd.to_numeric, errors='coerce')
tesla_revenue = tesla_revenue.dropna()
Then when I tried to print the tail of the table, I only got this:
| Date | Revenue |
(only the headers)
I think I might have done something wrong when I built the table, but I can't be sure. Any help would be appreciated.
Solution
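There are a few mistakes in this code, but the main problem is that the HTML contains 4 tables with this class and you use find('table', ...) instead of find_all('table', ...). So you get only the first table, while the quarterly revenue is in another table (probably the second one).
The other problems: row.find("td") returns a single cell instead of a list of cells, so you need row.find_all("td"); result.find_all('tbody') returns a list (ResultSet), which has no .find_all(), so it is simpler to use table.find_all("tr") directly; and apply(pd.to_numeric, errors='coerce') turns the dates and any value containing "$" or "," into NaN, so dropna() then removes every row. That is why the version below strips "$" and "," from the revenue and leaves the conversion commented out.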
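A minimal offline sketch of the find() vs find_all() difference. It uses a small hypothetical HTML string in place of the real page (the headers and dollar values are made-up placeholders, not real Tesla figures) and the built-in "html.parser", so lxml and network access are not needed:

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for the page: two tables sharing the same class,
# the first one annual and the second one quarterly (placeholder values).
html = """
<table class="historical_data_table table">
  <tr><th colspan="2">Tesla Annual Revenue</th></tr>
  <tr><td>2020</td><td>$9,999</td></tr>
</table>
<table class="historical_data_table table">
  <tr><th colspan="2">Tesla Quarterly Revenue</th></tr>
  <tr><td>2021-06-30</td><td>$1,234</td></tr>
  <tr><td>2021-03-31</td><td>$5,678</td></tr>
</table>
"""

soup = BeautifulSoup(html, "html.parser")
attrs = {"class": "historical_data_table table"}

# find() stops at the first match, so only the annual table comes back.
first = soup.find("table", attrs=attrs)
print(first.find("th").get_text())  # Tesla Annual Revenue

# find_all() returns every matching table; keep the one with the right header.
quarterly = [
    t for t in soup.find_all("table", attrs=attrs)
    if t.find("th").get_text().startswith("Tesla Quarterly Revenue")
]
print(len(quarterly))  # 1

# find_all("td") gives a list of cells per row; the header row has none,
# so the len() == 2 check skips it.
rows = [
    [td.get_text() for td in tr.find_all("td")]
    for tr in quarterly[0].find_all("tr")
    if len(tr.find_all("td")) == 2
]
print(rows)  # [['2021-06-30', '$1,234'], ['2021-03-31', '$5,678']]
```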
import requests
from bs4 import BeautifulSoup
import pandas as pd

response = requests.get('https://www.macrotrends.net/stocks/charts/TSLA/tesla/revenue')
soup = BeautifulSoup(response.text, 'lxml')

# the page has several tables with this class, so get all of them
all_tables = soup.find_all('table', attrs={'class': 'historical_data_table table'})

rows = []

for table in all_tables:
    # keep only the table whose header starts with "Tesla Quarterly Revenue"
    if table.find('th').getText().startswith("Tesla Quarterly Revenue"):
        for row in table.find_all("tr"):
            col = row.find_all("td")
            if len(col) == 2:
                date = col[0].text
                # strip "$" and "," so the value can be converted to a number later
                revenue = col[1].text.replace('$', '').replace(',', '')
                rows.append({"Date": date, "Revenue": revenue})

# DataFrame.append() was removed in pandas 2.0, so build the DataFrame once from the list
tesla_revenue = pd.DataFrame(rows, columns=["Date", "Revenue"])

#tesla_revenue = tesla_revenue.apply(pd.to_numeric, errors='coerce')
#tesla_revenue = tesla_revenue.dropna()

print(tesla_revenue)
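The two commented-out lines would not work on this data as written: pd.to_numeric(..., errors='coerce') applied to every column also turns each date string into NaN, so dropna() removes every row. A sketch of converting each column separately, assuming Date holds ISO strings such as "2021-06-30" and Revenue holds text already stripped of "$" and ","; the rows below (including the empty revenue cell) are hypothetical:

```python
import pandas as pd

# Hypothetical rows in the shape produced by the scraping loop:
# Date as text, Revenue as text with "$" and "," already removed.
tesla_revenue = pd.DataFrame(
    [{"Date": "2021-06-30", "Revenue": "1234"},
     {"Date": "2021-03-31", "Revenue": "5678"},
     {"Date": "2020-12-31", "Revenue": ""}],
    columns=["Date", "Revenue"],
)

# Converting every column with to_numeric makes the whole Date column NaN,
# so dropna() leaves nothing.
all_numeric = tesla_revenue.apply(pd.to_numeric, errors="coerce")
print(all_numeric.dropna().empty)  # True

# Convert each column with the parser that fits it instead.
tesla_revenue["Date"] = pd.to_datetime(tesla_revenue["Date"])
tesla_revenue["Revenue"] = pd.to_numeric(tesla_revenue["Revenue"], errors="coerce")

# Now dropna() only removes rows whose revenue could not be parsed.
tesla_revenue = tesla_revenue.dropna()
print(tesla_revenue)
```

With this split, NaN values come only from revenue cells that really are empty or malformed, which matches the original goal of omitting them.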
Answered By - furas