Issue
I'm trying to extract data from a stock exchange website using BeautifulSoup, but I only get the column names as a result. Here is the code:
import requests
import pandas as pd
from bs4 import BeautifulSoup
url = 'https://www.saudiexchange.sa/wps/portal/saudiexchange/newsandreports/reports-publications/historical-reports/!ut/p/z1/04_Sj9CPykssy0xPLMnMz0vMAfIjo8ziTR3NDIw8LAz8DTxCnA3MDILdzUJDLAyNXE30I4EKzHEqMDTTDyekoCA7zRMAIkY09Q!!/'
page = requests.get(url)
soup = BeautifulSoup(page.text, 'lxml')
table1 = soup.find('div', id='tab1Content')
headers = []
for i in table1.find_all('th'):
    title = i.text
    headers.append(title)
mydata = pd.DataFrame(columns = headers)
for j in table1.find_all('tr')[1:]:
    row_data = j.find_all('td')
    row = [i.text for i in row_data]
    length = len(mydata)
    mydata.loc[length] = row
mydata
Can anyone please help me find a solution to this problem? Thanks in advance.
Solution
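The rows are not in the static HTML that your code parses. The page embeds the data as JSON inside a script tag, and JavaScript renders it into the table when the page loads in a browser. requests does not run JavaScript, so the HTML it downloads contains only the table's header row, which is why you get the column names and nothing else.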
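To make the idea concrete, here is a minimal, self-contained sketch that runs the same regular expression against a made-up script snippet instead of the live page. The SAMPLE_SCRIPT text and its values are hypothetical; the real page's script may be laid out differently. Note that the pattern only works when the array sits on a single line (`.` does not match newlines) and contains no nested brackets, because the greedy `.*` before the group backs off only to the last `[` on the line.

```python
import json
import re

# Hypothetical stand-in for the text of the <script> tag that holds the data.
SAMPLE_SCRIPT = (
    'var rows = JSON.stringify([{"transactionDate": "2023/07/27", "noOfTrades": "395,910"}, '
    '{"transactionDate": "2023/07/26", "noOfTrades": "401,644"}]);'
)

# Same pattern as the answer, written as a raw string: capture the JSON array
# that follows JSON.stringify on the same line.
matches = re.findall(r'JSON\.stringify.*(\[.*\])', SAMPLE_SCRIPT)

# json.loads turns the captured text into a list of dictionaries.
records = json.loads(matches[0]) if matches else []
print(len(records), records[0]["transactionDate"])  # prints: 2 2023/07/27
```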
I updated the code as follows:
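from bs4 import BeautifulSoup
import json
import pandas
import re
import requests
url = 'https://www.saudiexchange.sa/wps/portal/saudiexchange/newsandreports/reports-publications/historical-reports/!ut/p/z1/04_Sj9CPykssy0xPLMnMz0vMAfIjo8ziTR3NDIw8LAz8DTxCnA3MDILdzUJDLAyNXE30I4EKzHEqMDTTDyekoCA7zRMAIkY09Q!!/'
page = BeautifulSoup(requests.get(url).text, 'lxml')
scripts = page.find_all('script', {'type': 'text/javascript'})
# When this was written, the data was in the third-to-last script tag; the index depends on the page layout.
# Raw string avoids invalid-escape warnings; the group captures the JSON array after JSON.stringify.
records = re.findall(r'JSON\.stringify.*(\[.*\])', scripts[-3].text)
if records:
    # Parse the JSON text into a list of dictionaries, one per trading day.
    records = json.loads(records[0])
    df = pandas.DataFrame(records)
    print(df)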
After loading the page into BeautifulSoup, I select the exact script tag, use a regular expression to find the line that holds the entire data set as JSON, parse that JSON into a list of dictionaries, and then load it into a DataFrame.
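Selecting the script with `[-3]` ties the code to the page's current layout and breaks if a script is added or removed. A slightly more defensive sketch, assuming the data is still the first array passed to `JSON.stringify`, scans every script's text and returns the first match that parses as a JSON array. The helper name `extract_records` is hypothetical; with BeautifulSoup you would call it as `extract_records(s.get_text() for s in page.find_all('script'))`.

```python
import json
import re

# Same capture as the answer: the JSON array after JSON.stringify on one line.
STRINGIFY_ARRAY = re.compile(r'JSON\.stringify.*(\[.*\])')


def extract_records(script_texts):
    """Hypothetical helper: return the first JSON array found after
    JSON.stringify in any of the given script texts, or an empty list."""
    for text in script_texts:
        for candidate in STRINGIFY_ARRAY.findall(text):
            try:
                data = json.loads(candidate)
            except json.JSONDecodeError:
                continue  # bracketed text that is not valid JSON; keep looking
            if isinstance(data, list):
                return data
    return []


# Usage with made-up script contents; only the second one holds the data.
scripts = [
    "var menu = ['home', 'reports'];",
    'var rows = JSON.stringify([{"transactionDate": "2023/07/27", "turnOver": "6,479,159,448.07"}]);',
]
records = extract_records(scripts)
print(records[0]["turnOver"])  # prints: 6,479,159,448.07
```

Returning an empty list when nothing matches lets the caller check `if records:` exactly as the answer does.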
Output:
transactionDate volumeTraded turnOver todaysOpen highPrice lowPrice previousClosePrice noOfTrades change changePercent lastTradePrice transactionDateStr lastYield
0 2023/07/27 234,061,315 6,479,159,448.07 11,894.32 11,933.68 11,847.72 11,847.72 395,910 None None None 2023/07/27 None
1 2023/07/26 250,527,031 6,587,353,138.03 11,892.58 11,917.29 11,848.73 11,906.13 401,644 None None None 2023/07/26 None
2 2023/07/25 242,896,126 6,747,776,834.91 11,842.11 11,896.87 11,834.82 11,882.68 404,113 None None None 2023/07/25 None
3 2023/07/24 272,729,851 6,299,890,410.07 11,761.42 11,819.51 11,740.82 11,801.90 401,279 None None None 2023/07/24 None
4 2023/07/23 167,268,795 4,259,359,309.53 11,757.07 11,773.25 11,715.77 11,760.30 320,118 None None None 2023/07/23 None
5 2023/07/20 245,817,586 5,825,981,456.79 11,767.83 11,775.56 11,723.65 11,755.94 387,269 None None None 2023/07/20 None
6 2023/07/19 254,172,064 5,562,000,894.80 11,768.50 11,786.26 11,733.82 11,752.63 386,933 None None None 2023/07/19 None
7 2023/07/18 326,862,994 8,108,296,076.36 11,794.96 11,826.34 11,726.76 11,768.71 512,443 None None None 2023/07/18 None
8 2023/07/17 274,100,305 8,110,774,890.18 11,713.39 11,784.99 11,671.15 11,780.27 503,442 None None None 2023/07/17 None
9 2023/07/16 275,812,887 6,752,825,160.20 11,733.20 11,754.37 11,694.07 11,715.50 430,684 None None None 2023/07/16 None
10 2023/07/13 410,661,272 7,696,321,347.04 11,740.65 11,750.37 11,695.62 11,707.87 465,247 None None None 2023/07/13 None
11 2023/07/12 444,684,118 7,803,773,389.44 11,685.53 11,758.79 11,678.72 11,727.19 463,712 None None None 2023/07/12 None
12 2023/07/11 354,779,942 7,756,051,185.86 11,604.16 11,673.27 11,603.25 11,664.50 494,277 None None None 2023/07/11 None
13 2023/07/10 472,639,436 6,881,526,719.26 11,594.38 11,624.22 11,565.82 11,586.93 478,975 None None None 2023/07/10 None
14 2023/07/09 317,880,581 6,035,872,429.64 11,590.76 11,618.03 11,557.67 11,609.35 407,213 None None None 2023/07/09 None
15 2023/07/06 313,606,717 6,576,334,138.08 11,596.00 11,598.61 11,553.97 11,597.91 445,013 None None None 2023/07/06 None
16 2023/07/05 304,527,392 8,489,250,022.12 11,615.91 11,637.41 11,568.00 11,591.55 528,784 None None None 2023/07/05 None
17 2023/07/04 433,117,492 7,255,962,416.93 11,557.80 11,618.72 11,552.60 11,618.72 461,782 None None None 2023/07/04 None
18 2023/07/03 478,324,863 7,123,786,536.34 11,508.35 11,545.13 11,463.83 11,545.13 510,590 None None None 2023/07/03 None
19 2023/07/02 336,148,365 4,566,009,523.79 11,491.76 11,522.87 11,481.73 11,493.91 336,150 None None None 2023/07/02 None
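The numeric fields in this output are strings with comma thousands separators (for example 234,061,315), so they need converting before any arithmetic. A short sketch with pandas, assuming the column names printed above; the two sample rows are copied from that output, and the same conversion applies to the price columns such as todaysOpen and highPrice.

```python
import pandas as pd

# Two rows copied from the output above (subset of columns).
df = pd.DataFrame({
    "transactionDate": ["2023/07/27", "2023/07/26"],
    "volumeTraded": ["234,061,315", "250,527,031"],
    "turnOver": ["6,479,159,448.07", "6,587,353,138.03"],
})

numeric_cols = ["volumeTraded", "turnOver"]
# Strip the thousands separators, then parse; unparseable values become NaN.
df[numeric_cols] = df[numeric_cols].apply(
    lambda col: pd.to_numeric(col.str.replace(",", "", regex=False), errors="coerce")
)
# The dates are formatted as yyyy/mm/dd in the output.
df["transactionDate"] = pd.to_datetime(df["transactionDate"], format="%Y/%m/%d")
print(df.dtypes)
```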
Answered By - Zero