Issue
I want to scrape the table from this website "" and, since it is updated hourly, I also want to track changes. I tried scraping the data with Selenium, but everything ended up in a single column with no table structure. How can I use pandas and Beautiful Soup to scrape the table in a structured format and also track changes? This is the code I'm trying to figure out.
import pandas as pd
from bs4 import BeautifulSoup

soup = BeautifulSoup(html, "html.parser")
table = soup.find('table', attrs={'id': 'subs noBorders evenRows'})
table_rows = table.find_all('tr')

res = []
for tr in table_rows:
    td = tr.find_all('td')
    # avoid reusing the loop variable name `tr` inside the comprehension
    row = [cell.text.strip() for cell in td if cell.text.strip()]
    if row:
        res.append(row)

df = pd.DataFrame(res, columns=["Notice No", "Subject", "Segment Name", "Category Name", "Department", "PDF"])
print(df)
It would be a great help if you could show me how to get the data and how to keep track of new data whenever I run the script again.
Solution
Note that you don't actually need to include params, since the desired information is presented on the main page; I've left them in in case you want to scrape a different id.

Also note that I skipped the PDF column, as it would only show NaN values: the PDF links are not hyperlinks, just a logo icon stored on the server. Clicking the PDF logo fires a POST request to the target to download the file. Without more specific requirements from you, here's an answer that covers what you asked for.
import requests
import pandas as pd

headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:80.0) Gecko/20100101 Firefox/80.0"
}

params = {
    'id': 0,
    'txtscripcd': '',
    'pagecont': '',
    'subject': ''
}


def main(url):
    r = requests.get(url, params=params, headers=headers)
    # read_html parses every table on the page; the notices table is the last one,
    # and iloc[:, :-1] drops the final column (the PDF logo, which would be all NaN)
    df = pd.read_html(r.content)[-1].iloc[:, :-1]
    print(df)


main("https://www.bseindia.com/markets/MarketInfo/NoticesCirculars.aspx")
Output:
Notice No Subject Segment Name Category Name Department
0 20200923-2 Offer to Buy – Acquisition Window (Delisting) ... Equity Trading Trading Operations
1 20200923-1 Change in Name of the Company. Debt Company related Listing Operations
Answered By - αԋɱҽԃ αмєяιcαη