Issue
I am trying to get all the links, titles, and dates for a specific month, such as March, from this website. I'm using BeautifulSoup to do so:
from bs4 import BeautifulSoup
import requests

html_link = 'https://www.pds.com.ph/index.html%3Fpage_id=3261.html'
html = requests.get(html_link).text
soup = BeautifulSoup(html, 'html.parser')
for link in soup.find_all('td'):
    # Text contains 'March'
    # Get the link title, link & date
    pass
I'm new to BeautifulSoup. In Selenium I used the XPath "//td[contains(text(),'Mar')]". How can I do this with BeautifulSoup?
Solution
Here is a solution you can try:
import re
import requests
from bs4 import BeautifulSoup

html_link = 'https://www.pds.com.ph/index.html%3Fpage_id=3261.html'
html = requests.get(html_link).text
soup = BeautifulSoup(html, 'html.parser')

# Match every <td> whose text contains "March"
search = re.compile("March")
for td in soup.find_all('td', string=search):
    # The title link sits in another cell of the same table row
    link = td.parent.select_one("td > a")
    if link:
        print(f"Title : {link.text}")
        print(f"Link : {link['href']}")
        print(f"Date : {td.text}")
        print("-" * 30)
Output:
Title : RCBC Lists PHP 17.87257 Billion ASEAN Sustainability Bonds on PDEx
Link : index.html%3Fp=87239.html
Date : March 31, 2021
------------------------------
Title : Aboitiz Power Corporation Raises 8 Billion Fixed Rate Bonds on PDEx
Link : index.html%3Fp=86743.html
Date : March 16, 2021
------------------------------
....
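A variant closer to the Selenium XPath in the question (not part of the original answer): Soup Sieve, the CSS selector engine behind BeautifulSoup's select() since bs4 4.7, supports a non-standard :-soup-contains() pseudo-class (soupsieve 2.1 or later) that does a substring test on an element's text. Unlike XPath text(), it looks at all of the cell's descendant text. The sketch runs against hypothetical inline HTML instead of the live page, so the row structure is an assumption. It searches for the full word "March" because "Mar" would also match titles containing words like "Market".

```python
from bs4 import BeautifulSoup

# Hypothetical markup standing in for the live page's table rows.
SAMPLE_HTML = """
<table>
  <tr><td><a href="p1.html">Bond A lists on PDEx</a></td><td>March 31, 2021</td></tr>
  <tr><td><a href="p2.html">Bond B lists on PDEx</a></td><td>February 2, 2021</td></tr>
</table>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

results = []
# :-soup-contains() is a Soup Sieve extension; like XPath contains(), it is a
# substring test, but it checks all text inside the cell, not only direct text.
for td in soup.select('td:-soup-contains("March")'):
    # Look for the title link in any cell of the same row
    link = td.parent.select_one("td > a")
    if link:
        results.append((link.get_text(strip=True), link["href"], td.get_text(strip=True)))

for title, href, date in results:
    print(f"Title : {title}")
    print(f"Link : {href}")
    print(f"Date : {date}")
    print("-" * 30)
```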
Answered By - sushanth
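Matching on the substring "March" can also pick up a title cell that merely mentions the month, in which case the title would be printed as the "date". A stricter sketch, again using hypothetical sample rows and not taken from the answer above, parses each cell with datetime.strptime in the "March 31, 2021" format seen in the output and compares the month number, with an optional year filter. The helper name entries_for_month is illustrative, and %B assumes English month names (the default C locale).

```python
from datetime import datetime

from bs4 import BeautifulSoup

# Hypothetical rows mimicking the page: a linked title cell plus a date cell.
SAMPLE_HTML = """
<table>
  <tr><td><a href="p1.html">Bond A lists on PDEx</a></td><td>March 31, 2021</td></tr>
  <tr><td><a href="p2.html">March coupon notice for Bond B</a></td><td>February 2, 2021</td></tr>
  <tr><td><a href="p3.html">Bond C lists on PDEx</a></td><td>March 5, 2020</td></tr>
</table>
"""


def entries_for_month(html, month, year=None):
    """Return (title, href, date) tuples for rows whose date cell falls in `month` (1-12)."""
    soup = BeautifulSoup(html, "html.parser")
    entries = []
    for row in soup.find_all("tr"):
        link = row.select_one("td > a")
        if link is None:
            continue  # skip rows without a linked title
        for td in row.find_all("td"):
            try:
                # Assumes dates look like "March 31, 2021"
                parsed = datetime.strptime(td.get_text(strip=True), "%B %d, %Y")
            except ValueError:
                continue  # not a date cell, e.g. the title cell
            if parsed.month == month and (year is None or parsed.year == year):
                entries.append((link.get_text(strip=True), link["href"], parsed.date()))
            break  # only the first date cell in a row is considered
    return entries


for title, href, day in entries_for_month(SAMPLE_HTML, month=3):
    print(f"{day.isoformat()}  {title}  ({href})")
```

Parsing the date rather than searching for text means the "March coupon notice" row is skipped for March, because its actual date is in February.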