Issue
I've been trying to extract HTML data with BeautifulSoup, but I can't seem to select elements by their 'class' attribute in my code. Here's what I tried:
from urllib.request import urlopen as uReq
from bs4 import BeautifulSoup as soup
url = "https://allauthor.com/quotes/"
uClient = uReq(url)
page_html = uClient.read()
uClient.close()
page_soup = soup(page_html, "html.parser")
containers = page_soup.findAll("div", {"class": "quote-list"})
print(containers)
Here I'm expecting to get the list of quotes from the website, but for some reason I get nothing when I run the code. How do I get the list of quotes, or at least one quote in plain text on my console?
Solution
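The quotes you see in the browser are not part of the HTML returned for that URL. They are loaded afterwards from an external URL via JavaScript, which is why searching the downloaded page finds nothing. To load the data yourself, you can send the same request that the page makes, as in the following example: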
import requests
from bs4 import BeautifulSoup
url = "https://allauthor.com/getQuotesDirectory.php"
data = {
    "draw": "1",
    "columns[0][data]": "0",
    "columns[0][name]": "",
    "columns[0][searchable]": "true",
    "columns[0][orderable]": "false",
    "columns[0][search][value]": "",
    "columns[0][search][regex]": "false",
    "start": "0",
    "length": "50",
    "search[value]": "",
    "search[regex]": "false",
    "orderby": "usersView desc",
    "status": "Y",
    "category": "",
    "author": "",
}
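# The endpoint returns JSON; each row in "aaData" holds an HTML fragment.
data = requests.post(url, data=data).json()
for i, row in enumerate(data["aaData"], 1):
    soup = BeautifulSoup(row[0], "html.parser")
    # Collapse runs of whitespace so each quote prints on one line.
    print("{:<3} {}".format(i, " ".join(soup.div.text.split())))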
Prints:
1 May God shower his choicest blessings on you. wishing you happiness, good health and a great year ahead.Birthday 9,184
2 A mind all logic is like a knife all blade. It makes the hand bleed that uses it. Rabindranath TagoreLogic 6,480
3 Reality of life When you give importance to people they think that you are always free But They dont understand that you make yourself available for them every time.New Collection 6,164
4 Xcuse me, I found something under my shoes. Oh its your attitude.Attitude 6,024
...and so on.
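If you need more than the first 50 quotes, the "start" and "length" fields in the form data look like the usual DataTables paging parameters, so it may be possible to step through the results page by page. The following is only a sketch under that assumption: I have not confirmed that the endpoint honours other "start" values, and the names build_payload and fetch_quotes are made up for illustration. It also uses get_text() on the whole fragment instead of soup.div.text, so a row without a div does not raise an error.

```python
import requests
from bs4 import BeautifulSoup

URL = "https://allauthor.com/getQuotesDirectory.php"


def build_payload(start, length=50, draw=1):
    """Return the form data for one page (hypothetical helper).

    Assumption: "start" is the row offset and "length" the page size,
    as in standard DataTables server-side processing.
    """
    return {
        "draw": str(draw),
        "columns[0][data]": "0",
        "columns[0][name]": "",
        "columns[0][searchable]": "true",
        "columns[0][orderable]": "false",
        "columns[0][search][value]": "",
        "columns[0][search][regex]": "false",
        "start": str(start),
        "length": str(length),
        "search[value]": "",
        "search[regex]": "false",
        "orderby": "usersView desc",
        "status": "Y",
        "category": "",
        "author": "",
    }


def fetch_quotes(pages=2, length=50):
    """Yield the plain text of each quote row, one page at a time."""
    with requests.Session() as session:
        for page in range(pages):
            payload = build_payload(page * length, length, draw=page + 1)
            response = session.post(URL, data=payload, timeout=30)
            response.raise_for_status()
            rows = response.json().get("aaData", [])
            if not rows:
                # No more rows: stop instead of requesting further pages.
                break
            for row in rows:
                soup = BeautifulSoup(row[0], "html.parser")
                # Collapse whitespace so each quote is a single line.
                yield " ".join(soup.get_text().split())
```

To print the quotes, loop over the generator, for example: for i, quote in enumerate(fetch_quotes(pages=2), 1): print(i, quote). Keeping the payload in its own function means the paging logic can be checked without touching the network.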
Answered By - Andrej Kesely