Issue
I am trying to scrape the data, but I cannot identify the correct 'div' because two of them share the same class. If I find the parent of the second 'div' and then ask for its children, I simply get None.
The data to be scraped is the admission status, school name, and GRE and GMAT scores.
I am doing this with Python and BeautifulSoup.
Here is my code:
import requests
from bs4 import BeautifulSoup
url = 'https://www.clearadmit.com/livewire/'
page = requests.get(url)
soup = BeautifulSoup(page.content, 'html.parser')
container = soup.find('div', attrs = {'class' : 'livewire-container'})
print(container)
Solution
The posts are loaded via Ajax from an external source, so they are not part of the HTML that requests.get returns for the page itself. You can use the following example to load them:
import requests
from bs4 import BeautifulSoup
url = "https://www.clearadmit.com/wp-admin/admin-ajax.php"
params = {
"action": "livewire_load_posts",
"school": "",
"round": "",
"status": "",
"orderby": "",
"paged": "",
}
for page in range(1, 5):  # <--- increase number of pages here
    print("Getting page {}..".format(page))
    params["paged"] = page
    data = requests.post(url, data=params).json()
    soup = BeautifulSoup(data["markup"], "html.parser")
    for entry in soup.select(".livewire-entry"):
        status = entry.select_one(".status")
        name = status.find_next("strong")
        details = entry.select_one(".lw-details")
        print(
            "{:<25} {:<30} {}".format(
                status.get_text(strip=True),
                name.get_text(strip=True),
                details.get_text(strip=True),
            )
        )
    print("-" * 80)
Prints:
Getting page 1..
News All Schools
Accepted from Waitlist Michigan / Ross Round: Round 2
Accepted from Waitlist UT Austin / McCombs GMAT: 640 Round: Round 2
Accepted Johns Hopkins / Carey Round: Round 3
Accepted Michigan / Ross GPA: 3.65 GRE: 322 Round: Round 3
Accepted from Waitlist Michigan / Ross GPA: 3.1 Round: Round 2 | Michigan
Note All Schools GMAT: 740 Round: Round 1 | Africa
Accepted INSEAD GPA: 3.5 GMAT: 770 Round: Round 4 | Taiwan
Accepted INSEAD GMAT: 750 Round: Round 4 | India
Interview Invite Berkeley / Haas GPA: 3.72 GMAT: 740 Round: Round 1 | IL
Enrolled Duke / Fuqua GPA: 3.59 GRE: 333 Round: Round 2 | Miami
Interview Invite Georgetown / McDonough GRE: 307 Round: Round 3 | Arlington
Accepted USC / Marshall GPA: 3.4 GMAT: 720 Round: Round 2 | NY
Note NYU Stern Round: Round 1
Enrolled Berkeley / Haas GMAT: 760 Round: Round 2 | Canada
Waitlisted Duke / Fuqua GRE: 314 Round: Round 2
Note All Schools Round: Rolling Admissions
Interview Invite Columbia GPA: 3.5 Round: Round 3 | NY
Accepted UNC Kenan-Flagler GMAT: 740 Round: Round 1
Interview Invite MIT Sloan GPA: 3.6 GMAT: 740 Round: Round 3
--------------------------------------------------------------------------------
Getting page 2..
Rejected Columbia GPA: 3.6 GRE: 331 Round: Round 3 | IL
Interview Invite Northwestern / Kellogg GPA: 3.6 GRE: 331 Round: Round 3 | IL
Waitlisted Duke / Fuqua GMAT: 760 Round: Round 2 | Canada
...
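Since the question asks for the GRE and GMAT scores specifically, the details text printed above could be split into separate fields. The following is only a minimal sketch and not part of the original answer: the helper name parse_scores and the regular expression are assumptions, based solely on the "GPA: ...", "GRE: ..." and "GMAT: ..." labels visible in the sample output.

```python
import re

# Hypothetical pattern: matches labels such as "GPA: 3.65" or "GMAT: 740"
# as they appear in the printed details column.
SCORE_RE = re.compile(r"(GPA|GRE|GMAT):\s*(\d+(?:\.\d+)?)")


def parse_scores(details):
    """Return the scores found in a details string as a dict of floats."""
    return {label: float(value) for label, value in SCORE_RE.findall(details)}


print(parse_scores("GPA: 3.65 GRE: 322 Round: Round 3"))
```

Inside the loop it could be called as parse_scores(details.get_text(strip=True)). Entries that list no scores give an empty dict.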
EDIT: Added pagination.
Answered By - Andrej Kesely