Issue
I am trying to scrape course IDs and class times from a college course-schedule website (https://internet3.trincoll.edu/ptools/courselisting.aspx). The site has a dropdown menu where you select a major, and it then shows the classes for that major. The dropdown is built with the <select> and <option> tags.
How do I scrape class data for other majors when they all share the same URL (the only change is which option carries selected="selected")?
I tried changing the tags so that another major has the selected="selected" attribute, but that did not work. Here is the code:
from bs4 import BeautifulSoup
import requests
html_text = requests.get('https://internet3.trincoll.edu/ptools/courselisting.aspx').text
soup = BeautifulSoup(html_text,'lxml')
not_selected = soup.find('option', selected = 'selected')
not_selected.extract()
selected = soup.find('option', value = 'CPSC')
selected["selected"] = 'selected'
print(soup)
While the output has CPSC marked selected="selected" in the option tag (and the AMST option tag does not appear), the page content still shows the classes for the default selected option (i.e. AMST).
Solution
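Editing the parsed HTML only changes your local copy; the server never sees it. To get the other majors you need to send the same form POST the page submits when the dropdown changes (an ASP.NET postback):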
import requests
from bs4 import BeautifulSoup

url = "https://internet3.trincoll.edu/ptools/courselisting.aspx"

with requests.session() as s:
    # Initial GET: provides the dropdown options and the hidden form fields
    soup = BeautifulSoup(s.get(url).content, "html.parser")
    courses = [v["value"] for v in soup.select("#ddlList [value]")]

    for c in courses:
        # Copy every named input from the current page into the POST payload
        data = {}
        for inp in soup.select("input[value][name]"):
            data[inp["name"]] = inp["value"]

        data["ddlList"] = c
        data["ddlLevelList"] = "0"
        data["ddlTermList"] = "1241"  # <-- Fall 2023
        data["ddlSession"] = "0"
        data["__EVENTTARGET"] = "ddlList"
        data["__EVENTARGUMENT"] = ""
        data["__LASTFOCUS"] = ""

        # POST the form; the response becomes the page used for the next iteration
        soup = BeautifulSoup(s.post(url, data=data).content, "html.parser")

        print(c)
        print("-" * 80)
        for id_, title in zip(soup.select(".TITLE_id"), soup.select(".TITLE_title")):
            print(f"{id_.text:<10} {title.text}")
        print()
Prints:
AMST
--------------------------------------------------------------------------------
3170 African-American History
3191 Anthropology of Museums
3252 Born in Blood
3182 Early American Women's Lit
3172 Abolition: A Global History
3174 Sports and American Society
2355 Race and Urban Space
1416 Independent Study
3175 Race, Gender, Global Security
3080 Black Women Writers
1449 Teaching Assistantship
1542 Research Assistantship
3359 Senior Thesis Part 1
2745 Approaches to American Studies
3177 Race, Gender, Global Security
3079 Black Women Writers
1565 Museums and Communities Intern
1441 Independent Study
1437 Research Project
1438 Thesis Part I
1440 Thesis Part II
1439 Thesis
3028 Urban Politics
ANTH
--------------------------------------------------------------------------------
1701 Intro to Cultural Anthropology
1797 Intro to Cultural Anthropology
2398 Intro to Cultural Anthropology
2360 Intro to Political Ecology
...and so on.
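To see what each POST in the loop actually carries, here is a minimal offline sketch of the payload-building step. It is not part of the original answer. It uses only the standard library (html.parser) as a stand-in for the soup.select("input[value][name]") call so it runs without the network. The sample HTML, the field values, the option labels, and the helper name build_postback_payload are all hypothetical. __VIEWSTATE and __EVENTVALIDATION are assumed here as typical ASP.NET hidden fields and are not confirmed for this page.

```python
from html.parser import HTMLParser
from urllib.parse import urlencode

# Hypothetical, heavily trimmed stand-in for the page's form. The real
# hidden-field values are long opaque strings issued by the server.
SAMPLE_HTML = """
<form method="post" action="./courselisting.aspx">
  <input type="hidden" name="__VIEWSTATE" value="abc123" />
  <input type="hidden" name="__EVENTVALIDATION" value="xyz789" />
  <select name="ddlList" id="ddlList">
    <option selected="selected" value="AMST">American Studies</option>
    <option value="CPSC">Computer Science</option>
  </select>
</form>
"""


class InputCollector(HTMLParser):
    """Collect name/value pairs from <input> tags that carry both attributes."""

    def __init__(self):
        super().__init__()
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "input" and attrs.get("name") is not None and attrs.get("value") is not None:
            self.fields[attrs["name"]] = attrs["value"]


def build_postback_payload(html, major):
    """Hypothetical helper: hidden inputs plus the dropdown choice and event target."""
    collector = InputCollector()
    collector.feed(html)
    data = dict(collector.fields)
    data["ddlList"] = major             # the major being requested
    data["__EVENTTARGET"] = "ddlList"   # names the control that triggered the postback
    data["__EVENTARGUMENT"] = ""
    return data


payload = build_postback_payload(SAMPLE_HTML, "CPSC")
print(urlencode(payload))  # form-encoded body, as it would be sent by a POST
```

The point of the sketch is that the selected major travels in the request body as ddlList, alongside the server-issued hidden fields. This is why changing the selected attribute in an already downloaded page has no effect.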
Answered By - Andrej Kesely