Issue
I am trying to extract some data from this page, but the parse function is never executed. If I use a different URL, such as google.com, parse runs, but with the page I need it does not.
import scrapy
from datetime import date

from osp_scraper.spiders.CustomSpider import CustomSpider


class PrincetonSpider(CustomSpider):
    name = "princeton"
    # year = date.today().year
    # month = date.today().month
    # day = date.today().day
    # start_urls = [f'https://blackboard.princeton.edu/webapps/blackboard/execute/viewCatalog?type=Course&command=NewSearch&searchField=CourseId&searchOperator=Contains&searchText=_&dateSearchOperator=LessThan&dateSearchDate_datetime={year}-{month}-{day}+9%3A33%3A00']
    start_urls = ['https://blackboard.princeton.edu/webapps/blackboard/execute/viewCatalog?type=Course']

    def parse(self, response):
        print('--------------------------------')
        courses = response.xpath('//*[@id="listContainer_databody"]/tr')
        for course in courses:
            print(course.xpath('td[1]/span[2]/text()').get())
            input()
            yield response.follow(
                url=course.xpath('th/a/@href').get(),
                callback=self.search_syllabus
            )
Solution
custom_settings = {
    **CustomSpider.custom_settings,
    'ROBOTSTXT_OBEY': False,
}
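I added this to the spider class and it now works. With ROBOTSTXT_OBEY disabled, Scrapy no longer skips the request, which suggests that the site's robots.txt disallows that path.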
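To check whether robots.txt rules would block a given URL before turning the setting off, here is a minimal sketch using only the standard library's urllib.robotparser. The robots.txt content below is a made-up example, not the real file served by blackboard.princeton.edu, and is_allowed is a hypothetical helper name.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, for illustration only.
# It is NOT the actual file from blackboard.princeton.edu.
SAMPLE_ROBOTS_TXT = """\
User-agent: *
Disallow: /webapps/
"""


def is_allowed(robots_txt: str, url: str, user_agent: str = "*") -> bool:
    """Return True if the given robots.txt text permits user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes an iterable of lines, so no network request is made here.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


blocked_url = "https://blackboard.princeton.edu/webapps/blackboard/execute/viewCatalog?type=Course"
home_url = "https://blackboard.princeton.edu/"

print(is_allowed(SAMPLE_ROBOTS_TXT, blocked_url))  # False: path falls under /webapps/
print(is_allowed(SAMPLE_ROBOTS_TXT, home_url))     # True: no rule matches "/"
```

When a request is filtered this way, Scrapy normally reports it at DEBUG level with a "Forbidden by robots.txt" message, so the crawl log is another place to confirm the cause.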
Answered By - Juan Ignacio Ostrit