Issue
I am trying to scrape a cinema site's showtimes. When I observe the POST requests the site uses to retrieve the showtimes (https://www.majorcineplex.com/booking2/search_showtime/cinema=1), they work correctly.
However, when I issue the POST request (https://www.majorcineplex.com/ajaxbooking/ajax_showtime) directly in the browser, it shows me "There is no information for this show".
I find this odd, as both requests were fired from the same Chrome browser, yet I get different results.
Thanks in advance for any help or advice.
Update 29-May-2019
Here is my code for the Scrapy spider.
From the response, I am trying to retrieve a div element with class=book_st_contain.
I am sure this div element is in the HTML, as I have checked with the Chrome Dev Tools. However, it is simply not there when I run the spider.
    import scrapy


    class SessionSpider(scrapy.Spider):
        name = 'session'
        start_urls = [
            'https://www.majorcineplex.com/booking2/search_showtime/cinema=1'
        ]

        def parse(self, response):
            # dump the raw response so it can be compared with the browser's HTML
            with open('response.txt', 'w') as f:
                f.write(response.text)
Solution
You need to ensure that the headers and the posted body match the ones you see in your browser's devtools.
A Scrapy spider to replicate this would look something like this:
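    import scrapy


    class MySpider(scrapy.Spider):
        name = 'major'
        showtime_url = "https://www.majorcineplex.com/ajaxbooking/ajax_showtime"
        # headers copied from the request seen in the browser's network tab
        showtime_headers = {
            'Accept': "*/*",
            'X-Requested-With': "XMLHttpRequest",
            'Content-Type': "application/x-www-form-urlencoded; charset=UTF-8",
        }
        # form-encoded body template; call it with a cinema id
        showtime_payload = "movie_text=&cinema_text={}".format

        def start_requests(self):
            # crawl cinemas with ids 1 to 10
            for cinema in range(1, 11):
                payload = self.showtime_payload(cinema)
                yield scrapy.Request(
                    self.showtime_url,
                    headers=self.showtime_headers,
                    body=payload,
                    method='POST',
                )

        def parse(self, response):
            # inspect the returned showtime data here
            self.logger.info("Got %d bytes from %s", len(response.body), response.url)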
Primarily, you have to ensure that the Content-Type and X-Requested-With headers are present and match the values you see in your inspector.
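To check the headers and body outside Scrapy, the same request can be assembled with the standard library alone. The following is a minimal sketch, not the site's documented API: the helper name build_showtime_request is hypothetical, and it assumes the endpoint needs nothing beyond the two form fields and three headers shown above (cookies or tokens, if the site requires them, are not handled).

```python
from urllib.parse import urlencode
from urllib.request import Request

SHOWTIME_URL = "https://www.majorcineplex.com/ajaxbooking/ajax_showtime"

# Headers taken from the answer above; they mark the request as a
# form-encoded XMLHttpRequest, as the browser sends it.
SHOWTIME_HEADERS = {
    "Accept": "*/*",
    "X-Requested-With": "XMLHttpRequest",
    "Content-Type": "application/x-www-form-urlencoded; charset=UTF-8",
}


def build_showtime_request(cinema_id, movie_text=""):
    """Build (but do not send) the form-encoded POST for one cinema.

    Hypothetical helper for illustration only.
    """
    body = urlencode({"movie_text": movie_text, "cinema_text": cinema_id})
    return Request(
        SHOWTIME_URL,
        data=body.encode("utf-8"),
        headers=SHOWTIME_HEADERS,
        method="POST",
    )


if __name__ == "__main__":
    request = build_showtime_request(1)
    # Compare these values with the request shown in the browser's network tab.
    print(request.get_method(), request.full_url)
    print(request.data)
    print(request.header_items())
```

Nothing is sent until the object is passed to urllib.request.urlopen, so the printed method, body, and headers can first be compared line by line with what the inspector shows.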
Answered By - Granitosaurus