Friday, March 4, 2022

[FIXED] POST request URL not working when used directly

Labels: scrapy, web-scraping

Issue

I am trying to scrape showtimes from a cinema site. When I load the showtimes page (https://www.majorcineplex.com/booking2/search_showtime/cinema=1) and watch the POST requests the site sends to retrieve the showtimes, everything works correctly.

However, when I open the URL of that POST request (https://www.majorcineplex.com/ajaxbooking/ajax_showtime) directly in the browser, it shows me "There is no information for this show".

I find this odd, as both requests were fired from the same Chrome browser, yet I am getting different results.

Thanks in advance for any help or advice.

Update 29-May-2019

Here is my code for the Scrapy spider.

Basically, I am trying to retrieve from the response a div element with the class book_st_contain.

I am sure this div element is in the HTML, as I have checked with the Chrome Dev Tools. However, it is simply not there when I run the spider.

import scrapy


class SessionSpider(scrapy.Spider):
    name = 'session'
    start_urls = [
        'https://www.majorcineplex.com/booking2/search_showtime/cinema=1'
    ]

    def parse(self, response):
        # dump the raw response to a file for inspection
        with open('response.txt', 'w', encoding='utf-8') as f:
            f.write(response.text)

Solution

You need to make sure that the headers and the posted body match the ones you see in your browser's devtools.
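One way to check this is to copy the request headers from devtools and compare them with what your own code sends. The following is only a minimal sketch: the helper name header_mismatches is hypothetical, the expected values are the three headers used in this answer, and naive_headers is a made-up example of a request that lacks them.

```python
def header_mismatches(expected, sent):
    """Return {name: (expected_value, sent_value)} for missing or differing headers.

    Header names are compared case-insensitively; values are compared exactly.
    """
    sent_lower = {name.lower(): value for name, value in sent.items()}
    mismatches = {}
    for name, value in expected.items():
        actual = sent_lower.get(name.lower())
        if actual != value:
            mismatches[name] = (value, actual)
    return mismatches


# Headers copied from the request shown in devtools (the ones used in this answer).
devtools_headers = {
    "Accept": "*/*",
    "X-Requested-With": "XMLHttpRequest",
    "Content-Type": "application/x-www-form-urlencoded; charset=UTF-8",
}

# Hypothetical headers of a request that omits the AJAX marker and content type.
naive_headers = {"accept": "*/*"}

for name, (wanted, got) in header_mismatches(devtools_headers, naive_headers).items():
    print(f"{name}: expected {wanted!r}, sent {got!r}")
```

Any header reported here is a candidate explanation for the server answering differently than it does in the browser.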

A Scrapy spider that replicates this request would look something like this:

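import scrapy


class MySpider(scrapy.Spider):
    name = 'major'

    showtime_url = "https://www.majorcineplex.com/ajaxbooking/ajax_showtime"
    showtime_headers = {
        'Accept': "*/*",
        'X-Requested-With': "XMLHttpRequest",
        'Content-Type': "application/x-www-form-urlencoded; charset=UTF-8",
    }
    # bound str.format: call it with a cinema id to get the form-encoded body
    showtime_payload = "movie_text=&cinema_text={}".format

    def start_requests(self):
        # crawl cinemas with ids 1 to 10 (range() excludes its upper bound)
        for cinema in range(1, 11):
            payload = self.showtime_payload(cinema)
            yield scrapy.Request(
                self.showtime_url,
                headers=self.showtime_headers,
                body=payload,
                method='POST',
            )

    def parse(self, response):
        # default callback; response.text should hold the showtime markup
        self.logger.info("Got %d characters from %s",
                         len(response.text), response.url)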

Above all, make sure the Content-Type and X-Requested-With headers are present and match the values you see in your inspector.
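To see exactly what such a request contains before involving Scrapy, the same POST can be built and inspected with the standard library alone. This is a rough sketch under assumptions: the function name build_showtime_request is hypothetical, the form fields and headers are the ones from the spider in this answer, and nothing is sent over the network.

```python
from urllib.parse import urlencode
from urllib.request import Request

SHOWTIME_URL = "https://www.majorcineplex.com/ajaxbooking/ajax_showtime"


def build_showtime_request(cinema_id):
    """Build (but do not send) the showtime POST request for one cinema."""
    # Form-encode the body the same way the browser does.
    body = urlencode({"movie_text": "", "cinema_text": cinema_id}).encode("utf-8")
    headers = {
        "Accept": "*/*",
        "X-Requested-With": "XMLHttpRequest",
        "Content-Type": "application/x-www-form-urlencoded; charset=UTF-8",
    }
    return Request(SHOWTIME_URL, data=body, headers=headers, method="POST")


req = build_showtime_request(1)
print(req.get_method())     # HTTP method that would be used
print(req.data)             # raw form-encoded body
print(req.header_items())   # headers as urllib stores them
```

Passing the request to urllib.request.urlopen would actually send it. Pasting the URL into the browser's address bar, by contrast, issues a plain GET with no form body, which would be consistent with the "no information" message described in the question.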

Answered By - Granitosaurus

This answer was collected from Stack Overflow and tested by PythonFixing community admins. It is licensed under CC BY-SA 2.5, CC BY-SA 3.0 and CC BY-SA 4.0.
