Sunday, January 2, 2022

[FIXED] Scrapy FormRequest returning 400 error while Python Requests works

Labels: python, python-requests, scrapy, web-scraping

Issue

Sending a POST request through Scrapy's FormRequest results in a 400 error, while the same request made through Python Requests succeeds.

The request headers and params can't be the problem because they work with Requests. What in Scrapy could be breaking this?

The following code was run inside scrapy shell:

url = 'https://www.tripadvisor.co.uk/ShowUserReviews-g2151208-d19219570-r792748373-Tumanyan_Khinkali_at_Tsaghkadzor-Tsakhkadzor_Kotayk_Province.html'
headers = {
    'authority': 'www.tripadvisor.co.uk',
    'method': 'POST',
    'scheme': 'https',
    'accept': 'text/html, */*',
    'accept-encoding': 'gzip, deflate, br',
    'accept-language': 'en-US,en;q=0.9',
    'cache-control': 'no-cache',
    'content-length': '102',
    'content-type': 'application/x-www-form-urlencoded; charset=UTF-8',
    'dnt': '1',
    'origin': 'https://www.tripadvisor.co.uk',
    'pragma': 'no-cache',
    'sec-ch-ua-mobile': '?0',
    'sec-fetch-dest': 'empty',
    'sec-fetch-mode': 'cors',
    'sec-fetch-site': 'same-origin',
    'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.114 Safari/537.36',
    'x-requested-with': 'XMLHttpRequest',
}
params = {
    'returnTo': '#REVIEWS',
    'filterLang': 'ALL',
    'changeSet': 'REVIEW_LIST'
}

Scrapy FormRequest returns a 400 error.

In [10]: req = scrapy.http.FormRequest(
    ...:             url,
    ...:             method='POST',
    ...:             formdata=params,
    ...:             headers=headers)

In [11]: fetch(req)
2021-06-26 21:28:18 [scrapy.core.engine] DEBUG: Crawled (400) <POST https://www.tripadvisor.co.uk/ShowUserReviews-g2151208-d19219570-r792748373-Tumanyan_Khinkali_at_Tsaghkadzor-Tsakhkadzor_Kotayk_Province.html> (referer: None)

Python Requests returns a 200 and I can access the content.

In [17]: r = requests.post(url=url, headers=headers, json=params)
2021-06-26 21:30:02 [urllib3.connectionpool] DEBUG: Starting new HTTPS connection (1): www.tripadvisor.co.uk:443
2021-06-26 21:30:04 [urllib3.connectionpool] DEBUG: https://www.tripadvisor.co.uk:443 "POST /ShowUserReviews-g2151208-d19219570-r792748373-Tumanyan_Khinkali_at_Tsaghkadzor-Tsakhkadzor_Kotayk_Province.html HTTP/1.1" 200 16360

In [18]: r.status_code
Out[18]: 200
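
One thing worth checking is whether the two calls above really send the same request. The sketch below uses only the standard library and approximates, rather than calls, what each client does: `json=params` in Requests serializes the dict as JSON, while `formdata=params` in FormRequest URL-encodes it. The copied `content-length: 102` header matches neither body; it does match the raw body string that appears in the Solution section. Whether this mismatch is what triggers the 400 is an assumption, not something confirmed in the source.

```python
import json
from urllib.parse import urlencode

params = {
    'returnTo': '#REVIEWS',
    'filterLang': 'ALL',
    'changeSet': 'REVIEW_LIST',
}

# Approximation of the body produced by requests.post(..., json=params).
json_body = json.dumps(params).encode('utf-8')

# Approximation of the body produced by FormRequest(formdata=params).
form_body = urlencode(params).encode('utf-8')

# Raw body string taken from the Solution section of this post.
answer_body = (
    "reqNum=1&isLastPoll=false&paramSeqId=0&waitTime=41"
    "&changeSet=REVIEW_LIST&puid=YNgN2QokGScAA0-MH9MAAAIQ"
).encode('utf-8')

# Value of the hard-coded 'content-length' header in the question.
copied_content_length = 102

for label, body in [('json', json_body), ('form', form_body), ('answer', answer_body)]:
    print(label, len(body), len(body) == copied_content_length, body)
```

If the lengths differ from the copied header, dropping `content-length` and letting the client compute it is a reasonable first experiment.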

Solution

I can't access the URL from here, so I couldn't test this, but you can try the following code. You also have to add a user-agent header.

import scrapy


class ReviewsSpider(scrapy.Spider):
    name = 'reviews'
    # Raw, already URL-encoded form body sent with the POST request
    body = "reqNum=1&isLastPoll=false&paramSeqId=0&waitTime=41&changeSet=REVIEW_LIST&puid=YNgN2QokGScAA0-MH9MAAAIQ"

    def start_requests(self):
        yield scrapy.Request(
            url='https://www.tripadvisor.co.uk/ShowUserReviews-g2151208-d19219570-r791416821-Tumanyan_Khinkali_at_Tsaghkadzor-Tsakhkadzor_Kotayk_Province.html',
            method="POST",
            body=self.body,
            callback=self.parse,
            headers={
                'content-type': 'application/x-www-form-urlencoded',
                'x-puid': 'YNgN2QokGScAA0-MH9MAAAIQ',
                'x-requested-with': 'XMLHttpRequest',
                # A 'user-agent' header must be added here as well
            },
        )

    def parse(self, response):
        # Extract the review data from the response here
        pass
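
The answer keeps only a handful of headers. As a minimal sketch of the same idea applied to the question's `headers` dict: the `authority`, `method` and `scheme` entries appear to be HTTP/2 pseudo-headers copied from the browser's developer tools, and `content-length` is normally computed by the HTTP client from the actual body. The helper name `clean_headers` and the `DEVTOOLS_ONLY` set are hypothetical, introduced here for illustration; they are not part of Scrapy or Requests.

```python
# Hypothetical helper, not part of Scrapy or Requests.
# Names that should not be sent as ordinary request headers (assumption:
# they came from a browser's "copy request headers" view).
DEVTOOLS_ONLY = {'authority', 'method', 'path', 'scheme', 'content-length'}


def clean_headers(copied):
    """Return a copy of the headers without pseudo-headers or content-length."""
    return {
        name: value
        for name, value in copied.items()
        if name.lower().lstrip(':') not in DEVTOOLS_ONLY
    }


# A subset of the headers dict from the question.
copied = {
    'authority': 'www.tripadvisor.co.uk',
    'method': 'POST',
    'scheme': 'https',
    'content-length': '102',
    'content-type': 'application/x-www-form-urlencoded; charset=UTF-8',
    'x-requested-with': 'XMLHttpRequest',
}

cleaned = clean_headers(copied)
print(cleaned)
```

The cleaned dict could then be passed as `headers=` to the request. Whether this alone resolves the 400 has not been tested here.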

Answered By - F.Hoque

This answer was collected from Stack Overflow and tested by PythonFixing community admins. It is licensed under CC BY-SA 2.5, CC BY-SA 3.0 and CC BY-SA 4.0.
