Issue
Sending a POST request through Scrapy FormRequest results in a 400 error, while the same request made through Python Requests is successful. The request headers and params can't be the problem because they work with Requests. What in Scrapy could be breaking this?
The following code was run inside scrapy shell:
url = 'https://www.tripadvisor.co.uk/ShowUserReviews-g2151208-d19219570-r792748373-Tumanyan_Khinkali_at_Tsaghkadzor-Tsakhkadzor_Kotayk_Province.html'
headers = {
    'authority': 'www.tripadvisor.co.uk',
    'method': 'POST',
    'scheme': 'https',
    'accept': 'text/html, */*',
    'accept-encoding': 'gzip, deflate, br',
    'accept-language': 'en-US,en;q=0.9',
    'cache-control': 'no-cache',
    'content-length': '102',
    'content-type': 'application/x-www-form-urlencoded; charset=UTF-8',
    'dnt': '1',
    'origin': 'https://www.tripadvisor.co.uk',
    'pragma': 'no-cache',
    'sec-ch-ua-mobile': '?0',
    'sec-fetch-dest': 'empty',
    'sec-fetch-mode': 'cors',
    'sec-fetch-site': 'same-origin',
    'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.114 Safari/537.36',
    'x-requested-with': 'XMLHttpRequest',
}
params = {
    'returnTo': '#REVIEWS',
    'filterLang': 'ALL',
    'changeSet': 'REVIEW_LIST'
}
Scrapy FormRequest returns a 400 error.
In [10]: req = scrapy.http.FormRequest(
...: url,
...: method='POST',
...: formdata=params,
...: headers=headers)
In [11]: fetch(req)
2021-06-26 21:28:18 [scrapy.core.engine] DEBUG: Crawled (400) <POST https://www.tripadvisor.co.uk/ShowUserReviews-g2151208-d19219570-r792748373-Tumanyan_Khinkali_at_Tsaghkadzor-Tsakhkadzor_Kotayk_Province.html> (referer: None)
Python Requests returns a 200 and I can access the content.
In [17]: r = requests.post(url=url, headers=headers, json=params)
2021-06-26 21:30:02 [urllib3.connectionpool] DEBUG: Starting new HTTPS connection (1): www.tripadvisor.co.uk:443
2021-06-26 21:30:04 [urllib3.connectionpool] DEBUG: https://www.tripadvisor.co.uk:443 "POST /ShowUserReviews-g2151208-d19219570-r792748373-Tumanyan_Khinkali_at_Tsaghkadzor-Tsakhkadzor_Kotayk_Province.html HTTP/1.1" 200 16360
In [18]: r.status_code
Out[18]: 200
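One difference between the two calls above may be worth checking: `FormRequest(formdata=params)` URL-encodes the parameters, whereas `requests.post(json=params)` serializes them as JSON, so the two requests do not carry the same body. Neither body necessarily matches the hard-coded `content-length` of 102 either. The sketch below uses only the standard library to approximate both encodings; the variable names are illustrative, and the exact bytes each library emits may differ slightly.

```python
import json
from urllib.parse import urlencode

params = {
    'returnTo': '#REVIEWS',
    'filterLang': 'ALL',
    'changeSet': 'REVIEW_LIST',
}

# Roughly what a form-encoded body looks like (formdata=params).
form_body = urlencode(params)

# Roughly what a JSON body looks like (json=params).
json_body = json.dumps(params)

# The hard-coded 'content-length' header value from the question.
declared_length = 102

for label, body in (('form', form_body), ('json', json_body)):
    actual = len(body.encode('utf-8'))
    print(label, actual, 'matches declared length:', actual == declared_length)
```

If neither length matches, dropping the `content-length` header and letting the HTTP client compute it is one thing to try.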
Solution
I can't access the URL from here, so the following code is untested; try it and see whether it works. You will also need to add a user-agent header.
import scrapy


class ReviewsSpider(scrapy.Spider):
    name = 'reviews'
    # Form-encoded payload, sent as the raw POST body
    body = "reqNum=1&isLastPoll=false&paramSeqId=0&waitTime=41&changeSet=REVIEW_LIST&puid=YNgN2QokGScAA0-MH9MAAAIQ"

    def start_requests(self):
        yield scrapy.Request(
            url='https://www.tripadvisor.co.uk/ShowUserReviews-g2151208-d19219570-r791416821-Tumanyan_Khinkali_at_Tsaghkadzor-Tsakhkadzor_Kotayk_Province.html',
            method="POST",
            body=self.body,
            callback=self.parse,
            # Add a 'user-agent' entry here as well
            headers={
                'content-type': 'application/x-www-form-urlencoded',
                'x-puid': 'YNgN2QokGScAA0-MH9MAAAIQ',
                'x-requested-with': 'XMLHttpRequest'
            }
        )

    def parse(self, response):
        pass
Answered By - F.Hoque
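The hand-written body string in the answer can also be built from a dict, which avoids escaping mistakes. This is a minimal sketch using the standard library's `urlencode`; the `payload` name is illustrative, and the field names and values are copied from the answer's body. The resulting string is 102 bytes long, the same number as the `content-length` value in the question's headers.

```python
from urllib.parse import urlencode

# Field names and values as they appear in the answer's body string.
payload = {
    'reqNum': '1',
    'isLastPoll': 'false',
    'paramSeqId': '0',
    'waitTime': '41',
    'changeSet': 'REVIEW_LIST',
    'puid': 'YNgN2QokGScAA0-MH9MAAAIQ',
}

# urlencode joins the pairs with '&' and percent-escapes reserved characters.
body = urlencode(payload)

print(body)
print(len(body.encode('utf-8')))
```

The resulting string can then be passed as `body=` to `scrapy.Request` in place of the literal.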