Issue
I am new to Scrapy and I am struggling to understand the response I get from a simple URL. The URL is https://fr.getaround.com/search.json?address=Gare de Bordeaux Saint-Jean, which gives a long JSON response (more than 130k characters).
The idea is to then scrape the list of cars provided in that JSON response.
The Getaround API is quite consistent in its answers, so even if there were no cars, I would still receive the overall JSON structure with an empty cars list.
When I try with Scrapy, though, I get a very short response: b'{"redirect_to":"/"}'
Below is the code I am using:
import scrapy


class GetaroundSpider(scrapy.Spider):
    # The class and spider names are placeholders; the original snippet only showed the two methods.
    name = "getaround"

    def start_requests(self):
        addresses = ["Gare de Bordeaux Saint-Jean"]
        for address in addresses:
            yield scrapy.Request(
                f"https://fr.getaround.com/search.json?address={address}"
            )

    def parse(self, response):
        # Print the response object, then its raw body, for inspection.
        print("--------------------------------------------------------\nRESPONSE\n--------------------------------------------------------")
        print(response)
        print("--------------------------------------------------------\nBODY\n--------------------------------------------------------")
        print(response.body)
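A side note on the URL built above: the address contains spaces, which are not valid in a URL as written. A minimal sketch of encoding the query string explicitly with the standard library follows. Here build_search_url is a hypothetical helper name (not part of Scrapy or of Getaround), and encoding alone is not claimed to change the response described in this question.

```python
from urllib.parse import urlencode

BASE_URL = "https://fr.getaround.com/search.json"


def build_search_url(address: str) -> str:
    """Return the search URL with the address encoded (hypothetical helper)."""
    # urlencode escapes spaces and other reserved characters in the value.
    return f"{BASE_URL}?{urlencode({'address': address})}"


search_url = build_search_url("Gare de Bordeaux Saint-Jean")
```

The resulting string can be passed to scrapy.Request in place of the f-string.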
I tried a few things:
- Using Playwright: it basically wraps the previous response.body in some HTML tags.
- Using the shell: same response. I tried to force the method to GET (request = request.replace(method="GET")) or POST (method="POST"):
  - GET gives a 200 code with the proper response in Postman, and a 200 status with only b'' as the body in scrapy shell.
  - POST gives a 404 code in both Postman and Scrapy.
- I tried enabling and disabling cookies in settings.py, with no luck.
- I tried to scrape the main page (fr.getaround.com), for which the response.body seems fine.
Any idea what I am doing wrong?
EDIT
Here is the JSON response I get from Postman or when opening the URL
Solution
The difference between the request opened in Chrome or Postman and the one Scrapy sent was simply a matter of cookies. In my case, Postman added some saved cookies (maybe from an initial query), which allowed Getaround to still answer a badly formatted URL, based on the request from which the cookies were generated.
So the issue does not come from Scrapy but from cookies that Postman had not discarded (see the little "Cookies" link just under the Send button), which made me believe my GET requests were correct.
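One way to check for this outside Postman is to build the same request with and without a Cookie header and compare the two bodies. The sketch below uses only the standard library. build_request and fetch_body are hypothetical helper names, and the cookie name and value are placeholders, since the cookies Postman had actually stored are not given here.

```python
from typing import Dict, Optional
from urllib.request import Request, urlopen


def build_request(url: str, cookies: Optional[Dict[str, str]] = None) -> Request:
    """Build a GET request, optionally carrying a Cookie header."""
    headers = {}
    if cookies:
        # Cookies are sent as a single header: "name1=value1; name2=value2".
        headers["Cookie"] = "; ".join(
            f"{name}={value}" for name, value in cookies.items()
        )
    return Request(url, headers=headers, method="GET")


def fetch_body(request: Request) -> bytes:
    """Send the request and return the raw response body."""
    with urlopen(request, timeout=10) as response:
        return response.read()


url = "https://fr.getaround.com/search.json?address=Gare+de+Bordeaux+Saint-Jean"

# Same URL twice: once with no cookies (a fresh client), once with
# placeholder cookies standing in for whatever Postman had saved.
bare = build_request(url)
with_cookies = build_request(url, {"placeholder_cookie": "placeholder_value"})
```

Calling fetch_body(bare) and fetch_body(with_cookies) and comparing the results shows whether the server's answer depends on the cookies. In Scrapy, cookies can be attached to a request through the cookies argument of scrapy.Request.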
Answered By - samuel guedon