Monday, September 5, 2022

[FIXED] url returns JSON but with scrapy I got a weird response

September 05, 2022 json, python, scrapy

Issue

I am new to Scrapy and I am struggling to understand the response I get from a simple address. The address is https://fr.getaround.com/search.json?address=Gare de Bordeaux Saint-Jean, which gives a long JSON response (more than 130k characters).

The idea is to then scrape the list of cars contained in this JSON response.

The Getaround API is consistent in its answers: even if there were no cars, I would still receive the overall JSON structure with an empty cars list.
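Assuming a valid search response is a JSON object with a top-level `cars` list (the key name is taken from the description above and has not been checked against the real API), extracting the cars could look like the sketch below. The hypothetical `extract_cars` helper treats a missing `cars` key as an error rather than as "no cars", since the API is described as always returning that structure. In a Scrapy callback it would be called with `response.body`.

```python
import json


def extract_cars(body: bytes) -> list:
    # Assumption: a valid search.json payload is a JSON object whose
    # top-level "cars" key holds a list (possibly empty).
    data = json.loads(body)
    if not isinstance(data, dict) or "cars" not in data:
        # Anything else (e.g. a redirect stub) is not a search result.
        raise ValueError("payload has no 'cars' key")
    return data["cars"]
```

Raising instead of returning an empty list keeps a genuine "no cars available" answer distinguishable from a response that is not a search result at all.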

With Scrapy, however, I get a very short response: b'{"redirect_to":"/"}'
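A stub like this can be detected before trying to read any cars. The sketch below uses a hypothetical helper, `redirect_target`, that returns the `redirect_to` value when the body is such a stub and `None` otherwise (including when the body is not JSON, e.g. an HTML page).

```python
import json
from typing import Optional


def redirect_target(body: bytes) -> Optional[str]:
    """Return the "redirect_to" value if the body is a redirect stub, else None."""
    try:
        data = json.loads(body)
    except ValueError:
        # Not JSON at all (json.JSONDecodeError is a subclass of ValueError).
        return None
    if isinstance(data, dict) and "redirect_to" in data:
        return data["redirect_to"]
    return None
```

In a spider's `parse` callback, a non-`None` result could be reported with `self.logger.warning(...)` instead of being processed as search data.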

Here is the code I am using:

import scrapy


class GetaroundSpider(scrapy.Spider):
    # Class and spider name are illustrative; the question only showed the methods.
    name = "getaround"

    def start_requests(self):
        addresses = ["Gare de Bordeaux Saint-Jean"]

        for address in addresses:
            yield scrapy.Request(
                f"https://fr.getaround.com/search.json?address={address}"
            )

    def parse(self, response):
        print("--------------------------------------------------------\nRESPONSE\n--------------------------------------------------------")
        print(response)
        print("--------------------------------------------------------\nBODY\n--------------------------------------------------------")
        print(response.body)
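The address contains spaces. Scrapy normally percent-escapes unsafe characters in request URLs, but building the query string explicitly makes it easier to compare the exact URL with what Postman or a browser sends. A minimal sketch using the standard library's `urllib.parse.urlencode`; the `search_url` helper and `BASE_URL` constant are hypothetical, and encoding alone is not claimed to fix the redirect.

```python
from urllib.parse import urlencode

BASE_URL = "https://fr.getaround.com/search.json"


def search_url(address: str) -> str:
    # urlencode percent-encodes the value; spaces become "+".
    return f"{BASE_URL}?{urlencode({'address': address})}"
```

In `start_requests`, `scrapy.Request(search_url(address))` would then replace the f-string URL.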

I tried a few things:

Using Playwright

It basically wraps the previous response.body in some HTML tags.

Using the shell

Same response. I tried to force the method to GET (request = request.replace(method="GET")) or POST (method="POST"):

GET returns a 200 with the proper response in Postman, but a 200 with an empty body (b'') in scrapy shell.
POST returns a 404 in both Postman and Scrapy.

I tried both enabling and disabling cookies in settings.py, with no luck.
I tried scraping the main page (fr.getaround.com), and there the response.body seems fine.
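For reference, Scrapy's cookie handling is controlled from settings.py. `COOKIES_ENABLED` toggles the cookies middleware, and `COOKIES_DEBUG = True` logs the cookies sent in requests and received in responses, which helps when comparing the spider's requests with Postman's. The values below are one possible debugging configuration, not the one used in the question.

```python
# settings.py (fragment)

# Keep Scrapy's cookies middleware enabled (this is the default).
COOKIES_ENABLED = True

# Log every Cookie / Set-Cookie header so requests can be compared
# with those sent by Postman or a browser.
COOKIES_DEBUG = True
```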

Any idea what I am doing wrong?

EDIT

Here is the JSON response I get from Postman / by opening the URL in a browser:

Solution

So the difference between the request opened in Chrome / Postman and what Scrapy was doing is simply a matter of cookies. In my case, Postman added some saved cookies (perhaps from an initial query), which allowed Getaround to still answer a badly formatted URL based on the request from which those cookies were generated.

So the issue is not with Scrapy but with cookies that were never discarded in Postman (see the small "Cookies" link just under the Send button), which made me believe my GET requests were correct.
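What happened in Postman is ordinary cookie-jar behaviour: cookies stored from an earlier response are attached automatically to later requests for the same domain. The sketch below reproduces that effect with Python's standard `http.cookiejar` purely as an analogy; it is not Postman's or Scrapy's code, and the cookie name and value are made up.

```python
import urllib.request
from http.cookiejar import Cookie, CookieJar

URL = "https://fr.getaround.com/search.json?address=Gare+de+Bordeaux+Saint-Jean"


def make_cookie(name: str, value: str, domain: str) -> Cookie:
    # Build a plain session cookie for the given host (hypothetical values).
    return Cookie(
        version=0, name=name, value=value,
        port=None, port_specified=False,
        domain=domain, domain_specified=True, domain_initial_dot=False,
        path="/", path_specified=True,
        secure=False, expires=None, discard=True,
        comment=None, comment_url=None, rest={},
    )


def cookie_header(jar: CookieJar, url: str):
    # Return the Cookie header the jar would attach to a request for url.
    request = urllib.request.Request(url)
    jar.add_cookie_header(request)
    return request.get_header("Cookie")


jar = CookieJar()
jar.set_cookie(make_cookie("session", "saved-from-earlier", "fr.getaround.com"))
print(cookie_header(jar, URL))  # prints session=saved-from-earlier

jar.clear()  # comparable to deleting the saved cookies in Postman
print(cookie_header(jar, URL))  # prints None: no Cookie header any more
```

Clearing the jar before testing a URL is the equivalent of the fix described above: the request then succeeds or fails on its own merits.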

Answered By - samuel guedon

This answer, collected from Stack Overflow and tested by PythonFixing community admins, is licensed under CC BY-SA 2.5, CC BY-SA 3.0 and CC BY-SA 4.0.
