Saturday, May 14, 2022

[FIXED] Unsure how to query hidden api in dev tools>network>xhr

elasticsearch, json, python, scrapy, web-scraping

Issue

I have been trying to extract data from this website: https://www.webuycars.co.za/buy-a-car. I inspected the responses under dev tools > Network > XHR, but I want to receive more data than just the first page of vehicle results. This is my code so far:

import json
import scrapy

class carSpider(scrapy.Spider):

    name = 'car'
    body = {"to":24,"size":24,"type":"All","filter_type":"all","subcategory":None,"q":"audi","Make":None,"Roadworthy":None,"Auctions":[],"Model":None,"Variant":None,"DealerKey":None,"FuelType":None,"BodyType":None,"Gearbox":None,"AxleConfiguration":None,"Colour":None,"FinanceGrade":None,"Priced_Amount_Gte":0,"Priced_Amount_Lte":0,"MonthlyInstallment_Amount_Gte":0,"MonthlyInstallment_Amount_Lte":0,"auctionDate":None,"auctionEndDate":None,"auctionDurationInSeconds":None,"Kilometers_Gte":0,"Kilometers_Lte":0,"Priced_Amount_Sort":"","Bid_Amount_Sort":"","Kilometers_Sort":"","Year_Sort":"","Auction_Date_Sort":"","Auction_Lot_Sort":"","Year":[],"Price_Update_Date_Sort":"","Online_Auction_Date_Sort":"","Online_Auction_In_Progress":""}
    

    def start_requests(self):
        yield scrapy.Request(
            url='https://website-elastic-api.webuycars.co.za/api/search',
            callback=self.parse,
            body=json.dumps(self.body),
            method="POST")
            


    def parse(self, response):
        response = json.loads(response.body)

        for resp in response['data']:
            yield {
                'Title': resp['OnlineDescription']
            }

This is the data I receive:

2022-05-04 22:21:42 [scrapy.core.scraper] DEBUG: Scraped from <200 https://website-elastic-api.webuycars.co.za/api/search>
{'Title': '2020 Nissan Almera 1.5 Acenta Auto'}
2022-05-04 22:21:42 [scrapy.core.scraper] DEBUG: Scraped from <200 https://website-elastic-api.webuycars.co.za/api/search>
{'Title': '2022 Citroen C3 Aircross 1.2T Puretech Sine Auto'}
2022-05-04 22:21:43 [scrapy.core.scraper] DEBUG: Scraped from <200 https://website-elastic-api.webuycars.co.za/api/search>
{'Title': '2022 Toyota Hilux 2.4 Gd-6 RB Raider Pick Up Double Cab'}
2022-05-04 22:21:43 [scrapy.core.scraper] DEBUG: Scraped from <200 https://website-elastic-api.webuycars.co.za/api/search>
{'Title': '2013 Hyundai i10 1.25 Gls/fluid Auto'}
2022-05-04 22:21:43 [scrapy.core.scraper] DEBUG: Scraped from <200 https://website-elastic-api.webuycars.co.za/api/search>
{'Title': '2019 SYM Symphony JET 14 200'}
2022-05-04 22:21:43 [scrapy.core.scraper] DEBUG: Scraped from <200 https://website-elastic-api.webuycars.co.za/api/search>
{'Title': '2019 Nissan Micra 1.2 Active Visia'}
2022-05-04 22:21:43 [scrapy.core.scraper] DEBUG: Scraped from <200 https://website-elastic-api.webuycars.co.za/api/search>
{'Title': '2021 Suzuki Super Carry 1.2i Pick Up Single Cab'}
2022-05-04 22:21:43 [scrapy.core.scraper] DEBUG: Scraped from <200 https://website-elastic-api.webuycars.co.za/api/search>
{'Title': '2022 Suzuki AN UB 125 (burgman)'}
2022-05-04 22:21:43 [scrapy.core.scraper] DEBUG: Scraped from <200 https://website-elastic-api.webuycars.co.za/api/search>
{'Title': '2022 Honda XRL XR 125l'}
2022-05-04 22:21:43 [scrapy.core.scraper] DEBUG: Scraped from <200 https://website-elastic-api.webuycars.co.za/api/search>
{'Title': '2022 Toyota Hilux 2.4 Gd-6 RB Raider Pick Up Double Cab'}
2022-05-04 22:21:43 [scrapy.core.scraper] DEBUG: Scraped from <200 https://website-elastic-api.webuycars.co.za/api/search>
{'Title': '2022 Land Rover Defender 110 D300 SE X-Dynamic (221 KW)'}
2022-05-04 22:21:43 [scrapy.core.scraper] DEBUG: Scraped from <200 https://website-elastic-api.webuycars.co.za/api/search>
{'Title': '2022 Big Boy TSR 250'}
2022-05-04 22:21:43 [scrapy.core.scraper] DEBUG: Scraped from <200 https://website-elastic-api.webuycars.co.za/api/search>
{'Title': '2019 Renault Kwid 1.0 Dynamique 5-Door'}
2022-05-04 22:21:43 [scrapy.core.scraper] DEBUG: Scraped from <200 https://website-elastic-api.webuycars.co.za/api/search>
{'Title': '2013 Tata Indigo 1.4 Manza Ignis'}
2022-05-04 22:21:43 [scrapy.core.scraper] DEBUG: Scraped from <200 https://website-elastic-api.webuycars.co.za/api/search>
{'Title': '2018 Datsun GO 1.2 LUX'}
2022-05-04 22:21:43 [scrapy.core.scraper] DEBUG: Scraped from <200 https://website-elastic-api.webuycars.co.za/api/search>
{'Title': '2021 Renault Kiger 1.0 Energy ZEN'}
2022-05-04 22:21:43 [scrapy.core.scraper] DEBUG: Scraped from <200 https://website-elastic-api.webuycars.co.za/api/search>
{'Title': '2020 Crosby Adventure Bike 400cc'}
2022-05-04 22:21:43 [scrapy.core.scraper] DEBUG: Scraped from <200 https://website-elastic-api.webuycars.co.za/api/search>
{'Title': '2012 Jeep Compass 2.0 LTD'}
2022-05-04 22:21:43 [scrapy.core.scraper] DEBUG: Scraped from <200 https://website-elastic-api.webuycars.co.za/api/search>
{'Title': '2021 Crosby Adventure Bike 400cc'}
2022-05-04 22:21:43 [scrapy.core.scraper] DEBUG: Scraped from <200 https://website-elastic-api.webuycars.co.za/api/search>
{'Title': '2022 Renault Kwid 1.0 Climber 5-Door'}
2022-05-04 22:21:43 [scrapy.core.scraper] DEBUG: Scraped from <200 https://website-elastic-api.webuycars.co.za/api/search>
{'Title': '2019 Suzuki Swift 1.2 GLX'}
2022-05-04 22:21:43 [scrapy.core.scraper] DEBUG: Scraped from <200 https://website-elastic-api.webuycars.co.za/api/search>
{'Title': '2022 Volkswagen Polo Classic GP 1.4 Comfortline'}
2022-05-04 22:21:43 [scrapy.core.scraper] DEBUG: Scraped from <200 https://website-elastic-api.webuycars.co.za/api/search>
{'Title': '2020 Renault Kwid 1.0 Climber 5-Door Auto'}
2022-05-04 22:21:43 [scrapy.core.scraper] DEBUG: Scraped from <200 https://website-elastic-api.webuycars.co.za/api/search>
{'Title': '2019 Yamaha YZ 450 FX'}

However, I have been trying to query for audi, as seen in line 7 of the code in the body variable ("q":"audi"). No matter what I try, I keep receiving the same data, all of it from the first page. I assume something is wrong with how I pass the body to scrapy.Request, but I am unsure. I have tried different formats for the body: making it a string, copying and pasting the payload directly from dev tools, and sending it as a plain string rather than JSON. Any help would be appreciated.

Solution

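It sounds like you want data from more than one page of results, that is, pagination. If so, you can start from the spider below. Compared with the request in the question, it also sends a content-type: application/json header and a User-Agent header. Note that "q" is empty in this body, so set it to "audi" if you want to filter by that search term.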
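As a minimal sketch of the request itself, the snippet below builds (but never sends) the same kind of JSON POST using only the standard library, so the serialized body and headers can be inspected. The helper name build_search_request and the shortened payload are hypothetical, chosen for illustration. Whether the missing content-type header is what made the API ignore "q" in the question is an assumption, not something the post confirms.

```python
import json
import urllib.request

SEARCH_URL = "https://website-elastic-api.webuycars.co.za/api/search"


def build_search_request(payload: dict) -> urllib.request.Request:
    """Build (but do not send) a JSON POST request for the search endpoint.

    Hypothetical helper for illustration only.
    """
    # json.dumps turns Python None into JSON null and returns a str;
    # HTTP bodies are bytes, so encode explicitly.
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        SEARCH_URL,
        data=body,
        method="POST",
        headers={
            # Declares the body as JSON, as the spider's request does.
            "Content-Type": "application/json",
            "User-Agent": "mozilla/5.0",
        },
    )


# Shortened payload for illustration; the real one has many more keys.
request = build_search_request({"to": 24, "size": 24, "q": "audi", "Make": None})
print(request.get_method())
print(request.get_header("Content-type"))  # urllib stores header names capitalized
print(request.data.decode("utf-8"))
```

In Scrapy, the equivalent is passing body=json.dumps(...) together with the headers argument to scrapy.Request.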

import json
import scrapy
from scrapy.crawler import CrawlerProcess

class CarsSpider(scrapy.Spider):
    name = 'car'
    body = {"to":24,"size":24,"type":"All","filter_type":"all","subcategory":None,"q":"","Make":None,"Roadworthy":None,"Auctions":[],"Model":None,"Variant":None,"DealerKey":None,"FuelType":None,"BodyType":None,"Gearbox":None,"AxleConfiguration":None,"Colour":None,"FinanceGrade":None,"Priced_Amount_Gte":0,"Priced_Amount_Lte":0,"MonthlyInstallment_Amount_Gte":0,"MonthlyInstallment_Amount_Lte":0,"auctionDate":None,"auctionEndDate":None,"auctionDurationInSeconds":None,"Kilometers_Gte":0,"Kilometers_Lte":0,"Priced_Amount_Sort":"","Bid_Amount_Sort":"","Kilometers_Sort":"","Year_Sort":"","Auction_Date_Sort":"","Auction_Lot_Sort":"","Year":[],"Price_Update_Date_Sort":"","Online_Auction_Date_Sort":"","Online_Auction_In_Progress":""}

    def start_requests(self):
        # POST the payload as JSON; the content-type header declares the body format.
        yield scrapy.Request(
            url='https://website-elastic-api.webuycars.co.za/api/search',
            callback=self.parse,
            body=json.dumps(self.body),
            method="POST",
            headers={
                "content-type": "application/json",
                "User-Agent": "mozilla/5.0",
            },
        )

    def parse(self, response):
        data = json.loads(response.body)
        # Note: this loop sends no further requests. It re-yields the items
        # of this one response on each of its 272 steps (6528 / 24).
        for item in range(0, 6528, 24):
            data['total']['value'] = item

            for resp in data['data']:
                yield {
                    'Title': resp['OnlineDescription']
                }


if __name__ == "__main__":
    process = CrawlerProcess()
    process.crawl(CarsSpider)  # crawl() needs the spider class to run
    process.start()

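Output:

'downloader/response_status_count/200': 1,
'item_scraped_count': 6528,

Note that these stats show a single downloaded response: the 6528 items are the 24 results of that one page yielded 272 times, not 6528 distinct vehicles.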
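To fetch distinct pages, each page needs its own request with a different body. The sketch below only generates those bodies and makes no network calls. It rests on a loud assumption: that "to" is the end offset of the requested window, so page n uses to = n * size. The post does not confirm this, so compare it with the payloads the site sends in dev tools when you move to the next page. The function name page_payloads and the shortened template are hypothetical.

```python
import copy
import math


def page_payloads(base_body: dict, total: int, size: int = 24):
    """Yield one request body per page of results.

    ASSUMPTION: "to" is the end offset of the requested window, so page n
    (1-based) uses to = n * size. Verify against the real XHR payloads.
    """
    pages = math.ceil(total / size)
    for page in range(1, pages + 1):
        body = copy.deepcopy(base_body)  # never mutate the shared template
        body["size"] = size
        body["to"] = page * size
        yield body


# Shortened template for illustration; the real body has many more keys.
template = {"to": 24, "size": 24, "q": "audi"}
bodies = list(page_payloads(template, total=60))
print([b["to"] for b in bodies])  # [24, 48, 72]
```

In a spider, each generated body would go into its own scrapy.Request with body=json.dumps(body) and the same headers. Rather than hard-coding 6528, the total could probably be read from the first response's ['total']['value'] field, which the spider above already touches.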

Answered By - F.Hoque

This answer was collected from Stack Overflow and tested by PythonFixing community admins. It is licensed under CC BY-SA 2.5, CC BY-SA 3.0 and CC BY-SA 4.0.
