Monday, November 22, 2021

[FIXED] How to send duplicates of some lines in body of POST request when I use Scrapy?

November 22, 2021 python, scrapy, web-scraping

Issue

To scrape one site, I have to send duplicates of some lines to get the JSON data. I tested this approach with requests and it worked, but it doesn't work when I use Scrapy: there are no duplicates in the body of the request:

import json

import scrapy
from scrapy.shell import inspect_response


class MainSpider(scrapy.Spider):
    name = 'main'
    allowed_domains = ['ukonlinestores.co.uk']
    # start_urls = ['https://ukonlinestores.co.uk/amazon-uk-sellers/']
    search_url = 'https://ukonlinestores.co.uk/wp-admin/admin-ajax.php?action=get_wdtable&table_id=9'
    handle_httpstatus_list = [400]

    def parse_search(self, response):
        inspect_response(response, self)

    def start_requests(self):
        data = {
            'draw': '2',
            'columns[0][data]': '0',
            'columns[0][name]': 'wdt_ID',
            'columns[0][searchable]': 'true',
            'columns[0][orderable]': 'true',
            'columns[0][orderable]': 'true',
            'columns[0][search][value]': '',
            'columns[0][search][value]': '',
            'columns[0][search][regex]': 'false',
            'columns[0][search][regex]': 'false',
            'columns[1][data]': '1',
            'columns[1][data]': '1',
            'columns[1][name]': 'sellerid',
            'columns[1][name]': 'sellerid',
            'columns[1][searchable]': 'true',
            'columns[1][searchable]': 'true',
            'columns[1][orderable]': 'true',
            'columns[1][orderable]': 'true',
        }
        yield scrapy.Request(
            self.search_url,
            callback=self.parse_search,
            method='POST',
            headers=headers,  # `headers` is defined elsewhere in the asker's code (not shown)
            body=json.dumps(data),
        )

>>> request.body
b'{"columns[0][data]": "0", "columns[0][name]": "wdt_ID", "columns[0][orderable]": "true", "columns[0][search][regex]": "false", "columns[0][search][value]": "", "columns[0][searchable]": "true", "columns[10][data]": "10", "columns[10][name]": "positive12months", "columns[10][orderable]": "true", "columns[10][search][regex]": "false", "columns[10][search][value]": "", "columns[10][searchable]": "true", "columns[11][data]": "11", "columns[11][name]": "positivelifetime", "columns[11][orderable]": "true", "columns[11][search][regex]": "false", "columns[11][search][value]": "", "columns[11][searchable]": "true", "columns[12][data]": "12", "columns[12][name]": "count30day", "columns[12][orderable]": "true", "columns[12][search][regex]": "false", "columns[12][search][value]": "", "columns[12][searchable]": "true", "columns[13][data]": "13", "columns[13][name]": "count90day", "columns[13][orderable]": "true",

How can I work around this?
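The duplicates are lost before Scrapy is involved. `data` is a Python dict literal, and a dict keeps only the last value for each repeated key, so `json.dumps(data)` has nothing to duplicate. Below is a minimal standard-library sketch of the difference, reusing field names from the question. A list of `(key, value)` tuples keeps every repeat, and `urllib.parse.urlencode` encodes all of them. The sketch assumes the server expects a form-encoded body, as a browser sends it, rather than JSON.

```python
from urllib.parse import urlencode

# A dict literal silently collapses repeated keys: only the last value survives.
as_dict = {
    'columns[0][orderable]': 'true',
    'columns[0][orderable]': 'true',
    'columns[1][data]': '1',
    'columns[1][data]': '1',
}

# A list of (key, value) tuples keeps every pair, in order.
as_pairs = [
    ('columns[0][orderable]', 'true'),
    ('columns[0][orderable]', 'true'),
    ('columns[1][data]', '1'),
    ('columns[1][data]', '1'),
]

print(len(as_dict))   # 2
print(len(as_pairs))  # 4

# Form-encode the pairs: each repeated field appears twice in the body.
body = urlencode(as_pairs)
print(body)
```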

Solution

Here's how you can get the data with requests. You have to reverse engineer the HTTP request. To get data from https://ukonlinestores.co.uk/wp-admin/admin-ajax.php, you have to recreate the POST request and include whatever parameters, cookies and headers the server requires. I tend to start with a bare request and build up from there. Here I didn't need the headers, but the query parameters and the form data are both necessary to get the JSON data you want.
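When building up a request step by step, you can check what it will actually send without contacting the server. The sketch below uses the standard library's `urllib.request.Request`, which is constructed here but never opened. It shows that the query parameters end up on the URL and the form data is URL-encoded into the body. The two-field `data` dict is a trimmed, illustrative subset of the real payload.

```python
from urllib.parse import urlencode
from urllib.request import Request

base_url = 'https://ukonlinestores.co.uk/wp-admin/admin-ajax.php'
params = [('action', 'get_wdtable'), ('table_id', '25')]
# Trimmed, illustrative subset of the form payload.
data = {'draw': '1', 'start': '0'}

# Query-string parameters go on the URL...
url = f'{base_url}?{urlencode(params)}'
# ...while the form data is URL-encoded into the request body.
req = Request(url, data=urlencode(data).encode('ascii'), method='POST')

print(req.full_url)
print(req.get_method())
print(req.data)
```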

I tend to use Chrome DevTools to copy the request as cURL and paste it into http://curl.trillworks.com. That way I get nicely formatted headers, cookies and params.
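If you would rather not depend on an external converter, you can turn the raw form data shown in the browser's network panel into Python pairs with the standard library. This sketch assumes you have copied the URL-encoded payload as text. The string below is a short illustrative excerpt in the same shape as this site's payload, not the full captured request.

```python
from urllib.parse import parse_qsl

# Raw URL-encoded form data as copied from the browser (illustrative excerpt).
raw = ('draw=1&columns%5B0%5D%5Bdata%5D=0&columns%5B0%5D%5Bname%5D=storeurl'
       '&columns%5B0%5D%5Bsearch%5D%5Bvalue%5D=&start=0&length=50')

# keep_blank_values=True keeps fields such as search[value] that are sent empty.
# The result is a list of (key, value) tuples, so duplicates and order survive.
pairs = parse_qsl(raw, keep_blank_values=True)

for key, value in pairs:
    print(f'{key!r}: {value!r},')
```

Without `keep_blank_values=True`, the empty `columns[0][search][value]` field would be dropped silently. That is an easy way to end up sending an incomplete payload.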

You could also use the same params and data in a Scrapy spider.

Note that, looking at the data payload, you hadn't included a lot of it, which is probably why you weren't getting the response you needed. Here's an example of doing it with requests.

Code Example

import requests

params = (
    ('action', 'get_wdtable'),
    ('table_id', '25'),
)

data = {
  'draw': '1',
  'columns[0][data]': '0',
  'columns[0][name]': 'storeurl',
  'columns[0][searchable]': 'true',
  'columns[0][orderable]': 'true',
  'columns[0][search][value]': '',
  'columns[0][search][regex]': 'false',
  'columns[1][data]': '1',
  'columns[1][name]': 'positivefeedback',
  'columns[1][searchable]': 'true',
  'columns[1][orderable]': 'true',
  'columns[1][search][value]': '',
  'columns[1][search][regex]': 'false',
  'columns[2][data]': '2',
  'columns[2][name]': 'rank',
  'columns[2][searchable]': 'true',
  'columns[2][orderable]': 'true',
  'columns[2][search][value]': '',
  'columns[2][search][regex]': 'false',
  'columns[3][data]': '3',
  'columns[3][name]': 'storemarketplace',
  'columns[3][searchable]': 'true',
  'columns[3][orderable]': 'true',
  'columns[3][search][value]': '',
  'columns[3][search][regex]': 'false',
  'columns[4][data]': '4',
  'columns[4][name]': 'maincategory',
  'columns[4][searchable]': 'true',
  'columns[4][orderable]': 'true',
  'columns[4][search][value]': '',
  'columns[4][search][regex]': 'false',
  'columns[5][data]': '5',
  'columns[5][name]': 'noofproducts',
  'columns[5][searchable]': 'true',
  'columns[5][orderable]': 'true',
  'columns[5][search][value]': '',
  'columns[5][search][regex]': 'false',
  'columns[6][data]': '6',
  'columns[6][name]': 'fulfilmenttype',
  'columns[6][searchable]': 'true',
  'columns[6][orderable]': 'true',
  'columns[6][search][value]': '',
  'columns[6][search][regex]': 'false',
  'columns[7][data]': '7',
  'columns[7][name]': 'countlifetime',
  'columns[7][searchable]': 'true',
  'columns[7][orderable]': 'true',
  'columns[7][search][value]': '',
  'columns[7][search][regex]': 'false',
  'order[0][column]': '2',
  'order[0][dir]': 'asc',
  'start': '0',
  'length': '50',
  'search[value]': '',
  'search[regex]': 'false',
  'wdtNonce': '78ce0f8f66',  # page-specific nonce copied from the browser; it is likely to expire
}

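# No custom headers were needed for this endpoint, so only params and data are sent.
response = requests.post(
    'https://ukonlinestores.co.uk/wp-admin/admin-ajax.php',
    params=params,  # query string: ?action=get_wdtable&table_id=25
    data=data,      # form-encoded POST body
)

table = response.json()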
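The `data` dictionary above is highly regular. It appears to be the server-side processing request that the DataTables library sends (wpDataTables builds on DataTables). It contains one block of six `columns[i][...]` fields per column, plus ordering, paging and global-search fields. Rather than pasting all of it by hand, you could generate it from the column names. This sketch assumes every column is searchable and orderable with an empty search, as in the captured payload. `build_payload` is a hypothetical helper name, and the nonce still has to come from the live page.

```python
def build_payload(column_names, nonce, order_column=0, order_dir='asc',
                  start=0, length=50, draw=1):
    """Build DataTables-style server-side form fields (hypothetical helper)."""
    data = {'draw': str(draw)}
    # One block of six fields per column, in the same order as the captured payload.
    for i, name in enumerate(column_names):
        prefix = f'columns[{i}]'
        data[f'{prefix}[data]'] = str(i)
        data[f'{prefix}[name]'] = name
        data[f'{prefix}[searchable]'] = 'true'
        data[f'{prefix}[orderable]'] = 'true'
        data[f'{prefix}[search][value]'] = ''
        data[f'{prefix}[search][regex]'] = 'false'
    # Sorting, paging and global search.
    data['order[0][column]'] = str(order_column)
    data['order[0][dir]'] = order_dir
    data['start'] = str(start)
    data['length'] = str(length)
    data['search[value]'] = ''
    data['search[regex]'] = 'false'
    data['wdtNonce'] = nonce
    return data


columns = ['storeurl', 'positivefeedback', 'rank', 'storemarketplace',
           'maincategory', 'noofproducts', 'fulfilmenttype', 'countlifetime']
payload = build_payload(columns, nonce='78ce0f8f66', order_column=2)
print(len(payload))  # 56
```

Because the helper returns a plain dict, you can pass it directly as `data=` to `requests.post`. You can also pass it as `formdata=` to `scrapy.FormRequest`.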

Answered By - AaronS

This answer was collected from Stack Overflow and tested by PythonFixing community admins. It is licensed under CC BY-SA 2.5, CC BY-SA 3.0 and CC BY-SA 4.0.
