Friday, October 8, 2021

[FIXED] How can I assign a variable from column 2 when running a loop of values in column 1 (same ROW value)

Labels: bigcommerce, pandas, python, scrapy

Issue

To explain the goal in more detail: the script checks each product code in column A against a supplier website. If the product is available, the loop moves on to the next value.

If the product is not on the site, a JSON PUT request is sent to a different sales website that sets the inventory level at 0.

The issue is how to pass the value from column B of the same CSV row to the PUT request.

CSV file

COL-A  COL-B
aaaaa  111
bbbbb  222
ccccc  333

This is the code I have so far

import scrapy
from scrapy.http import FormRequest
from scrapy.http import JsonRequest
from scrapy.http import Request
import pandas as pd
import requests
import bigcommerce

api = bigcommerce.api.BigcommerceApi(client_id='xxxxx', store_hash='zzzzz', access_token='11111')


def readcsv():
    df = pd.read_csv('data.csv')
    return df['COL-A'].values.tolist()


class datacheckSpider(scrapy.Spider):
    name = 'datacheck'
    start_urls = ['http://www.example.com/order/']


    def parse(self, response):
        for COL-A in readcsv():
            base_url = 'http://www.example.com/order/item={}'
            yield scrapy.Request(base_url.format(COL-A), callback=self.data)

    def data(self, response):
        if not response.xpath('//body[1]/div[1]/div[1]/div[4]/ul[1]/li[1]/div[1]/div[1]/div[1]/div[2]/div[2]/p[1]/text()').get():
            yield{
            api.Products.get(%SET_COL-B_VARIABLE_HERE%).update(inventory_level='0')}

The code works if I manually set the product ID from COL-B in the PUT request. However, when I tried to define a variable for it the same way as for COL-A, it did not work.

The problem is that while the script is checking the current loop value, it also needs the value from the same row of the CSV file. It seems df.loc might work, but I am not sure how to align the values.

If you have any ideas for resolving this, please let me know. I am a beginner with Scrapy, Pandas and Python in general, and would like to learn.

Solution

As described in the Scrapy documentation section "Passing additional data to callback functions", you want to pass the COL-B value to the data callback through the Request's cb_kwargs argument.

To have both values available, iterate over (COL-A, COL-B) pairs rather than over COL-A values alone. Here readcsv returns the DataFrame's 2D NumPy array, which iterates as a sequence of rows, where each row is a (COL-A, COL-B) pair:

def readcsv():
    df = pd.read_csv('data.csv')
    return df.values
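
Unpacking rows of df.values into two names only works when the file has exactly two columns. As a hedged alternative, the sketch below selects the two columns explicitly and returns plain tuples. The helper name read_pairs and the in-memory sample data are assumptions made for illustration; the sample uses commas because pd.read_csv expects comma-separated input by default.

```python
import io

import pandas as pd

# Hypothetical stand-in for data.csv, comma-separated (the read_csv default).
SAMPLE_CSV = "COL-A,COL-B\naaaaa,111\nbbbbb,222\nccccc,333\n"


def read_pairs(source):
    """Return a list of (COL-A, COL-B) tuples, one per CSV row."""
    df = pd.read_csv(source)
    # Selecting the columns by name keeps the unpacking safe even if the
    # file later gains extra columns; name=None yields plain tuples.
    return list(df[['COL-A', 'COL-B']].itertuples(index=False, name=None))


pairs = read_pairs(io.StringIO(SAMPLE_CSV))
for product_code, product_key in pairs:
    # Both values come from the same row, so they stay aligned.
    print(product_code, product_key)
```

Passing 'data.csv' instead of the StringIO object would read the real file in the same way.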

Then in parse you can iterate on these pairs and pass them on to the next callback:

class datacheckSpider(scrapy.Spider):
    name = 'datacheck'
    start_urls = ['http://www.example.com/order/']

    def parse(self, response):
        base_url = 'http://www.example.com/order/item={}'
        for product_code, product_key in readcsv():
            # yield the request so Scrapy actually schedules it
            yield scrapy.Request(base_url.format(product_code), callback=self.data, cb_kwargs={'product_key': product_key})

    def data(self, response, product_key):
        if not response.xpath('//body[1]/div[1]/div[1]/div[4]/ul[1]/li[1]/div[1]/div[1]/div[1]/div[2]/div[2]/p[1]/text()').get():
            yield api.Products.get(product_key).update(inventory_level='0')

Answered By - Cimbali

This answer was collected from Stack Overflow and tested by PythonFixing community admins. It is licensed under CC BY-SA 2.5, CC BY-SA 3.0 and CC BY-SA 4.0.
