Issue
To explain the goal in more detail: the script checks each product code in column A against a supplier website. If the product is available, the loop moves on to the next value.
If the product is not on the site, a JSON PUT request is sent to a different sales website to set that product's inventory level to 0.
The issue is how to pass the value in column B of the same CSV row to the PUT request.
CSV file
COL-A COL-B
aaaaa 111
bbbbb 222
ccccc 333
This is the code I have so far
import scrapy
from scrapy.http import FormRequest
from scrapy.http import JsonRequest
from scrapy.http import Request
import pandas as pd
import requests
import bigcommerce

api = bigcommerce.api.BigcommerceApi(client_id='xxxxx', store_hash='zzzzz', access_token='11111')

def readcsv():
    df = pd.read_csv('data.csv')
    return df['COL-A'].values.tolist()

class datacheckSpider(scrapy.Spider):
    name = 'datacheck'
    start_urls = ['http://www.example.com/order/']

    def parse(self, response):
        for COL-A in readcsv():
            base_url = 'http://www.example.com/order/item={}'
            yield scrapy.Request(base_url.format(COL-A), callback=self.data)

    def data(self, response):
        if not response.xpath('//body[1]/div[1]/div[1]/div[4]/ul[1]/li[1]/div[1]/div[1]/div[1]/div[2]/div[2]/p[1]/text()').get():
            yield {
                api.Products.get(%SET_COL-B_VARIABLE_HERE%).update(inventory_level='0')}
The code works if I manually set the product id from COL-B in the PUT request. However, when I tried to define a variable for COL-B the same way as for COL-A, it did not work.
The problem is that while the script is checking the current COL-A value in the loop, it also needs the COL-B value from the same row of the CSV file. It seems df.loc might work, but I am not sure how to align the values.
If you have any ideas on how to resolve this, please let me know. I am a beginner with Scrapy, Pandas and Python in general, and would like to learn.
Solution
As described in Scrapy's documentation under "Passing additional data to callback functions", you basically want to pass the COL-B value to the data callback through the Request's cb_kwargs argument.
To get all codes, iterate over (COL-A, COL-B) pairs rather than over COL-A values alone. Here readcsv returns the 2D NumPy array, that is, the list of rows, where each row is a (COL-A, COL-B) pair (this relies on the file containing only those two columns):
def readcsv():
    df = pd.read_csv('data.csv')
    return df.values
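The row-pairing idea can also be sketched without pandas, using the standard library's csv module. This is only an illustration under assumptions: the inline sample data, the comma delimiter, and the function name read_pairs are hypothetical and not part of the original code.

```python
import csv
import io

# Hypothetical sample mirroring the CSV in the question (comma-separated is assumed).
SAMPLE_CSV = "COL-A,COL-B\naaaaa,111\nbbbbb,222\nccccc,333\n"

def read_pairs(handle):
    """Return a list of (product_code, product_key) tuples, one per CSV row."""
    reader = csv.DictReader(handle)
    # Each row is a dict keyed by the header names, so both values
    # always come from the same line of the file.
    return [(row["COL-A"], int(row["COL-B"])) for row in reader]

pairs = read_pairs(io.StringIO(SAMPLE_CSV))
for product_code, product_key in pairs:
    print(product_code, product_key)
```

Because each tuple is built from a single row, the code and its product id cannot drift out of alignment, which is the same property the df.values approach relies on.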
Then in parse you can iterate on these pairs and pass them on to the next callback:
class datacheckSpider(scrapy.Spider):
    name = 'datacheck'
    start_urls = ['http://www.example.com/order/']

    def parse(self, response):
        base_url = 'http://www.example.com/order/item={}'
        for product_code, product_key in readcsv():
            # The request must be yielded, otherwise Scrapy never schedules it.
            # cb_kwargs carries the COL-B value along to the callback.
            yield scrapy.Request(base_url.format(product_code), callback=self.data, cb_kwargs={'product_key': product_key})

    def data(self, response, product_key):
        # product_key arrives as a keyword argument taken from cb_kwargs.
        if not response.xpath('//body[1]/div[1]/div[1]/div[4]/ul[1]/li[1]/div[1]/div[1]/div[1]/div[2]/div[2]/p[1]/text()').get():
            yield api.Products.get(product_key).update(inventory_level='0')
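For a beginner it may help to see, in plain Python, why the data callback receives product_key. The sketch below does not use Scrapy at all: FakeRequest and dispatch are hypothetical stand-ins that only imitate the idea of cb_kwargs entries being handed to the callback as keyword arguments.

```python
from dataclasses import dataclass, field
from typing import Callable

@dataclass
class FakeRequest:
    """Stand-in for scrapy.Request; holds only the fields this sketch needs."""
    url: str
    callback: Callable
    cb_kwargs: dict = field(default_factory=dict)

def data(response, product_key):
    # In the real spider this is where the inventory update would happen.
    return f"{response}: would update product {product_key}"

def dispatch(request, response):
    # Imitates the framework: cb_kwargs entries arrive as keyword arguments.
    return request.callback(response, **request.cb_kwargs)

requests_made = [
    FakeRequest(f"http://www.example.com/order/item={code}", data, {"product_key": key})
    for code, key in [("aaaaa", 111), ("bbbbb", 222)]
]
results = [dispatch(req, f"response for {req.url}") for req in requests_made]
for line in results:
    print(line)
```

Each request keeps its own cb_kwargs dictionary, so the value travels with the request even though responses may come back in a different order than they were sent.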
Answered By - Cimbali