Tuesday, November 30, 2021

[FIXED] Scrapy: Maintain location cookie for redirects

Tags: python, request, scrapy

Issue

Code:

# -*- coding: utf-8 -*-
import scrapy
from scrapy.http import Request

from ..items import LowesspiderItem


class LowesSpider(scrapy.Spider):
    name = 'lowes'

    def start_requests(self):
        start_urls = ['https://www.lowes.com/search?searchTerm=8654RM-42']

        for url in start_urls:
            # Cookie added to bypass the location requirement
            yield Request(url, cookies={'sn': '2333'})

    def parse(self, response):
        items = response.css('.grid-container')
        for product in items:
            item = LowesspiderItem()

            # Get the product price
            productPrice = product.css('.art-pd-price::text').get()
            # Get the Lowe's item number (last segment of the final URL)
            productLowesNum = response.url.split("/")[-1]
            # Get the SKU
            productSKU = product.css('.met-product-model::text').get()

            item["productLowesNum"] = productLowesNum
            item["productSKU"] = productSKU
            item["productPrice"] = productPrice

            yield item

Output:

{'productLowesNum': '1001440644',
 'productPrice': None,
 'productSKU': '8654RM-42'}

I'll have a list of SKUs, so this is how I'm going to format start_urls:

start_urls = ['https://www.lowes.com/search?searchTerm=<some SKU>']

This URL redirects to: https://www.lowes.com/pd/ZLINE-KITCHEN-BATH-Ducted-Red-Matte-Wall-Mounted-Range-Hood-Common-42-Inch-Actual-42-in/1001440644

Scrapy handles that redirect.

Now the problem

When I have:

start_urls = ['https://www.lowes.com/search?searchTerm=8654RM-42']

I get the SKU but not the price.

However, when I use the final product URL in start_urls:

start_urls = ['https://www.lowes.com/pd/ZLINE-KITCHEN-BATH-Ducted-Red-Matte-Wall-Mounted-Range-Hood-Common-42-Inch-Actual-42-in/1001440644']

then my output is fine:

{'productLowesNum': '1001440644',
 'productPrice': '1,449.95',
 'productSKU': '8654RM-42'}

So I believe that starting from a URL that gets redirected causes my scraper to miss the price, although I still get the SKU.

Here's my guess: I had to preset a location cookie because the Lowe's website does not show the price unless the user provides a zip code or location. So I assume I have to move or adjust cookies={'sn':'2333'} to make my program work as expected.

Solution

Problem

The main issue is that some cookies set in response to the first request

https://www.lowes.com/search?searchTerm=8654RM-42

are carried forward to the request made after the redirect, which is

https://www.lowes.com/pd/ZLINE-KITCHEN-BATH-Ducted-Red-Matte-Wall-Mounted-Range-Hood-Common-42-Inch-Actual-42-in/1001440644

These cookies override the ones you set yourself.
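One way to check this, as a rough sketch: Scrapy's COOKIES_DEBUG setting logs the cookies sent with each request and received in each response, so the log should show which cookies accompany the redirected request. Putting it in the project's settings.py is an assumption about the project layout.

```python
# settings.py (assumed to be the project's settings module)
# Log cookies sent in requests and cookies received in responses.
COOKIES_DEBUG = True
```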

Solution

You need to send explicit cookies with each request and prevent cookies from earlier responses from being added to the next request.

Scrapy has a Request.meta key called dont_merge_cookies for this purpose. Set it to True in the request's meta so that stored cookies are not merged into the request and cookies from the response are not stored.

Because cookie merging is now turned off for this request, set the cookie explicitly in the request's Cookie header. Something like this:

def start_requests(self):
    start_urls = ['https://www.lowes.com/search?searchTerm=8654RM-42']

    for url in start_urls:
        yield Request(url, headers={'Cookie': 'sn=2333;'}, meta={'dont_merge_cookies': True})
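Since the question mentions a whole list of SKUs, here is a minimal sketch of how the same request arguments could be built for each one. The helper name build_request_kwargs and the store_number parameter are hypothetical, introduced only for illustration; the URL pattern, the sn cookie, and the dont_merge_cookies key come from the code above.

```python
from urllib.parse import urlencode

SEARCH_URL = "https://www.lowes.com/search"


def build_request_kwargs(sku, store_number="2333"):
    """Return keyword arguments for one SKU search request (hypothetical helper)."""
    return {
        # URL-encode the SKU so unusual characters do not break the query string
        "url": f"{SEARCH_URL}?{urlencode({'searchTerm': sku})}",
        # Send the location cookie explicitly in the Cookie header
        "headers": {"Cookie": f"sn={store_number}"},
        # Ask Scrapy not to merge stored cookies into this request
        "meta": {"dont_merge_cookies": True},
    }


if __name__ == "__main__":
    for sku in ["8654RM-42"]:
        print(build_request_kwargs(sku))
```

Inside start_requests, each result could then be passed on as yield Request(**build_request_kwargs(sku)).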

Hope it helps.

Answered By - asimhashmi

This answer was collected from Stack Overflow and tested by the PythonFixing community admins. It is licensed under CC BY-SA 2.5, CC BY-SA 3.0 and CC BY-SA 4.0.
