Sunday, December 17, 2023

[FIXED] Scrapy returning single item from selector list

December 17, 2023 python, scrapy, web-scraping

Issue

I am trying to scrape data for all the Amazon bestsellers with Scrapy. I can get the whole selector list of products, yet when I iterate over that list the result still contains only a single data item.

    def parse_page(self, response):

        product_data = response.xpath("//div[@id='gridItemRoot']") #THIS RETURNS A SELECTOR LIST

        for data in product_data:
            product_name = data.xpath("//div[@class='a-section a-spacing-mini _cDEzb_noop_3Xbw5']//img/@alt").get()
            product_rank = data.xpath("//span[@class='zg-bdg-text']/text()").get()

            # It only generates a single result
            yield {
                "name": product_name,
                "rank": product_rank
            }

I also tried skipping the loop: I called xpath() on the response directly and yielded the result. That also returned a single element.

    def parse_page(self, response):
        # In previous projects, all the results were scraped this way,
        # without iterating over any selector list
        product_name = response.xpath("//div[@class='a-section a-spacing-mini _cDEzb_noop_3Xbw5']//img/@alt").get()
        product_rank = response.xpath("//span[@class='zg-bdg-text']/text()").get()

        yield {
            "name": product_name,
            "rank": product_rank
        }

Solution

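You need to use relative XPath expressions inside the loop, that is, start each expression with a dot (.//) so it is evaluated against the current product node.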

    def parse_page(self, response):

        product_data = response.xpath("//div[@id='gridItemRoot']") #THIS RETURNS A SELECTOR LIST

        for data in product_data:
            product_name = data.xpath(".//div[@class='a-section a-spacing-mini _cDEzb_noop_3Xbw5']//img/@alt").get()
            product_rank = data.xpath(".//span[@class='zg-bdg-text']/text()").get()

            # Each iteration now reads from its own product node
            yield {
                "name": product_name,
                "rank": product_rank
            }

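Without the . at the beginning, an XPath expression that starts with // searches the whole document from its root, no matter which selector you call xpath() on. So .get() returns the first match in the page on every iteration, and every yielded item is identical. Starting the expression with .// restricts the search to the descendants of the current node (data).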

Answered By - Alexander

This answer was collected from Stack Overflow and tested by PythonFixing community admins; it is licensed under cc by-sa 2.5, cc by-sa 3.0 and cc by-sa 4.0.
