Saturday, February 12, 2022

[FIXED] What is wrong with my spider? It starts and ends within seconds, creating an empty file

February 12, 2022 python, python-3.x, scrapy, web-scraping

Issue

import scrapy
from scrapy.loader import ItemLoader

class Summoner(scrapy.Item):
    summ = scrapy.Field()
    rank = scrapy.Field()

class myspider(scrapy.Spider):
    name = 'spider1'
    start_urls = ['https://www.op.gg/ranking/ladder/']
    def parse(self, response):
        sel = scrapy.Selector(response)
        summoners = sel.xpath('//ul[@class="ranking-highest__list"]/ul')

        for ran, summon in enumerate(summoners):
            item = ItemLoader(Summoner(), summon)
            item.add_xpath('summ', './/li/a/text()')
            item.add_value('rank', ran)
            yield item.load_item()

My intention is to get every summoner's name from the leaderboard along with their rank (via enumerate).
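If you keep the enumerate approach, note that enumerate() counts from 0 by default, so the top summoner would be stored with rank 0. A minimal sketch, with a hypothetical list of names standing in for the scraped ones, of passing start=1 to get 1-based ranks:

```python
# Hypothetical names standing in for the scraped summoner names
names = ["PlayerA", "PlayerB", "PlayerC"]

# enumerate() starts at 0 unless told otherwise; start=1 matches a 1-based leaderboard
ranked = [(rank, name) for rank, name in enumerate(names, start=1)]
print(ranked)
```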

I then run scrapy runspider myscript.py -o results.xml, and the spider stops, leaving a 0-byte .xml file.

No errors are shown.

I have tried changing the XPath for summoners multiple times without success.

Also, an additional question: am I supposed to work out the XPath myself, as I attempted above, or should I just copy it from Inspect Element? Doing so gives me something like /html/body/div[2]/div[3]/div[3]/div/div/div/div[1]/ul (which still doesn't work, by the way).

I'm sure my problem lies in the XPath; could you correct me?

Solution

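Your XPaths were wrong. //ul[@class="ranking-highest__list"]/ul looks for ul elements directly inside the list, but the leaderboard entries are li elements, so the expression matched nothing, the loop never ran, and no items were yielded. Try the following instead: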

import scrapy
from scrapy.loader import ItemLoader

class Summoner(scrapy.Item):
    summ = scrapy.Field()
    rank = scrapy.Field()

class myspider(scrapy.Spider):
    name = 'spider1'
    start_urls = ['https://www.op.gg/ranking/ladder/']
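
    def parse(self, response):
        # Each leaderboard entry is an <li> (not a nested <ul>) whose id contains "summoner-"
        for summon in response.xpath('//ul[@class="ranking-highest__list"]/li[contains(@id,"summoner-")]'):
            item = ItemLoader(Summoner(), summon)
            # Name and rank are both read from the entry itself, so enumerate() is not needed
            item.add_xpath('summ', './/a[@class="ranking-highest__name"]/text()')
            item.add_xpath('rank', './/*[@class="ranking-highest__rank"]/text()')
            yield item.load_item()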

Answered By - SMTH

This answer was collected from Stack Overflow and tested by PythonFixing community admins. It is licensed under CC BY-SA 2.5, CC BY-SA 3.0 and CC BY-SA 4.0.
