Issue
I'm trying to scrape sofifa.com with Scrapy. With the code below, I want the full name and rating of only the 60 players listed on the first page, but I get more than 60 results and the spider doesn't stop until I stop it manually.
I noticed that many of the scraped players don't appear on the first page, and the spider is also crawling team pages, which I never asked for.
import scrapy
from scrapy.linkextractors import LinkExtractor
from scrapy.spiders import CrawlSpider, Rule
import time
from lxml import etree
# from scrapy_cloudflare_middleware.middlewares import CloudFlareMiddleware


class PlayersSpider(CrawlSpider):
    name = "players"
    allowed_domains = ["sofifa.com"]
    # start_urls = ['https://sofifa.com']
    user_agent = 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/121.0.0.0 Safari/537.36'

    def start_requests(self):
        yield scrapy.Request(url='https://sofifa.com', headers={'User-Agent': self.user_agent})

    rules = (
        Rule(LinkExtractor(restrict_xpaths=('//table//tbody//tr/td[2]/a')[:60]), callback="parse_item", follow=True),
    )

    # def set_user_agent(self, request, ay7aga):
    #     request.headers['User-Agent'] = self.user_agent
    #     return request

    def parse_item(self, response):
        time.sleep(1)
        # print(response.status)
        if '/player' in response.url:
            yield {
                'full_name': response.xpath('//div[@class="profile clearfix"]/h1/text()').get(),
                'overall_rating': response.xpath('//div[@class="grid"]//em[1]/text()').get()
                # 'potential': response.xpath('.//div[@class="grid"]//em[2]/text()').get(),
                # 'value': response.xpath('.//div[@class="grid"]//em[3]/text()').get(),
                # 'wage': response.xpath('.//div[@class="grid"]//em[4]/text()').get()
            }
        else:
            pass
Solution
All you actually need to do is either set follow to False in your Rule constructor, or remove the parameter completely, since it defaults to False when a callback is set.
According to the Scrapy docs, follow "is a boolean which specifies if links should be followed from each response extracted with this rule. If callback is None follow defaults to True, otherwise it defaults to False."
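With follow=True, the rule's link extractor is also run on every page the rule downloads. Judging by the extra players and team pages in your results, the player pages contain tables whose links match the same XPath, so the crawl keeps spreading to pages that were never on the first page and effectively never finishes.
By turning off follow you ensure that only the responses for links extracted from the start page are sent to the parse_item callback, and no further links are followed from those pages.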
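To make the difference concrete, here is a simplified, stdlib-only model of how a CrawlSpider rule treats follow. It is not Scrapy's implementation: the toy SITE graph and the resolve_follow and crawl helpers are hypothetical, and the only behavior borrowed from Scrapy is the documented default (follow is True when callback is None, otherwise False) plus duplicate-request filtering.

```python
from collections import deque

# Hypothetical toy site: page -> links the rule's LinkExtractor would match on it.
SITE = {
    "/": ["/player/1", "/player/2"],          # listing page (the start URL)
    "/player/1": ["/player/3", "/team/10"],   # player pages link to other players and teams
    "/player/2": ["/player/1", "/team/11"],
    "/player/3": ["/team/10"],
    "/team/10": ["/player/4"],
    "/team/11": [],
    "/player/4": [],
}


def resolve_follow(follow, callback):
    """Documented default: follow is True if callback is None, otherwise False."""
    if follow is not None:
        return follow
    return callback is None


def crawl(start, follow):
    """Return the pages handed to the rule's callback, in breadth-first order.

    Rules are always applied to the start page; pages reached through the rule
    only get the rules applied again when follow is True.
    """
    seen = {start}                  # stands in for Scrapy's duplicate filter
    queue = deque([(start, True)])  # (url, apply rules to this response?)
    scraped = []
    while queue:
        url, apply_rules = queue.popleft()
        if url != start:
            scraped.append(url)     # the callback (parse_item) sees this page
        if not apply_rules:
            continue
        for link in SITE[url]:
            if link not in seen:
                seen.add(link)
                queue.append((link, follow))
    return scraped


# Rule(..., callback="parse_item") with no follow argument:
print(crawl("/", resolve_follow(None, "parse_item")))
# ['/player/1', '/player/2']

# Rule(..., callback="parse_item", follow=True), as in the question:
print(crawl("/", resolve_follow(True, "parse_item")))
# ['/player/1', '/player/2', '/player/3', '/team/10', '/team/11', '/player/4']
```

With follow left at its default, only the links on the start page reach the callback; with follow=True the same extractor also runs on the player pages and pulls in further players and team pages, which is the behavior described in the question.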
import scrapy
from scrapy.linkextractors import LinkExtractor
from scrapy.spiders import CrawlSpider, Rule


class PlayersSpider(CrawlSpider):
    name = "players"
    allowed_domains = ["sofifa.com"]
    user_agent = 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/121.0.0.0 Safari/537.36'

    def start_requests(self):
        yield scrapy.Request(url='https://sofifa.com', headers={'User-Agent': self.user_agent})

    rules = (
        # No follow argument: with a callback set, follow defaults to False,
        # so links are only extracted from the start page.
        # The question's [:60] sliced the XPath string (a no-op), so it is dropped here.
        Rule(LinkExtractor(restrict_xpaths='//table//tbody//tr/td[2]/a'), callback="parse_item"),
    )

    def parse_item(self, response):
        # Only player pages produce items
        if '/player' in response.url:
            yield {
                'full_name': response.xpath('//div[@class="profile clearfix"]/h1/text()').get(),
                'overall_rating': response.xpath('//div[@class="grid"]//em[1]/text()').get(),
            }
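A side note on the [:60] in the question's rule: it is applied to the XPath string itself, not to the extracted links, so it never limited anything; the run above yields 60 items because the first page lists 60 players. The plain-Python snippet below (no Scrapy needed) shows the no-op. If you do want a hard cap, Scrapy's Rule accepts a process_links callable that receives the list of links extracted from each response; first_60 is a hypothetical helper you could pass there.

```python
XPATH = '//table//tbody//tr/td[2]/a'

# [:60] on a str slices characters; this XPath is only 26 characters long,
# so the slice returns the identical string and limits nothing.
print(XPATH[:60] == XPATH)  # True
print(len(XPATH))           # 26


def first_60(links):
    """Hypothetical process_links helper: keep at most the first 60 links.

    Scrapy calls a Rule's process_links with the list of links extracted
    from a response and schedules whatever list it returns.
    """
    return links[:60]


print(len(first_60(list(range(100)))))  # 60
```

In the spider this would look like Rule(LinkExtractor(restrict_xpaths='//table//tbody//tr/td[2]/a'), callback="parse_item", process_links=first_60). The cap applies per response, so it only bounds the total number of items when follow is off.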
After running this code with scrapy crawl players -o players.json, I got exactly 60 results in my JSON file, and it produced the following output.
OUTPUT
2024-02-01 18:51:25 [scrapy.utils.log] INFO: Scrapy 2.11.0 started (bot: spiders)
2024-02-01 18:51:25 [scrapy.utils.log] INFO: Versions: lxml 5.1.0.0, libxml2 2.10.3, cssselect 1.2.0, parsel 1.8.1, w3lib 2.1.2, Twisted 22.10.0, Python 3.11.7 (tags/v3.11.7:fa7a6f2, Dec 4 2023, 19:24:49) [MSC v.1937 64 bit (AMD64)], pyOpenSSL 24.0.0 (OpenSSL 3.2.1 30 Jan 2024), cryptography 42.0.2, Platform Windows-10-10.0.22621-SP0
2024-02-01 18:51:25 [scrapy.addons] INFO: Enabled addons:
[]
2024-02-01 18:51:25 [scrapy.utils.log] DEBUG: Using reactor: twisted.internet.selectreactor.SelectReactor
2024-02-01 18:51:25 [scrapy.extensions.telnet] INFO: Telnet Password: 2a289292fd307038
2024-02-01 18:51:25 [scrapy.middleware] INFO: Enabled extensions:
['scrapy.extensions.corestats.CoreStats',
'scrapy.extensions.telnet.TelnetConsole',
'scrapy.extensions.feedexport.FeedExporter',
'scrapy.extensions.logstats.LogStats']
2024-02-01 18:51:25 [scrapy.crawler] INFO: Overridden settings:
{'BOT_NAME': 'spiders',
'NEWSPIDER_MODULE': 'spiders.spiders',
'REQUEST_FINGERPRINTER_IMPLEMENTATION': '2.7',
'SPIDER_MODULES': ['spiders.spiders'],
'USER_AGENT': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 '
'(KHTML, like Gecko) Chrome/110.0.0.0 Safari/537.36'}
2024-02-01 18:51:25 [scrapy.middleware] INFO: Enabled downloader middlewares:
['scrapy.downloadermiddlewares.httpauth.HttpAuthMiddleware',
'scrapy.downloadermiddlewares.downloadtimeout.DownloadTimeoutMiddleware',
'scrapy.downloadermiddlewares.defaultheaders.DefaultHeadersMiddleware',
'scrapy.downloadermiddlewares.useragent.UserAgentMiddleware',
'scrapy.downloadermiddlewares.retry.RetryMiddleware',
'scrapy.downloadermiddlewares.redirect.MetaRefreshMiddleware',
'scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware',
'scrapy.downloadermiddlewares.redirect.RedirectMiddleware',
'scrapy.downloadermiddlewares.cookies.CookiesMiddleware',
'scrapy.downloadermiddlewares.httpproxy.HttpProxyMiddleware',
'scrapy.downloadermiddlewares.stats.DownloaderStats']
2024-02-01 18:51:25 [scrapy.middleware] INFO: Enabled spider middlewares:
['scrapy.spidermiddlewares.httperror.HttpErrorMiddleware',
'scrapy.spidermiddlewares.offsite.OffsiteMiddleware',
'scrapy.spidermiddlewares.referer.RefererMiddleware',
'scrapy.spidermiddlewares.urllength.UrlLengthMiddleware',
'scrapy.spidermiddlewares.depth.DepthMiddleware']
2024-02-01 18:51:25 [scrapy.middleware] INFO: Enabled item pipelines:
[]
2024-02-01 18:51:25 [scrapy.core.engine] INFO: Spider opened
2024-02-01 18:51:25 [scrapy.extensions.logstats] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
2024-02-01 18:51:25 [scrapy.extensions.telnet] INFO: Telnet console listening on 127.0.0.1:6023
2024-02-01 18:51:26 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://sofifa.com> (referer: None)
2024-02-01 18:51:26 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://sofifa.com/player/246191/julian-alvarez/240024/> (referer: https://sofifa.com)
2024-02-01 18:51:26 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://sofifa.com/player/247635/khvicha-kvaratskhelia/240024/> (referer: https://sofifa.com)
2024-02-01 18:51:26 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://sofifa.com/player/237086/min-jae-kim/240024/> (referer: https://sofifa.com)
2024-02-01 18:51:26 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://sofifa.com/player/245371/thiago-almada/240024/> (referer: https://sofifa.com)
2024-02-01 18:51:26 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://sofifa.com/player/246147/mason-greenwood/240024/> (referer: https://sofifa.com)
2024-02-01 18:51:26 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://sofifa.com/player/245152/santiago-gimenez/240024/> (referer: https://sofifa.com)
2024-02-01 18:51:26 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://sofifa.com/player/266253/ivan-fresneda-corraliza/240024/> (referer: https://sofifa.com)
2024-02-01 18:51:26 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://sofifa.com/player/268421/mathys-tel/240024/> (referer: https://sofifa.com)
2024-02-01 18:51:26 [scrapy.core.scraper] DEBUG: Scraped from <200 https://sofifa.com/player/246191/julian-alvarez/240024/>
{'full_name': 'Julián Álvarez', 'overall_rating': '81'}
2024-02-01 18:51:26 [scrapy.core.scraper] DEBUG: Scraped from <200 https://sofifa.com/player/247635/khvicha-kvaratskhelia/240024/>
{'full_name': 'Khvicha Kvaratskhelia', 'overall_rating': '86'}
2024-02-01 18:51:26 [scrapy.core.scraper] DEBUG: Scraped from <200 https://sofifa.com/player/237086/min-jae-kim/240024/>
{'full_name': '김민재 金敏在', 'overall_rating': '84'}
2024-02-01 18:51:26 [scrapy.core.scraper] DEBUG: Scraped from <200 https://sofifa.com/player/245371/thiago-almada/240024/>
{'full_name': 'Thiago Ezequiel Almada', 'overall_rating': '80'}
2024-02-01 18:51:26 [scrapy.core.scraper] DEBUG: Scraped from <200 https://sofifa.com/player/246147/mason-greenwood/240024/>
{'full_name': 'Mason Greenwood', 'overall_rating': '77'}
2024-02-01 18:51:26 [scrapy.core.scraper] DEBUG: Scraped from <200 https://sofifa.com/player/245152/santiago-gimenez/240024/>
{'full_name': 'Santiago Tomás Giménez', 'overall_rating': '80'}
2024-02-01 18:51:26 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://sofifa.com/player/270086/antonio-joao-tavares-silva/240024/> (referer: https://sofifa.com)
2024-02-01 18:51:26 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://sofifa.com/player/247679/victor-boniface/240024/> (referer: https://sofifa.com)
2024-02-01 18:51:26 [scrapy.core.scraper] DEBUG: Scraped from <200 https://sofifa.com/player/266253/ivan-fresneda-corraliza/240024/>
{'full_name': 'Iván Fresneda Corraliza', 'overall_rating': '72'}
2024-02-01 18:51:26 [scrapy.core.scraper] DEBUG: Scraped from <200 https://sofifa.com/player/268421/mathys-tel/240024/>
{'full_name': 'Mathys Tel', 'overall_rating': '74'}
2024-02-01 18:51:26 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://sofifa.com/player/236772/dominik-szoboszlai/240024/> (referer: https://sofifa.com)
2024-02-01 18:51:26 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://sofifa.com/player/264309/arda-guler/240024/> (referer: https://sofifa.com)
2024-02-01 18:51:26 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://sofifa.com/player/256630/florian-wirtz/240024/> (referer: https://sofifa.com)
2024-02-01 18:51:26 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://sofifa.com/player/256402/carlos-alcaraz/240024/> (referer: https://sofifa.com)
2024-02-01 18:51:26 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://sofifa.com/player/265600/roony-bardghji/240024/> (referer: https://sofifa.com)
2024-02-01 18:51:26 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://sofifa.com/player/269312/tommaso-baldanzi/240024/> (referer: https://sofifa.com)
2024-02-01 18:51:26 [scrapy.core.scraper] DEBUG: Scraped from <200 https://sofifa.com/player/270086/antonio-joao-tavares-silva/240024/>
{'full_name': 'António João Pereira Albuquerque Tavares Silva', 'overall_rating': '78'}
2024-02-01 18:51:26 [scrapy.core.scraper] DEBUG: Scraped from <200 https://sofifa.com/player/247679/victor-boniface/240024/>
{'full_name': 'Victor Okoh Boniface', 'overall_rating': '80'}
2024-02-01 18:51:26 [scrapy.core.scraper] DEBUG: Scraped from <200 https://sofifa.com/player/236772/dominik-szoboszlai/240024/>
{'full_name': 'Dominik Szoboszlai', 'overall_rating': '82'}
2024-02-01 18:51:26 [scrapy.core.scraper] DEBUG: Scraped from <200 https://sofifa.com/player/264309/arda-guler/240024/>
{'full_name': 'Arda Güler', 'overall_rating': '77'}
2024-02-01 18:51:26 [scrapy.core.scraper] DEBUG: Scraped from <200 https://sofifa.com/player/256630/florian-wirtz/240024/>
{'full_name': 'Florian Richard Wirtz', 'overall_rating': '86'}
2024-02-01 18:51:26 [scrapy.core.scraper] DEBUG: Scraped from <200 https://sofifa.com/player/256402/carlos-alcaraz/240024/>
{'full_name': 'Carlos Jonas Alcaraz', 'overall_rating': '73'}
2024-02-01 18:51:26 [scrapy.core.scraper] DEBUG: Scraped from <200 https://sofifa.com/player/265600/roony-bardghji/240024/>
{'full_name': 'Roony Bardghji', 'overall_rating': '70'}
2024-02-01 18:51:26 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://sofifa.com/player/259608/evan-ferguson/240024/> (referer: https://sofifa.com)
2024-02-01 18:51:26 [scrapy.core.scraper] DEBUG: Scraped from <200 https://sofifa.com/player/269312/tommaso-baldanzi/240024/>
{'full_name': 'Tommaso Baldanzi', 'overall_rating': '77'}
2024-02-01 18:51:26 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://sofifa.com/player/231747/kylian-mbappe/240024/> (referer: https://sofifa.com)
2024-02-01 18:51:26 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://sofifa.com/player/260815/arnau-martinez-lopez/240024/> (referer: https://sofifa.com)
2024-02-01 18:51:26 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://sofifa.com/player/243780/kang-in-lee/240024/> (referer: https://sofifa.com)
2024-02-01 18:51:26 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://sofifa.com/player/240833/youssoufa-moukoko/240024/> (referer: https://sofifa.com)
2024-02-01 18:51:27 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://sofifa.com/player/255565/kaoru-mitoma/240024/> (referer: https://sofifa.com)
2024-02-01 18:51:27 [scrapy.core.scraper] DEBUG: Scraped from <200 https://sofifa.com/player/259608/evan-ferguson/240024/>
{'full_name': 'Evan Ferguson', 'overall_rating': '74'}
2024-02-01 18:51:27 [scrapy.core.scraper] DEBUG: Scraped from <200 https://sofifa.com/player/231747/kylian-mbappe/240024/>
{'full_name': 'Kylian Mbappé Lottin', 'overall_rating': '91'}
2024-02-01 18:51:27 [scrapy.core.scraper] DEBUG: Scraped from <200 https://sofifa.com/player/260815/arnau-martinez-lopez/240024/>
{'full_name': 'Arnau Martínez López', 'overall_rating': '80'}
2024-02-01 18:51:27 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://sofifa.com/player/224232/nicolo-barella/240024/> (referer: https://sofifa.com)
2024-02-01 18:51:27 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://sofifa.com/player/231677/marcus-rashford/240024/> (referer: https://sofifa.com)
2024-02-01 18:51:27 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://sofifa.com/player/269859/arthur-vermeeren/240024/> (referer: https://sofifa.com)
2024-02-01 18:51:27 [scrapy.core.scraper] DEBUG: Scraped from <200 https://sofifa.com/player/243780/kang-in-lee/240024/>
{'full_name': '이강인 Kang In Lee', 'overall_rating': '78'}
2024-02-01 18:51:27 [scrapy.core.scraper] DEBUG: Scraped from <200 https://sofifa.com/player/240833/youssoufa-moukoko/240024/>
{'full_name': 'Youssoufa Moukoko', 'overall_rating': '77'}
2024-02-01 18:51:27 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://sofifa.com/player/259240/adam-wharton/240024/> (referer: https://sofifa.com)
2024-02-01 18:51:27 [scrapy.core.scraper] DEBUG: Scraped from <200 https://sofifa.com/player/255565/kaoru-mitoma/240024/>
{'full_name': '三笘 薫', 'overall_rating': '81'}
2024-02-01 18:51:27 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://sofifa.com/player/232293/victor-osimhen/240024/> (referer: https://sofifa.com)
2024-02-01 18:51:27 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://sofifa.com/player/257504/bilal-el-khannouss/240024/> (referer: https://sofifa.com)
2024-02-01 18:51:27 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://sofifa.com/player/262863/antonio-nusa/240024/> (referer: https://sofifa.com)
2024-02-01 18:51:27 [scrapy.core.scraper] DEBUG: Scraped from <200 https://sofifa.com/player/224232/nicolo-barella/240024/>
{'full_name': 'Nicolò Barella', 'overall_rating': '86'}
2024-02-01 18:51:27 [scrapy.core.scraper] DEBUG: Scraped from <200 https://sofifa.com/player/231677/marcus-rashford/240024/>
{'full_name': 'Marcus Rashford', 'overall_rating': '83'}
2024-02-01 18:51:27 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://sofifa.com/player/263620/romeo-lavia/240024/> (referer: https://sofifa.com)
2024-02-01 18:51:27 [scrapy.core.scraper] DEBUG: Scraped from <200 https://sofifa.com/player/269859/arthur-vermeeren/240024/>
{'full_name': 'Arthur Vermeeren', 'overall_rating': '76'}
2024-02-01 18:51:27 [scrapy.core.scraper] DEBUG: Scraped from <200 https://sofifa.com/player/259240/adam-wharton/240024/>
{'full_name': 'Adam Wharton', 'overall_rating': '71'}
2024-02-01 18:51:27 [scrapy.core.scraper] DEBUG: Scraped from <200 https://sofifa.com/player/232293/victor-osimhen/240024/>
{'full_name': 'Victor James Osimhen', 'overall_rating': '88'}
2024-02-01 18:51:27 [scrapy.core.scraper] DEBUG: Scraped from <200 https://sofifa.com/player/257504/bilal-el-khannouss/240024/>
{'full_name': 'Bilal El Khannouss', 'overall_rating': '73'}
2024-02-01 18:51:27 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://sofifa.com/player/252008/israel-reyes/240024/> (referer: https://sofifa.com)
2024-02-01 18:51:27 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://sofifa.com/player/248266/sacha-boey/240024/> (referer: https://sofifa.com)
2024-02-01 18:51:27 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://sofifa.com/player/258729/gabriel-veiga-novas/240024/> (referer: https://sofifa.com)
2024-02-01 18:51:27 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://sofifa.com/player/239085/erling-haaland/240024/> (referer: https://sofifa.com)
2024-02-01 18:51:27 [scrapy.core.scraper] DEBUG: Scraped from <200 https://sofifa.com/player/262863/antonio-nusa/240024/>
{'full_name': 'Antonio Eromonsele Nordby Nusa', 'overall_rating': '71'}
2024-02-01 18:51:27 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://sofifa.com/player/224949/javairo-dilrosun/240024/> (referer: https://sofifa.com)
2024-02-01 18:51:27 [scrapy.core.scraper] DEBUG: Scraped from <200 https://sofifa.com/player/263620/romeo-lavia/240024/>
{'full_name': 'Romeo Lavia', 'overall_rating': '73'}
2024-02-01 18:51:27 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://sofifa.com/player/245637/georginio-rutter/240024/> (referer: https://sofifa.com)
2024-02-01 18:51:27 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://sofifa.com/player/223689/wout-weghorst/240024/> (referer: https://sofifa.com)
2024-02-01 18:51:27 [scrapy.core.scraper] DEBUG: Scraped from <200 https://sofifa.com/player/252008/israel-reyes/240024/>
{'full_name': 'Israel Reyes Romero', 'overall_rating': '75'}
2024-02-01 18:51:27 [scrapy.core.scraper] DEBUG: Scraped from <200 https://sofifa.com/player/248266/sacha-boey/240024/>
{'full_name': 'Sacha Boey', 'overall_rating': '80'}
2024-02-01 18:51:27 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://sofifa.com/player/234569/florentino-morris-luis/240024/> (referer: https://sofifa.com)
2024-02-01 18:51:27 [scrapy.core.scraper] DEBUG: Scraped from <200 https://sofifa.com/player/258729/gabriel-veiga-novas/240024/>
{'full_name': 'Gabriel Veiga Novas', 'overall_rating': '78'}
2024-02-01 18:51:27 [scrapy.core.scraper] DEBUG: Scraped from <200 https://sofifa.com/player/239085/erling-haaland/240024/>
{'full_name': 'Erling Braut Haaland', 'overall_rating': '91'}
2024-02-01 18:51:27 [scrapy.core.scraper] DEBUG: Scraped from <200 https://sofifa.com/player/224949/javairo-dilrosun/240024/>
{'full_name': 'Javairô Dilrosun', 'overall_rating': '72'}
2024-02-01 18:51:27 [scrapy.core.scraper] DEBUG: Scraped from <200 https://sofifa.com/player/245637/georginio-rutter/240024/>
{'full_name': 'Georginio Rutter', 'overall_rating': '74'}
2024-02-01 18:51:27 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://sofifa.com/player/250961/joshua-zirkzee/240024/> (referer: https://sofifa.com)
2024-02-01 18:51:27 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://sofifa.com/player/271575/simone-pafundi/240024/> (referer: https://sofifa.com)
2024-02-01 18:51:27 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://sofifa.com/player/237681/takefusa-kubo/240024/> (referer: https://sofifa.com)
2024-02-01 18:51:27 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://sofifa.com/player/229391/joao-maria-palhinha-goncalves/240024/> (referer: https://sofifa.com)
2024-02-01 18:51:27 [scrapy.core.scraper] DEBUG: Scraped from <200 https://sofifa.com/player/223689/wout-weghorst/240024/>
{'full_name': 'Wout Weghorst', 'overall_rating': '77'}
2024-02-01 18:51:27 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://sofifa.com/player/272978/jorrel-hato/240024/> (referer: https://sofifa.com)
2024-02-01 18:51:27 [scrapy.core.scraper] DEBUG: Scraped from <200 https://sofifa.com/player/234569/florentino-morris-luis/240024/>
{'full_name': 'Florentino Ibrain Morris Luís', 'overall_rating': '80'}
2024-02-01 18:51:27 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://sofifa.com/player/165153/karim-benzema/240024/> (referer: https://sofifa.com)
2024-02-01 18:51:27 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://sofifa.com/player/268438/alejandro-garnacho-ferreyra/240024/> (referer: https://sofifa.com)
2024-02-01 18:51:27 [scrapy.core.scraper] DEBUG: Scraped from <200 https://sofifa.com/player/250961/joshua-zirkzee/240024/>
{'full_name': 'Joshua Orobosa Zirkzee', 'overall_rating': '75'}
2024-02-01 18:51:27 [scrapy.core.scraper] DEBUG: Scraped from <200 https://sofifa.com/player/271575/simone-pafundi/240024/>
{'full_name': 'Simone Pafundi', 'overall_rating': '67'}
2024-02-01 18:51:27 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://sofifa.com/player/253072/darwin-nunez/240024/> (referer: https://sofifa.com)
2024-02-01 18:51:27 [scrapy.core.scraper] DEBUG: Scraped from <200 https://sofifa.com/player/237681/takefusa-kubo/240024/>
{'full_name': '久保 建英', 'overall_rating': '81'}
2024-02-01 18:51:27 [scrapy.core.scraper] DEBUG: Scraped from <200 https://sofifa.com/player/229391/joao-maria-palhinha-goncalves/240024/>
{'full_name': 'João Maria Lobo Alves Palhinha Gonçalves', 'overall_rating': '84'}
2024-02-01 18:51:27 [scrapy.core.scraper] DEBUG: Scraped from <200 https://sofifa.com/player/272978/jorrel-hato/240024/>
{'full_name': 'Jorrel Hato', 'overall_rating': '73'}
2024-02-01 18:51:27 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://sofifa.com/player/272834/joao-pedro-goncalves-neves/240024/> (referer: https://sofifa.com)
2024-02-01 18:51:27 [scrapy.core.scraper] DEBUG: Scraped from <200 https://sofifa.com/player/165153/karim-benzema/240024/>
{'full_name': 'Karim Benzema', 'overall_rating': '90'}
2024-02-01 18:51:27 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://sofifa.com/player/276589/vitor-hugo-roque-ferreira/240024/> (referer: https://sofifa.com)
2024-02-01 18:51:27 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://sofifa.com/player/271574/rico-lewis/240024/> (referer: https://sofifa.com)
2024-02-01 18:51:27 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://sofifa.com/player/264298/conor-bradley/240024/> (referer: https://sofifa.com)
2024-02-01 18:51:27 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://sofifa.com/player/270673/warren-zaire-emery/240024/> (referer: https://sofifa.com)
2024-02-01 18:51:27 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://sofifa.com/player/271916/bryan-zaragoza-martinez/240024/> (referer: https://sofifa.com)
2024-02-01 18:51:27 [scrapy.core.scraper] DEBUG: Scraped from <200 https://sofifa.com/player/268438/alejandro-garnacho-ferreyra/240024/>
{'full_name': 'Alejandro Garnacho Ferreyra', 'overall_rating': '75'}
2024-02-01 18:51:27 [scrapy.core.scraper] DEBUG: Scraped from <200 https://sofifa.com/player/253072/darwin-nunez/240024/>
{'full_name': 'Darwin Gabriel Núñez Ribeiro', 'overall_rating': '82'}
2024-02-01 18:51:27 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://sofifa.com/player/235790/kai-havertz/240024/> (referer: https://sofifa.com)
2024-02-01 18:51:27 [scrapy.core.scraper] DEBUG: Scraped from <200 https://sofifa.com/player/272834/joao-pedro-goncalves-neves/240024/>
{'full_name': 'João Pedro Gonçalves Neves', 'overall_rating': '73'}
2024-02-01 18:51:27 [scrapy.core.scraper] DEBUG: Scraped from <200 https://sofifa.com/player/276589/vitor-hugo-roque-ferreira/240024/>
{'full_name': 'Vitor Hugo Roque Ferreira', 'overall_rating': '76'}
2024-02-01 18:51:27 [scrapy.core.scraper] DEBUG: Scraped from <200 https://sofifa.com/player/271574/rico-lewis/240024/>
{'full_name': 'Rico Lewis', 'overall_rating': '75'}
2024-02-01 18:51:27 [scrapy.core.scraper] DEBUG: Scraped from <200 https://sofifa.com/player/264298/conor-bradley/240024/>
{'full_name': 'Conor Bradley', 'overall_rating': '69'}
2024-02-01 18:51:27 [scrapy.core.scraper] DEBUG: Scraped from <200 https://sofifa.com/player/270673/warren-zaire-emery/240024/>
{'full_name': 'Warren Zaïre-Emery', 'overall_rating': '79'}
2024-02-01 18:51:27 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://sofifa.com/player/270964/jobe-bellingham/240024/> (referer: https://sofifa.com)
2024-02-01 18:51:27 [scrapy.core.scraper] DEBUG: Scraped from <200 https://sofifa.com/player/271916/bryan-zaragoza-martinez/240024/>
{'full_name': 'Bryan Zaragoza Martínez', 'overall_rating': '73'}
2024-02-01 18:51:27 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://sofifa.com/player/212228/ivan-toney/240023/> (referer: https://sofifa.com)
2024-02-01 18:51:27 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://sofifa.com/player/252371/jude-bellingham/240024/> (referer: https://sofifa.com)
2024-02-01 18:51:27 [scrapy.core.scraper] DEBUG: Scraped from <200 https://sofifa.com/player/235790/kai-havertz/240024/>
{'full_name': 'Kai Lukas Havertz', 'overall_rating': '82'}
2024-02-01 18:51:27 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://sofifa.com/player/272926/lucas-bergvall/240024/> (referer: https://sofifa.com)
2024-02-01 18:51:27 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://sofifa.com/player/259399/rasmus-hojlund/240024/> (referer: https://sofifa.com)
2024-02-01 18:51:27 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://sofifa.com/player/269136/kobbie-mainoo/240024/> (referer: https://sofifa.com)
2024-02-01 18:51:27 [scrapy.core.scraper] DEBUG: Scraped from <200 https://sofifa.com/player/270964/jobe-bellingham/240024/>
{'full_name': 'Jobe Bellingham', 'overall_rating': '66'}
2024-02-01 18:51:27 [scrapy.core.scraper] DEBUG: Scraped from <200 https://sofifa.com/player/212228/ivan-toney/240023/>
{'full_name': 'Ivan Toney', 'overall_rating': '80'}
2024-02-01 18:51:27 [scrapy.core.scraper] DEBUG: Scraped from <200 https://sofifa.com/player/252371/jude-bellingham/240024/>
{'full_name': 'Jude Victor William Bellingham', 'overall_rating': '87'}
2024-02-01 18:51:27 [scrapy.core.scraper] DEBUG: Scraped from <200 https://sofifa.com/player/272926/lucas-bergvall/240024/>
{'full_name': 'Lucas Bergvall', 'overall_rating': '64'}
2024-02-01 18:51:27 [scrapy.core.scraper] DEBUG: Scraped from <200 https://sofifa.com/player/259399/rasmus-hojlund/240024/>
{'full_name': 'Rasmus Winther Højlund', 'overall_rating': '77'}
2024-02-01 18:51:27 [scrapy.core.scraper] DEBUG: Scraped from <200 https://sofifa.com/player/269136/kobbie-mainoo/240024/>
{'full_name': 'Kobbie Mainoo', 'overall_rating': '67'}
2024-02-01 18:51:28 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://sofifa.com/player/263370/valentin-barco/240024/> (referer: https://sofifa.com)
2024-02-01 18:51:28 [scrapy.core.scraper] DEBUG: Scraped from <200 https://sofifa.com/player/263370/valentin-barco/240024/>
{'full_name': 'Valentín Barco', 'overall_rating': '73'}
2024-02-01 18:51:28 [scrapy.core.engine] INFO: Closing spider (finished)
2024-02-01 18:51:28 [scrapy.extensions.feedexport] INFO: Stored json feed (60 items) in: players.json
2024-02-01 18:51:28 [scrapy.statscollectors] INFO: Dumping Scrapy stats:
{'downloader/request_bytes': 25570,
'downloader/request_count': 61,
'downloader/request_method_count/GET': 61,
'downloader/response_bytes': 916772,
'downloader/response_count': 61,
'downloader/response_status_count/200': 61,
'elapsed_time_seconds': 2.138765,
'feedexport/success_count/FileFeedStorage': 1,
'finish_reason': 'finished',
'finish_time': datetime.datetime(2024, 2, 2, 2, 51, 28, 119043, tzinfo=datetime.timezone.utc),
'httpcompression/response_bytes': 3787828,
'httpcompression/response_count': 61,
'item_scraped_count': 60,
'log_count/DEBUG': 122,
'log_count/INFO': 11,
'request_depth_max': 1,
'response_received_count': 61,
'scheduler/dequeued': 61,
'scheduler/dequeued/memory': 61,
'scheduler/enqueued': 61,
'scheduler/enqueued/memory': 61,
'start_time': datetime.datetime(2024, 2, 2, 2, 51, 25, 980278, tzinfo=datetime.timezone.utc)}
2024-02-01 18:51:28 [scrapy.core.engine] INFO: Spider closed (finished)
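To double-check the result without reading through the log, you can load the feed file. A minimal sketch, assuming players.json was written by the JSON feed exporter (a single JSON array of item dicts) with the two field names used in parse_item; load_feed and summarize are hypothetical helpers. Also note that -o appends to an existing file, so running the crawl twice with -o players.json leaves invalid JSON behind; -O overwrites the file instead.

```python
import json


def load_feed(path):
    """Read a Scrapy JSON feed: one JSON array of item dicts."""
    with open(path, encoding="utf-8") as fh:
        return json.load(fh)


def summarize(items):
    """Count items, distinct player names and items missing either field."""
    names = {item.get("full_name") for item in items}
    incomplete = [
        item for item in items
        if item.get("full_name") is None or item.get("overall_rating") is None
    ]
    return {
        "count": len(items),
        "distinct_names": len(names),
        "incomplete": len(incomplete),
    }
```

For the run above, summarize(load_feed("players.json")) should report a count of 60, matching item_scraped_count in the stats.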
Answered By - Alexander