Issue
I'm new to Scrapy, and I'm trying to save my favourite team's scores to a JSON file. However, the JSON file stays empty.
Here's my code:
import scrapy
from scrapy.crawler import CrawlerProcess


class SoccerwaySpider(scrapy.Spider):
    name = "Soccerway"
    start_urls = ['https://fr.soccerway.com/teams/france/olympique-de-marseille/890/']

    def start_requests(self):
        headers = {'User-Agent': 'Mozilla/5.0 (X11; Linux x86_64; rv:48.0) Gecko/20100101 Firefox/48.0'}
        for url in self.start_urls:
            yield scrapy.Request(url, headers=headers, callback=self.parse)

    def parse(self, response):
        yield
        {
            'score': str.strip(response.css("table.matches").css('td.score-time.score').css('a::text').get()),
        }


process = CrawlerProcess(settings={
    "FEEDS": {
        "Soccerway.json": {"format": "json"},
    },
})
process.crawl(SoccerwaySpider)
process.start()
Thank you in advance!
Solution
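You can do this with pandas instead of Scrapy: pandas.read_html parses the table whose class is "matches" straight from the page HTML. Here is a working solution.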
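from io import StringIO

import pandas as pd
import requests

# Send a browser-like User-Agent with the request.
headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/92.0.4515.131 Safari/537.36'}
url = "https://fr.soccerway.com/teams/france/olympique-de-marseille/890/"
req = requests.get(url, headers=headers)

# read_html returns a list of DataFrames, one per <table> matching attrs.
tables = pd.read_html(StringIO(req.text), attrs={"class": "matches"})
df = tables[0]
# to_csv returns None when given a path, so keep the DataFrame in df.
df.to_csv('score.csv', index=False)
print(df)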
Output:
Date      Compétition  Résultats  Equipe visitée    Score/Temps  Equipe visiteuse
15/08/21  LI1                     Marseille         2 - 2        Bordeaux          Afficher les événements
28/08/21  LI1                     Marseille         3 - 1        Saint-Étienne     Afficher les événements
11/09/21  LI1                     AS Monaco         0 - 2        Marseille         Afficher les événements
16/09/21  LIE                     Lokomotiv Moscou  1 - 1        Marseille         Afficher les événements
19/09/21  LI1                     Marseille         2 - 0        Stade Rennais     Afficher les événements
22/09/21  LI1                     Angers            21 : 00      Marseille
26/09/21  LI1                     Marseille         20 : 45      Lens
30/09/21  LIE                     Marseille         21 : 00      Galatasaray SK
03/10/21  LI1                     LOSC Lille        17 : 00      Marseille
17/10/21  LI1                     Marseille         20 : 45      Lorient
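Since the original goal was a JSON file of scores, the rows above could be narrowed down to matches that have already been played. The sketch below is one possible way to do that and is not part of the answer. It assumes, based only on the sample output, that a played match shows "N - N" in the Score/Temps column while an upcoming match shows a kickoff time such as "21 : 00". The function name played_scores and the tuple layout of the rows are made up for the example.

```python
import json
import re

# Assumption from the sample output above: a played match shows "2 - 2",
# while an upcoming match shows a kickoff time such as "21 : 00".
SCORE_PATTERN = re.compile(r"^\s*(\d+)\s*-\s*(\d+)\s*$")


def played_scores(rows):
    """Return one dict per row whose score cell looks like 'N - N'.

    `rows` is an iterable of (date, home, score_or_time, away) tuples;
    this shape is hypothetical and chosen only for the example.
    """
    results = []
    for date, home, score_or_time, away in rows:
        match = SCORE_PATTERN.match(score_or_time)
        if match is None:
            continue  # kickoff time, so the match has not been played yet
        results.append({
            "date": date,
            "home": home,
            "away": away,
            "home_goals": int(match.group(1)),
            "away_goals": int(match.group(2)),
        })
    return results


# Rows copied from the output shown above.
sample_rows = [
    ("15/08/21", "Marseille", "2 - 2", "Bordeaux"),
    ("11/09/21", "AS Monaco", "0 - 2", "Marseille"),
    ("22/09/21", "Angers", "21 : 00", "Marseille"),
]

scores = played_scores(sample_rows)
print(json.dumps(scores, ensure_ascii=False, indent=2))
```

With real data, the tuples could be built from the DataFrame rows (for example with df.itertuples) once the actual column names have been checked, and json.dump could write the result to a file instead of printing it.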
Answered By - Fazlul
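A side note on the spider in the question, separate from the pandas approach: if parse is laid out exactly as posted, with yield alone on its line and the dict starting on the next line, Python treats them as two statements. The bare yield produces None, and the dict is evaluated and then discarded, which may explain why no items reach the feed. The two generator functions below are made up for illustration and only stand in for the spider callback; no Scrapy is involved.

```python
def parse_as_posted():
    # `yield` alone on its line yields None...
    yield
    # ...and this dict is a separate expression statement whose value is discarded.
    {
        'score': '2 - 2',
    }


def parse_fixed():
    # With the opening brace on the same line as `yield`, the dict is the yielded value.
    yield {
        'score': '2 - 2',
    }


as_posted = list(parse_as_posted())
fixed = list(parse_fixed())
print(as_posted)  # [None]
print(fixed)      # [{'score': '2 - 2'}]
```

In the spider, the equivalent change would be to write "yield {" on a single line. It may also be worth checking that .get() did not return None before calling strip on it, since .get() returns None when the selector matches nothing.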