Issue
I am working on a project in Python 3.8. I need to download data for all Premier League teams for 2018 and 2019 into a pandas DataFrame and ultimately write it to a database (SQL or Access). I am trying to use BeautifulSoup for this. My code works with soccerbase.com but not with sofascore.com. @oppressionslayer has helped with the code so far. Can anybody please help me?
import json
import pandas as pd
import requests
from bs4 import BeautifulSoup as bs
url = "https://www.sofascore.com/football///json"
r = requests.get(url)
soup = bs(r.content, 'lxml')
json_object = json.loads(r.content)
json_object['sportItem']['tournaments'][0]['events'][0]['homeTeam']['name']
# 'Sheffield United'
json_object['sportItem']['tournaments'][0]['events'][0]['awayTeam']['name']
# 'Manchester United'
json_object['sportItem']['tournaments'][0]['events'][0]['homeScore']['current']
# 3
json_object['sportItem']['tournaments'][0]['events'][0]['awayScore']['current']
print(json_object)
How do I loop this code to cover every team? My aim is to get each match as a row with the columns ["Event date", "Competition", "Home Team", "Home Score", "Away Team", "Away Score", "Score"], e.g. 31/10/2019, Premier League, Chelsea, 1, Manchester United, 2, 1-2.
I am a beginner. How can I do this?
Solution
This code works. It does not capture the website's entire database, but it is a capable scraper.
import simplejson as json
import pandas as pd
import requests
from bs4 import BeautifulSoup as bs
url = "https://www.sofascore.com/football///json"
r = requests.get(url)
soup = bs(r.content, 'lxml')
json_object = json.loads(r.content)
headers = ['Tournament', 'Home Team', 'Home Score', 'Away Team', 'Away Score', 'Status', 'Start Date']
consolidated = []
# Build one DataFrame per tournament, with one row per event
for tournament in json_object['sportItem']['tournaments']:
    rows = []
    for event in tournament["events"]:
        row = []
        row.append(tournament["tournament"]["name"])
        row.append(event["homeTeam"]["name"])
        # Use -1 when the event has no current score yet
        if "current" in event["homeScore"].keys():
            row.append(event["homeScore"]["current"])
        else:
            row.append(-1)
        row.append(event["awayTeam"]["name"])
        if "current" in event["awayScore"].keys():
            row.append(event["awayScore"]["current"])
        else:
            row.append(-1)
        row.append(event["status"]["type"])
        row.append(event["formatedStartDate"])
        rows.append(row)
    df = pd.DataFrame(rows, columns=headers)
    consolidated.append(df)
# Combine all tournaments and write them to a single CSV file
pd.concat(consolidated).to_csv(r'Path.csv', sep=',', encoding='utf-8-sig', index=False)
Courtesy Praful Surve @praful-surve
Answered By - user12426867
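The question also asks for a "Score" column and for writing the result to a database, which the answer above does not cover. The following is only a rough sketch of one way that might be done, not part of the original answer. It assumes the parsed JSON has the same keys used above. The SAMPLE payload, the function names flatten_events and save_to_sqlite, the table name matches, and the date format are all made up for illustration, and SQLite stands in for "SQL" here (Access is not covered).

```python
import sqlite3

import pandas as pd

COLUMNS = ["Event date", "Competition", "Home Team", "Home Score",
           "Away Team", "Away Score", "Score"]

# Hypothetical payload shaped like the keys used in the answer above.
# The values (including the date format) are placeholders, not real data.
SAMPLE = {
    "sportItem": {
        "tournaments": [
            {
                "tournament": {"name": "Premier League"},
                "events": [
                    {
                        "homeTeam": {"name": "Chelsea"},
                        "awayTeam": {"name": "Manchester United"},
                        "homeScore": {"current": 1},
                        "awayScore": {"current": 2},
                        "formatedStartDate": "31/10/2019",
                    },
                    {
                        # An event without scores, e.g. not yet played
                        "homeTeam": {"name": "Team A"},
                        "awayTeam": {"name": "Team B"},
                        "homeScore": {},
                        "awayScore": {},
                        "formatedStartDate": "01/11/2019",
                    },
                ],
            }
        ]
    }
}


def flatten_events(payload):
    """Turn the nested tournaments/events structure into one DataFrame."""
    rows = []
    for tournament in payload["sportItem"]["tournaments"]:
        for event in tournament["events"]:
            # Same -1 convention as the answer above for missing scores
            home = event["homeScore"].get("current", -1)
            away = event["awayScore"].get("current", -1)
            played = home >= 0 and away >= 0
            rows.append({
                "Event date": event["formatedStartDate"],
                "Competition": tournament["tournament"]["name"],
                "Home Team": event["homeTeam"]["name"],
                "Home Score": home,
                "Away Team": event["awayTeam"]["name"],
                "Away Score": away,
                "Score": f"{home}-{away}" if played else None,
            })
    return pd.DataFrame(rows, columns=COLUMNS)


def save_to_sqlite(df, conn, table="matches"):
    """Write the DataFrame to a SQLite table, replacing any existing one."""
    df.to_sql(table, conn, if_exists="replace", index=False)


df = flatten_events(SAMPLE)
conn = sqlite3.connect(":memory:")  # use a file path to persist the data
save_to_sqlite(df, conn)
print(df)
```

With the live site, the dict returned by json.loads(r.content) would be passed to flatten_events in place of SAMPLE, provided the response still has this structure.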