Issue
I need your help trying to automate this web page by getting the data of all the players on the different pages.
import request
import pandas as pd
from bs4 import BeautifulSoup
url = 'https://www.mlb.com/es/stats/spring-training'
pagina = requests.get(url2)
soup = BeautifulSoup(pagina.text, 'lxml')
table = soup.find('table', {'class':"bui-table is-desktop-sKqjv9Sb"})
encabezados = []
for i in table.find_all('th')[:18]:
datos = i.find_all('button')
for td in datos:
titulo = td.text.strip()
encabezados.append(titulo)
datos_mlb = pd.DataFrame(columns = encabezados)
nombres = []
for i in table.find_all('th')[18:]:
datos = i.find_all('a')
for td in datos:
jugadores = td.text.strip()
nombres.append(jugadores)
datos_mlb['JUGADOR'] = nombres
for fila in table.find_all('tr')[1:]:
data = fila.find_all('td')
data_fila = [td.text.strip() for td in data]
largo = len(datos_mlb)-1
datos_mlb.iloc[:,1:] = data_fila
I have tried to fit the vast majority of information, however I cannot complete the data correctly and iterate all the pages.
Solution
Try to use the structured data from JSON response of XHR request to create your dataframe
. Inspect network tab in your browsers devtools, to get an idea, what parameters you should send and what you will get:
import pandas as pd
import requests
data = []
for i in range(0,175,25):
data.extend(
requests.get(
f'https://bdfed.stitch.mlbinfra.com/bdfed/stats/player?stitch_env=prod&season=2022&sportId=1&stats=season&group=hitting&gameType=S&limit=25&offset={i}&sortStat=onBasePlusSlugging&order=desc',
headers = {'user-agent': 'Mozilla/5.0'}
).json()['stats']
)
pd.DataFrame(data)
Output
playerId | playerName | ... | type | atBatsPerHomeRun | |||
---|---|---|---|---|---|---|---|
0 | 502671 | Paul Goldschmidt | ... | player | 5.5 | ||
1 | 621439 | Byron Buxton | ... | player | 6.4 | ||
2 | 547180 | Bryce Harper | ... | player | 4.38 | ||
3 | 658668 | Edward Olivares | ... | player | 11.33 | ||
4 | 670351 | Jose Rojas | ... | player | 9 | ||
... | ... | ... | |||||
156 | 593871 | Jorge Polanco | ... | player | 32.00 | ||
157 | 676475 | Alec Burleson | ... | player | -.-- | ||
158 | 608385 | Jesse Winker | ... | player | -.-- | ||
159 | 641355 | Cody Bellinger | ... | player | -.-- | ||
160 | 660162 | Yoan Moncada | ... | player | -.-- |
[161 rows x 72 columns]
Answered By - HedgeHog
0 comments:
Post a Comment
Note: Only a member of this blog may post a comment.