Issue
I am trying to scrape tables from the following website:
https://www.rotowire.com/betting/mlb/player-props.php
The data for each table sits inside a script on the page, in a block starting with data: [{ ... }]. This can be pulled out using a combination of BeautifulSoup and regex. However, I cannot convert this data into a pandas DataFrame: it is read in as a single row. The data is a list of dictionaries and looks as follows:
[{"gameID":"2513620","playerID":"13902","firstName":"Mark"},
{"gameID":"2512064","playerID":"12450","firstName":"Mike"},
{"gameID":"2513053","playerID":"14261","firstName":"Will"}]
This should work with pd.DataFrame(df), but it does not seem to read correctly when scraped from the site.
I have tried the following:
from bs4 import BeautifulSoup
import pandas as pd
import requests
import re
import json
url = 'https://www.rotowire.com/betting/mlb/player-props.php'
page = requests.get(url, verify=False)
soup = BeautifulSoup(page.text)
# Read first table
script = str(soup.findAll('script')[4])
data = re.findall(r'data: \[(.*?)\]', script)
df = pd.DataFrame(data)
0
0 {"gameID":"2513620","playerID":"13902","firstN...
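The single row can be reproduced without touching the network. The following is a minimal sketch, assuming a made-up script string (the `var table = { ... }` wrapper is hypothetical; only the three records come from the sample above). It suggests that the regex hands back raw text rather than a list of dictionaries:

```python
import re

import pandas as pd

# Hypothetical stand-in for the <script> text; the wrapper is invented,
# only the three records are taken from the question.
script = (
    'var table = { data: ['
    '{"gameID":"2513620","playerID":"13902","firstName":"Mark"},'
    '{"gameID":"2512064","playerID":"12450","firstName":"Mike"},'
    '{"gameID":"2513053","playerID":"14261","firstName":"Will"}'
    '] };'
)

# Same pattern as in the question: the group sits inside the brackets.
matches = re.findall(r'data: \[(.*?)\]', script)

print(len(matches))      # number of matches found in the script
print(type(matches[0]))  # each match is a plain str, not a list of dicts

# pandas treats every string in the list as one cell, hence one row per match.
df_raw = pd.DataFrame(matches)
print(df_raw.shape)
```

So the DataFrame constructor never sees dictionaries here, only one long string per regex match.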
Solution
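re.findall with the group placed inside the brackets returns a list of plain strings, so pandas gets one string per match rather than a list of dictionaries. Capture the brackets as part of the group and parse the match with json.loads before building the DataFrame. Try: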
from bs4 import BeautifulSoup
import pandas as pd
import requests
import re
import json
from requests.packages import urllib3
urllib3.disable_warnings(urllib3.exceptions.InsecureRequestWarning)
url = 'https://www.rotowire.com/betting/mlb/player-props.php'
page = requests.get(url, verify=False)
soup = BeautifulSoup(page.text, 'html.parser')
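# Read first table; index 4 assumes it lives in the fifth <script> tag
script = str(soup.find_all('script')[4])
# Capture the whole array, brackets included, so the match is valid JSON
data = re.search(r'data: (\[.*?\])', script)
df = pd.DataFrame(json.loads(data.group(1)))
print(df.head())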
Prints:
gameID playerID firstName lastName name team opp logo playerLink draftkings_onehit draftkings_twohit draftkings_onehomerun draftkings_onerbi draftkings_onesb draftkings_pitchwin fanduel_onehit fanduel_twohit fanduel_onehomerun fanduel_onerbi fanduel_onesb fanduel_pitchwin mgm_onehit mgm_twohit mgm_onehomerun mgm_onerbi mgm_onesb mgm_pitchwin pointsbet_onehit pointsbet_twohit pointsbet_onehomerun pointsbet_onerbi pointsbet_onesb pointsbet_pitchwin
0 2513620 13902 Mark Mathias Mark Mathias PIT HOU https://content.rotowire.com/images/teamlogo/baseball/100PIT.png?v=6 /betting/mlb/player/mark-mathias-odds-13902 115 1000 -115 -115 600 None None None None None None None None None None None None
1 2512064 12450 Mike Zunino Mike Zunino CLE NYY https://content.rotowire.com/images/teamlogo/baseball/100CLE.png?v=6 /betting/mlb/player/mike-zunino-odds-12450 115 700 -120 -105 600 None None None None None None None None None None None None
2 2513053 14261 Will Benson Will Benson CIN @ATL https://content.rotowire.com/images/teamlogo/baseball/100CIN.png?v=6 /betting/mlb/player/will-benson-odds-14261 110 900 -120 -140 410 None None None None None None None None None None None None
3 2513620 15016 Jason Delay Jason Delay PIT HOU https://content.rotowire.com/images/teamlogo/baseball/100PIT.png?v=6 /betting/mlb/player/jason-delay-odds-15016 110 1000 -115 -135 460 None None None None None None None None None None None None
4 2514026 15672 Geraldo Perdomo Geraldo Perdomo ARI MIL https://content.rotowire.com/images/teamlogo/baseball/100ARI.png?v=6 /betting/mlb/player/geraldo-perdomo-odds-15672 110 -135 -155 380 None None None None None None None None None None None None
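The hardcoded script index may break if the page layout changes. Below is a rough sketch of one way around that: scan every script tag and parse each data array found. The helper name `tables_from_html` and the sample HTML (including `makeTable` and `app.js`) are hypothetical, not taken from the site. The non-greedy pattern stops at the first `]`, so this assumes the records contain no nested arrays or `]` characters.

```python
import json
import re

import pandas as pd
from bs4 import BeautifulSoup

# Capture the whole array, brackets included; DOTALL lets it span several lines.
DATA_RE = re.compile(r'data:\s*(\[.*?\])', re.DOTALL)


def tables_from_html(html):
    """Return one DataFrame per `data: [...]` array found in any <script> tag."""
    soup = BeautifulSoup(html, 'html.parser')
    frames = []
    for tag in soup.find_all('script'):
        text = tag.string or ''  # external scripts (src=...) have no inline text
        for raw in DATA_RE.findall(text):
            try:
                records = json.loads(raw)
            except json.JSONDecodeError:
                continue  # skip arrays that are not valid JSON
            frames.append(pd.DataFrame(records))
    return frames


# Invented page with two tables, reusing the records from the question.
sample_html = """
<html><body>
<script src="app.js"></script>
<script>
  makeTable({ data: [{"gameID":"2513620","playerID":"13902","firstName":"Mark"},
                     {"gameID":"2512064","playerID":"12450","firstName":"Mike"}] });
</script>
<script>
  makeTable({ data: [{"gameID":"2513053","playerID":"14261","firstName":"Will"}] });
</script>
</body></html>
"""

frames = tables_from_html(sample_html)
print(len(frames))
print(frames[0])
```

On the real page, `page.text` would be passed in place of `sample_html`. Which frame corresponds to which table on the site would still need to be checked by hand.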
Answered By - Andrej Kesely