Issue
I'm writing my first script using BS4 as an introduction to web scraping, and I'm having trouble. I'm following the tutorial in Automate the Boring Stuff with Python, which uses soup.select('insert class here') to select elements by class. When I run the code shown below, the soup.select call fails with: AttributeError: 'Response' object has no attribute 'select'
import webbrowser
import selenium
import bs4
import requests
table = []
url = 'http://espn.com/mlb/team/stats/_/name/wsh'
r = requests.get(url)
page = bs4.BeautifulSoup(r.text)
table = soup.select("Table2__th")
print(str(table))
Solution
I assume you actually want the data in the table. That content is rendered with JavaScript, so requests alone won't help if you target the table element itself.
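As for the error itself: select is a method of the BeautifulSoup object, not of the requests Response, and a CSS class selector needs a leading dot. Here is a minimal sketch of that call, run against a small static HTML string rather than the live page. The markup is invented for illustration and only reuses the Table2__th class name from the question.

```python
import bs4

# Static stand-in for a downloaded page; this markup is invented for illustration
# and is not the real ESPN page source.
html = """
<table>
  <tr><th class="Table2__th">Name</th><th class="Table2__th">GP</th></tr>
  <tr><td>Player A</td><td>10</td></tr>
</table>
"""

# Parse first, then call select() on the BeautifulSoup object (not on the Response).
soup = bs4.BeautifulSoup(html, "html.parser")

# ".Table2__th" selects by class; without the dot it would look for a <Table2__th> tag.
header_cells = soup.select(".Table2__th")
header_text = [cell.get_text(strip=True) for cell in header_cells]
print(header_text)
```

On the real page this selector would likely still come back empty, since the table is built in the browser after the HTML is downloaded.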
A better approach is to pull the data from the script tag that carries it, which gives you all the actual stats. Below I grab that info and put it into a tidy DataFrame for viewing.
import json
import re

import bs4
import pandas as pd
import requests

url = 'http://espn.com/mlb/team/stats/_/name/wsh'
response = requests.get(url)
page = bs4.BeautifulSoup(response.text, 'lxml')

# The stats are embedded as JSON in a script tag, between "playerStats" and "teamLeaders"
pattern = re.compile(r'playerStats":(.*),"teamLeaders"', re.DOTALL)
data = page.find('script', string=pattern).text
script = pattern.findall(data)[0]
players_info = json.loads(script)
player_batting_stats = players_info[0]
expanded_player_batting_stats = players_info[1]

table1 = []
table2 = []
headers = ['Name', 'GP', 'AB', 'R', 'H', '2B', '3B', 'HR', 'RBI', 'TB', 'BB', 'K', 'SB', 'BA', 'OBP', 'SLG', 'OPS', 'WAR']

# One row per player: the name followed by each stat value
for player in player_batting_stats:
    name = player['athlete']['name']
    row = [stat['value'] for stat in player['statGroups']['stats']]
    row.insert(0, name)
    table1.append(row)

df1 = pd.DataFrame(table1, columns=headers)
print(df1.head())
# repeat for table2 using expanded_player_batting_stats
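To try the extraction step without hitting the site, here is a rough offline sketch of the same idea: find the script tag with a regex, cut out the JSON between the two keys, and build the rows. The HTML string, the window.__data variable name, and the player values are made up for illustration. The real payload is much larger and its shape may differ or change over time.

```python
import json
import re

import bs4

# Invented stand-in for the page source; only the "playerStats" / "teamLeaders"
# key names are taken from the answer above, everything else is hypothetical.
html = (
    '<html><body><script>window.__data = {"playerStats":[['
    '{"athlete":{"name":"Player A"},"statGroups":{"stats":[{"value":"10"},{"value":"3"}]}},'
    '{"athlete":{"name":"Player B"},"statGroups":{"stats":[{"value":"12"},{"value":"5"}]}}'
    ']],"teamLeaders":{}};</script></body></html>'
)

# Capture everything between 'playerStats":' and ',"teamLeaders"'.
pattern = re.compile(r'playerStats":(.*),"teamLeaders"', re.DOTALL)

page = bs4.BeautifulSoup(html, "html.parser")

# Find the script tag whose contents match the pattern, then pull out the JSON text.
script_tag = page.find("script", string=pattern)
payload = pattern.search(script_tag.string).group(1)
players_info = json.loads(payload)

# Build one row per player: name first, then each stat value.
rows = []
for player in players_info[0]:
    values = [stat["value"] for stat in player["statGroups"]["stats"]]
    rows.append([player["athlete"]["name"], *values])

print(rows)
```

Because the pattern uses a greedy .* it relies on "teamLeaders" appearing right after the stats block, so it is worth checking the captured text if json.loads starts failing.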
Answered By - QHarr