Issue
I'm very new to Python and BeautifulSoup. I wrote the code below to load the page (https://www.fangraphs.com/depthcharts.aspx?position=Team), scrape the data in the table, and export it to a CSV file. I was able to write code to extract data from other tables on the website, but not this particular one. It keeps coming back with: AttributeError: 'NoneType' object has no attribute 'find'. I've been racking my brain trying to figure out what I'm doing wrong. Do I have the wrong "class" name? Again, I'm very new and trying to teach myself. I have been learning via trial and error and by reverse engineering other people's code. This one has me stumped. Any guidance?
import requests
import csv
import datetime
from bs4 import BeautifulSoup

# static urls
season = datetime.datetime.now().year
URL = "https://www.fangraphs.com/depthcharts.aspx?position=Team".format(season=season)

# request the data
batting_html = requests.get(URL).text

def parse_array_from_fangraphs_html(input_html, out_file_name):
    """
    Take a HTML stats page from fangraphs and parse it out to a CSV file.
    """
    # parse input
    soup = BeautifulSoup(input_html, "lxml")
    table = soup.find("table", {"class": "tablesoreder, depth_chart tablesorter tablesorter-default"})

    # get headers
    headers_html = table.find("thead").find_all("th")
    headers = []
    for header in headers_html:
        headers.append(header.text)
    print(headers)

    # get rows
    rows = []
    rows_html = table.find("tbody").find_all("tr")
    for row in rows_html:
        row_data = []
        for cell in row.find_all("td"):
            row_data.append(cell.text)
        rows.append(row_data)

    # write to CSV file
    with open(out_file_name, "w") as out_file:
        writer = csv.writer(out_file)
        writer.writerow(headers)
        writer.writerows(rows)

parse_array_from_fangraphs_html(batting_html, 'Team War Totals.csv')
Solution
The traceback looks like this:

AttributeError                            Traceback (most recent call last)
<ipython-input-4-ee944e08f675> in <module>()
     41         writer.writerows(rows)
     42
---> 43 parse_array_from_fangraphs_html(batting_html, 'Team War Totals.csv')

<ipython-input-4-ee944e08f675> in parse_array_from_fangraphs_html(input_html, out_file_name)
     20
     21     # get headers
---> 22     headers_html = table.find("thead").find_all("th")
     23     headers = []
     24     for header in headers_html:

AttributeError: 'NoneType' object has no attribute 'find'
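So yes, the problem is in this statement:

    table = soup.find("table", {"class": "tablesoreder, depth_chart tablesorter tablesorter-default"})

No table has a class attribute equal to that whole string, so soup.find returns None, and the next call, table.find("thead"), raises the AttributeError.
You could fix it by splitting the class attribute on the spaces and passing the resulting list, as suggested by another user; BeautifulSoup then matches a table that has any one of those classes. But then you would get another failure, because the parsed table has no tbody, so table.find("tbody") returns None as well.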
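The two lookups that return None can be reproduced offline. The following is only a minimal sketch: the markup is a made-up stand-in for the real FanGraphs page, the team name and value are placeholders, and it uses the built-in "html.parser" so that nothing beyond bs4 is required.

```python
from bs4 import BeautifulSoup

# Made-up sample markup (not the real FanGraphs page): a table with two
# classes whose data row sits directly under <table>, with no <tbody>.
sample_html = """
<table class="depth_chart tablesorter">
  <thead><tr><th>Team</th><th>WAR</th></tr></thead>
  <tr><td>Team A</td><td>1.0</td></tr>
</table>
"""

soup = BeautifulSoup(sample_html, "html.parser")

# A string containing spaces is compared against the whole class attribute,
# so this lookup finds nothing and returns None.
miss = soup.find("table", {"class": "tablesoreder, depth_chart tablesorter tablesorter-default"})
print(miss)  # None

# A single class name matches any table that carries that class.
by_one_class = soup.find("table", class_="depth_chart")

# A list matches a table that carries at least one of the listed classes.
by_list = soup.find("table", class_=["tablesoreder,", "depth_chart"])

# html.parser does not add a <tbody> that is absent from the markup,
# so this is None as well, and chaining .find_all() on it would fail.
tbody = by_one_class.find("tbody")
print(tbody)  # None

# Searching the table itself still reaches the data row.
data_rows = [tr for tr in by_one_class.find_all("tr") if tr.find("td") is not None]
print(len(data_rows))  # 1
```

Printing the result of each find() before chaining another call onto it is a quick way to see which lookup is the one returning None.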
The fixed script would look like this:

import requests
import csv
import datetime
from bs4 import BeautifulSoup

# static urls
season = datetime.datetime.now().year
URL = "https://www.fangraphs.com/depthcharts.aspx?position=Team".format(season=season)

# request the data
batting_html = requests.get(URL).text

def parse_array_from_fangraphs_html(input_html, out_file_name):
    """
    Take a HTML stats page from fangraphs and parse it out to a CSV file.
    """
    # parse input
    soup = BeautifulSoup(input_html, "lxml")
    # a list matches a table that has any one of these classes
    table = soup.find("table", class_=["tablesoreder,", "depth_chart", "tablesorter", "tablesorter-default"])

    # get headers
    headers_html = table.find("thead").find_all("th")
    headers = []
    for header in headers_html:
        headers.append(header.text)
    print(headers)

    # get rows; search the table directly because it has no tbody
    rows = []
    rows_html = table.find_all("tr")
    for row in rows_html:
        row_data = []
        for cell in row.find_all("td"):
            row_data.append(cell.text)
        # skip rows without td cells, such as the header row
        if row_data:
            rows.append(row_data)

    # write to CSV file; newline="" avoids blank lines between rows on Windows
    with open(out_file_name, "w", newline="") as out_file:
        writer = csv.writer(out_file)
        writer.writerow(headers)
        writer.writerows(rows)

parse_array_from_fangraphs_html(batting_html, 'Team War Totals.csv')
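One possible hardening of the script is to check the result of find() and fail with a readable message instead of an AttributeError. This is a sketch under assumptions, not part of the fix itself: the helper names find_table_or_fail and table_to_rows are hypothetical, and the sample markup is made up so the example runs without a network request.

```python
from bs4 import BeautifulSoup


def find_table_or_fail(soup, class_name):
    """Return the first table with the given class, or raise a clear error."""
    table = soup.find("table", class_=class_name)
    if table is None:
        raise ValueError(f"no <table> with class {class_name!r} in the HTML")
    return table


def table_to_rows(table):
    """Return (headers, rows) as lists of stripped cell text."""
    headers = [th.get_text(strip=True) for th in table.find_all("th")]
    rows = []
    for tr in table.find_all("tr"):
        cells = [td.get_text(strip=True) for td in tr.find_all("td")]
        if cells:  # header rows contain only <th>, so they are skipped
            rows.append(cells)
    return headers, rows


# Made-up sample markup standing in for the downloaded page.
sample_html = """
<table class="depth_chart tablesorter">
  <thead><tr><th>Team</th><th>WAR</th></tr></thead>
  <tr><td>Team A</td><td>1.0</td></tr>
</table>
"""

soup = BeautifulSoup(sample_html, "html.parser")
headers, rows = table_to_rows(find_table_or_fail(soup, "depth_chart"))
print(headers)  # ['Team', 'WAR']
print(rows)     # [['Team A', '1.0']]
```

With a guard like this, a wrong class name produces a message that names the missing class, which is easier to act on than "'NoneType' object has no attribute 'find'".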
Answered By - nilleb