Issue
I am trying to pull all the tables from this link: https://www.baseball-reference.com/awards/awards_2017.shtml
However, I only get the first two tables, AL MVP Voting and NL MVP Voting, and none of the tables after them (AL/NL Cy Young Voting, AL/NL Rookie of the Year Voting, etc.).
Here's the code I am using:
import requests
from bs4 import BeautifulSoup

url = 'https://www.baseball-reference.com/awards/awards_2017.shtml'
page = requests.get(url)
soup = BeautifulSoup(page.text, 'html.parser')
table = soup.find_all('table')
table[2]
I expected all the tables to come up, but I only get the first two and get None for the third one and beyond.
Solution
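The remaining tables are embedded in HTML comments, so the parser treats them as comment text rather than as table elements. The simplest way to expose them is to strip the comment markers before parsing, for example with .replace('<!--','').replace('-->','')
A more targeted alternative is to pick out the comment nodes with bs4.Comment and parse only those.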
Example
import requests
from bs4 import BeautifulSoup

url = 'https://www.baseball-reference.com/awards/awards_2017.shtml'
# Strip the comment markers so the hidden tables become regular markup
html = requests.get(url).text.replace('<!--', '').replace('-->', '')
soup = BeautifulSoup(html, 'html.parser')

table = soup.find_all('table')
table[2]
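For the bs4.Comment alternative mentioned above, here is a minimal sketch rather than a tested solution for the live page. It runs on a small made-up HTML string (SAMPLE_HTML and the table ids are hypothetical stand-ins), and find_all_tables is an illustrative helper name, not part of Beautiful Soup. The idea is to collect the visible tables, then parse each comment that contains table markup and collect the tables found there.

```python
from bs4 import BeautifulSoup, Comment

# Hypothetical miniature page standing in for the real one:
# one visible table, one table hidden in a comment, one unrelated comment.
SAMPLE_HTML = """
<div><table id="visible"><tr><td>AL MVP</td></tr></table></div>
<div><!--
<table id="hidden"><tr><td>AL Cy Young</td></tr></table>
--></div>
<!-- just a note, no table here -->
"""


def find_all_tables(html):
    """Return visible tables followed by tables found inside HTML comments."""
    soup = BeautifulSoup(html, "html.parser")
    tables = soup.find_all("table")
    # Comment nodes are strings in the tree, so filter string nodes by type.
    for comment in soup.find_all(string=lambda s: isinstance(s, Comment)):
        if "<table" in comment:
            hidden = BeautifulSoup(comment, "html.parser")
            tables.extend(hidden.find_all("table"))
    return tables


tables = find_all_tables(SAMPLE_HTML)
print([t.get("id") for t in tables])
```

To try it on the real page, pass requests.get(url).text instead of SAMPLE_HTML. Note that this sketch lists the visible tables first and the commented ones after, so the order may differ from the order on the page.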
Answered By - HedgeHog