Issue
I am trying to attach each player's mlb.com link to that player's name in a list, so that clicking a name takes me to the player's mlb.com page. For example, clicking Yordan Alvarez would take me to his stats page, because the link would be embedded in his name.
This is what I have tried so far, but I am stuck. How can I embed the links in the player names so that each name works as a hyperlink, as in the Yordan Alvarez example?
from bs4 import BeautifulSoup
import requests
import re
# Request URL
url_1 = 'https://www.mlb.com/stats/'
req = requests.get(url_1).text
document = BeautifulSoup(req, 'html.parser')
# Body
tbody = document.tbody
# Headers
thead = document.thead
# Player Names
full_name = tbody.find_all('a')
# List of Players
players_list = []
for name in full_name:
    if name.get('aria-label'):
        names = name.get('aria-label')
        players_list.append(names)
# List of Links
hrefs_list = []
hrefs = tbody.find_all('a',href = True)
# Players & Their Links
for link,player in zip(hrefs, players_list):
    href_link = link['href']
    if re.search('^/player', href_link):
        stats_link = f'https://www.mlb.com{href_link}'
        hrefs_list.append(stats_link)
        hyperlink_format = f'<a href= {stats_link}>{player}</a>'
print(dict(zip(players_list, hrefs_list)))
Solution
You can use the fact that find_all accepts a compiled regular expression as an attribute filter, so the player links can be selected by their href directly.
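As a minimal offline sketch of that filter, the snippet below runs find_all with a compiled pattern against a small hand-written table. The markup, the player names, and the numeric IDs are made up for illustration; the real mlb.com page may be structured differently.

```python
import re
from bs4 import BeautifulSoup

# Hypothetical stand-in for the stats table; the real page markup may differ.
sample_html = """
<table><tbody>
  <tr><td><a href="/player/111111" aria-label="Player One">Player One</a></td></tr>
  <tr><td><a href="/team/houston">Some Team</a></td></tr>
  <tr><td><a href="/player/222222" aria-label="Player Two">Player Two</a></td></tr>
</tbody></table>
"""

soup = BeautifulSoup(sample_html, "html.parser")

# A compiled regex passed as an attribute filter keeps only the
# <a> tags whose href matches the pattern.
player_links = soup.find_all("a", href=re.compile(r"^/player/\d+"))

player_hrefs = [a["href"] for a in player_links]
print(player_hrefs)  # ['/player/111111', '/player/222222']
```

The team link is skipped because its href does not match the pattern, so no separate filtering step with re.search is needed afterwards.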
Combining this with a dict comprehension would simplify this to:
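from bs4 import BeautifulSoup
import requests
import re

base_url = 'https://www.mlb.com'
stats_url = f'{base_url}/stats/'
req = requests.get(stats_url).text
soup = BeautifulSoup(req, 'html.parser')
# Match only the anchors whose href points at a player page
pattern = re.compile(r"/player/\d+")
links = soup.find_all('a', attrs={'href': pattern})
# The hrefs are root-relative (they start with "/player"), so append them
# to base_url without an extra "/" to avoid a double slash in the URL
player_links = {a.text: f"{base_url}{a['href']}" for a in links}
print(player_links)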
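To get from a name-to-link mapping to clickable names, each pair can be rendered as an HTML anchor tag. The sketch below is one possible way to do that with the standard library only; to_anchor is a hypothetical helper, and the names and IDs in players are placeholder data rather than scraped values.

```python
from html import escape
from urllib.parse import urljoin

BASE_URL = "https://www.mlb.com"


def to_anchor(name, href, base_url=BASE_URL):
    """Return an HTML <a> tag that links the player's name to their page."""
    # urljoin resolves a root-relative href against the site root.
    full_url = urljoin(base_url, href)
    # Quote the attribute value and escape the visible text.
    return f'<a href="{escape(full_url, quote=True)}">{escape(name)}</a>'


# Placeholder data standing in for the scraped mapping (hypothetical IDs).
players = {"Player One": "/player/111111", "Player Two": "/player/222222"}

anchors = [to_anchor(name, href) for name, href in players.items()]
print("\n".join(anchors))
```

The resulting strings only become clickable when rendered as HTML, for example in a browser or a notebook cell that displays HTML; printed to a plain terminal they remain text.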
Answered By - dosas