Issue
I am trying to retrieve the match links (hrefs) from this page, but I cannot find them directly with Selenium or BeautifulSoup. I understand they may be loaded from an API, but I can't figure out how to locate them under the section with class mls-l-module mls-l-module--match-list.
import requests
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.common.exceptions import TimeoutException, NoSuchElementException
from selenium.webdriver import ActionChains
from selenium.webdriver.common.by import By
from time import sleep, time
import pandas as pd
import warnings
import numpy as np
from datetime import datetime
import json
from bs4 import BeautifulSoup
warnings.filterwarnings('ignore')
base_url = 'https://www.mlssoccer.com/schedule/scores#competition=mls-regular-season&club=all&date=2023-02-20'
# create an empty list to store urls.
urls = []
option = Options()
option.headless = False
driver = webdriver.Chrome("##########",options=option)
driver.get(base_url)
# click the cookie pop up
WebDriverWait(driver, 15).until(EC.element_to_be_clickable((By.XPATH, '/html/body/div[3]/div[2]/div/div[1]/div/div[2]/div/button[2]'))).click()
The expected output is a list of URLs from this page; I will then loop to the next page and collect the href links for all matches. Perhaps using Selenium to render the page for BeautifulSoup is a better option.
Solution
As stated in the comments, you can bypass Selenium altogether and use the site's Ajax API directly:
import requests
params = {
    "culture": "en-us",
    "dateFrom": "2023-02-19",
    "dateTo": "2023-02-27",
    "competition": "98",
    "matchType": "Regular",
    "excludeSecondaryTeams": "true",
}
api_url = 'https://sportapi.mlssoccer.com/api/matches'
base_url = 'https://www.mlssoccer.com/competitions/mls-regular-season/2023/matches/'
data = requests.get(api_url, params=params).json()
for m in data:
    h, a = m['home']['fullName'], m['away']['fullName']
    print(f'{h:<30} {a:<30} {base_url + m["slug"]}/')
Prints:
Nashville SC                   New York City Football Club    https://www.mlssoccer.com/competitions/mls-regular-season/2023/matches/nshvsnyc-02-25-2023/
Atlanta United                 San Jose Earthquakes           https://www.mlssoccer.com/competitions/mls-regular-season/2023/matches/atlvssj-02-25-2023/
Charlotte FC                   New England Revolution         https://www.mlssoccer.com/competitions/mls-regular-season/2023/matches/cltvsne-02-25-2023/
FC Cincinnati                  Houston Dynamo FC              https://www.mlssoccer.com/competitions/mls-regular-season/2023/matches/cinvshou-02-25-2023/
D.C. United                    Toronto FC                     https://www.mlssoccer.com/competitions/mls-regular-season/2023/matches/dcvstor-02-25-2023/
Inter Miami CF                 CF Montréal                    https://www.mlssoccer.com/competitions/mls-regular-season/2023/matches/miavsmtl-02-25-2023/
Orlando City                   New York Red Bulls             https://www.mlssoccer.com/competitions/mls-regular-season/2023/matches/orlvsrbny-02-25-2023/
Philadelphia Union             Columbus Crew                  https://www.mlssoccer.com/competitions/mls-regular-season/2023/matches/phivsclb-02-25-2023/
Austin FC                      St. Louis CITY SC              https://www.mlssoccer.com/competitions/mls-regular-season/2023/matches/atxvsstl-02-25-2023/
FC Dallas                      Minnesota United               https://www.mlssoccer.com/competitions/mls-regular-season/2023/matches/dalvsmin-02-25-2023/
Vancouver Whitecaps FC         Real Salt Lake                 https://www.mlssoccer.com/competitions/mls-regular-season/2023/matches/vanvsrsl-02-25-2023/
Seattle Sounders FC            Colorado Rapids                https://www.mlssoccer.com/competitions/mls-regular-season/2023/matches/seavscol-02-26-2023/
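Since the question asks for a list of URLs gathered across several schedule pages, the same request can be repeated over consecutive date windows. The following is only a rough sketch of that idea, not part of the original answer: the helper names (date_windows, build_match_urls, fetch_matches, collect_urls), the 7-day window size, and the way dateFrom/dateTo boundaries are treated are all assumptions, so duplicates are removed at the end in case windows overlap on the API side.

```python
from datetime import date, timedelta

API_URL = "https://sportapi.mlssoccer.com/api/matches"
BASE_URL = "https://www.mlssoccer.com/competitions/mls-regular-season/2023/matches/"


def date_windows(start, end, days=7):
    """Yield (dateFrom, dateTo) ISO strings covering start..end inclusive.

    Hypothetical helper: the 7-day default is an arbitrary choice.
    """
    cur = start
    while cur <= end:
        last = min(cur + timedelta(days=days - 1), end)
        yield cur.isoformat(), last.isoformat()
        cur = last + timedelta(days=1)


def build_match_urls(matches, base_url=BASE_URL):
    """Turn API match records into page URLs, skipping records without a slug."""
    return [f"{base_url}{m['slug']}/" for m in matches if m.get("slug")]


def fetch_matches(date_from, date_to):
    """Query the API for one date window, using the parameters from the answer."""
    # Third-party dependency; imported here so the pure helpers above
    # can be used without it.
    import requests

    params = {
        "culture": "en-us",
        "dateFrom": date_from,
        "dateTo": date_to,
        "competition": "98",
        "matchType": "Regular",
        "excludeSecondaryTeams": "true",
    }
    resp = requests.get(API_URL, params=params, timeout=30)
    resp.raise_for_status()
    return resp.json()


def collect_urls(start, end):
    """Collect match URLs for every window between start and end."""
    urls = []
    for date_from, date_to in date_windows(start, end):
        urls.extend(build_match_urls(fetch_matches(date_from, date_to)))
    # Drop duplicates while keeping order, in case a match is returned
    # for two adjacent windows.
    return list(dict.fromkeys(urls))
```

Calling collect_urls(date(2023, 2, 19), date(2023, 3, 31)), with dates picked purely for illustration, would return one flat list of match URLs for that range, provided the API still responds the way it did in the answer.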
Answered By - Andrej Kesely