Issue
I scrape daily lineups and need to find out whether a team does not have its lineup posted. In that case, the page contains an element with the class lineup__no. I'd like to look at each team, check whether its lineup is posted, and if not, add that team's index to a list. For example, if there are 4 teams playing and the first and third teams do not have a lineup posted, I want to return the list [0, 2]. I am guessing a list comprehension of some sort may help me get there, but I'm struggling to come up with what I need. For now I tried a for loop to get each of the items under the main header. I've also tried adding each li item's text to a list and searching for "Unknown Lineup", but was unsuccessful.
from selenium import webdriver
from selenium.common.exceptions import NoSuchElementException
from bs4 import BeautifulSoup
import requests
import pandas as pd
# Scrape the lineups page for updates
url = 'https://www.rotowire.com/baseball/daily-lineups.php'
# Request the Rotowire HTML
r = requests.get(url)
soup = BeautifulSoup(r.text, "html.parser")
# Each game block carries both the "lineup" and "is-mlb" classes
games = soup.select('.lineup.is-mlb')
for game in games:
    # Collect every <li> inside the game block
    initial_list = game.find_all('li')
    print(initial_list)
Solution
Since I'm more familiar with Selenium, I'll give you a Selenium solution.
My explanations are given as comments inside the code.
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
import time
driver = webdriver.Chrome()
driver.maximize_window()
wait = WebDriverWait(driver, 20)
driver.get("https://www.rotowire.com/baseball/daily-lineups.php")
# Wait for at least one game element to be visible
wait.until(EC.visibility_of_element_located((By.CSS_SELECTOR, ".lineup.is-mlb")))
# Add a short delay so that the other games have time to load
time.sleep(0.5)
# Get all the game blocks
games = driver.find_elements(By.CSS_SELECTOR, ".lineup.is-mlb")
# Iterate over the game elements with their indexes in a list comprehension.
# Game idx holds teams idx*2 and idx*2+1; both indexes are added whenever
# the game block contains at least one "lineup__no" element.
no_lineup = [
    j
    for idx, game in enumerate(games)
    for j in [idx*2, idx*2+1]
    if game.find_elements(By.XPATH, ".//li[@class='lineup__no']")
]
# Print the collected results
print(no_lineup)
# Quit the driver
driver.quit()
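Note that the list comprehension above adds both team indexes of a game as soon as that game contains any lineup__no element. If only the team whose lineup is actually missing should be reported, the check can be made per team instead of per game. Below is a minimal sketch of that idea using BeautifulSoup on an HTML string. It assumes each team's lineup sits in its own container with the class lineup__list inside the game block; that class name, the sample markup and the helper teams_without_lineup are assumptions for illustration and should be checked against the real page.

```python
from bs4 import BeautifulSoup

# Hypothetical, simplified markup: two games, four teams.
# The "lineup__list" containers are an assumption about the page structure.
SAMPLE_HTML = """
<div class="lineup is-mlb">
  <ul class="lineup__list is-visit"><li class="lineup__no">Unknown Lineup</li></ul>
  <ul class="lineup__list is-home"><li class="lineup__player">Player A</li></ul>
</div>
<div class="lineup is-mlb">
  <ul class="lineup__list is-visit"><li class="lineup__no">Unknown Lineup</li></ul>
  <ul class="lineup__list is-home"><li class="lineup__player">Player B</li></ul>
</div>
"""


def teams_without_lineup(html):
    """Return the indexes of teams whose list contains a lineup__no element."""
    soup = BeautifulSoup(html, "html.parser")
    # One entry per team, in document order across all games
    team_lists = soup.select(".lineup.is-mlb .lineup__list")
    return [
        idx
        for idx, team in enumerate(team_lists)
        if team.select_one(".lineup__no") is not None
    ]


print(teams_without_lineup(SAMPLE_HTML))  # prints [0, 2]
```

With Selenium, the same HTML can be obtained from driver.page_source after the wait and before driver.quit(), and passed to the helper.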
Answered By - Prophet