Issue
from selenium import webdriver
from selenium.webdriver.common.by import By

def search_letterboxd_by_genre(genre):
    url = f"https://letterboxd.com/films/genre/{genre}/"
    # Set up the Selenium webdriver
    options = webdriver.ChromeOptions()
    options.add_argument("--headless")  # Run browser in headless mode (without GUI)
    driver = webdriver.Chrome(options=options)
    driver.get(url)
    # Wait for the page to load
    driver.implicitly_wait(5)
    # Find all movie elements with the class name "film-poster"
    movie_elements = driver.find_elements(By.CSS_SELECTOR, "frame")
    if movie_elements:
        for movie_element in movie_elements:
            try:
                movie_title = movie_element.find_element(By.TAG_NAME, "img").get_attribute("alt")
                print(movie_title)
            except:
                print("Error extracting movie title.")
    else:
        print("No movies found for the given genre.")
    # Close the browser
    driver.quit()
This is my code. (I do call the function and read the genre with an input statement; I just haven't copied that part here.)
At first I was using Beautiful Soup, but I read somewhere that Selenium can handle JavaScript-rendered pages, so the movie titles should print. It is still not working.
Solution
Root cause: after the URL loads, a Consent pop-up appears. You need to dismiss that pop-up first, and only then scrape the page.
Refer to the working code below:
import time

from selenium import webdriver
from selenium.webdriver.common.by import By

url = "https://letterboxd.com/films/genre/action/"
# Set up the Selenium webdriver
options = webdriver.ChromeOptions()
options.add_argument("--headless")  # Run browser in headless mode (without GUI)
driver = webdriver.Chrome(options=options)
driver.get(url)
driver.maximize_window()
# Let element lookups keep retrying for up to 10 seconds before failing
driver.implicitly_wait(10)
# Dismiss the Consent pop-up before scraping
driver.find_element(By.XPATH, "//p[text()='Consent']").click()
# Give the film list time to render after the pop-up closes
time.sleep(10)
# Select the title spans inside the film list container
movie_elements = driver.find_elements(By.XPATH, "//div[@id='films-browser-list-container']//li//a//span[1]")
for movie in movie_elements:
    print(movie.get_attribute("innerText"))
# Close the browser
driver.quit()
Console output:
Everything Everywhere All at Once (2022)
Spider-Man: Into the Spider-Verse (2018)
Inception (2010)
The Dark Knight (2008)
Spider-Man: No Way Home (2021)
Avengers: Infinity War (2018)
Spider-Man: Across the Spider-Verse (2023)
Avengers: Endgame (2019)
Baby Driver (2017)
Black Panther (2018)
Guardians of the Galaxy (2014)
Kill Bill: Vol. 1 (2003)
The Matrix (1999)
Scott Pilgrim vs. the World (2010)
Spider-Man: Homecoming (2017)
Avatar: The Way of Water (2022)
Thor: Ragnarok (2017)
The Lord of the Rings: The Fellowship of the Ring (2001)
Doctor Strange in the Multiverse of Madness (2022)
Star Wars (1977)
Top Gun: Maverick (2022)
Deadpool (2016)
Spider-Man: Far from Home (2019)
Mad Max: Fury Road (2015)
Dunkirk (2017)
The Avengers (2012)
Avatar (2009)
Guardians of the Galaxy Vol. 3 (2023)
Puss in Boots: The Last Wish (2022)
Guardians of the Galaxy Vol. 2 (2017)
The Empire Strikes Back (1980)
Tenet (2020)
Bullet Train (2022)
Doctor Strange (2016)
Iron Man (2008)
Captain America: Civil War (2016)
Spider-Man (2002)
Star Wars: The Force Awakens (2015)
The Incredibles (2004)
The Lord of the Rings: The Return of the King (2003)
Captain America: The Winter Soldier (2014)
The Dark Knight Rises (2012)
John Wick (2014)
Thor: Love and Thunder (2022)
The Lord of the Rings: The Two Towers (2002)
Spider-Man 2 (2004)
Batman Begins (2005)
Avengers: Age of Ultron (2015)
Star Wars: The Last Jedi (2017)
Black Widow (2021)
The Suicide Squad (2021)
Shang-Chi and the Legend of the Ten Rings (2021)
Captain Marvel (2019)
Return of the Jedi (1983)
Rogue One: A Star Wars Story (2016)
Ant-Man (2015)
Black Panther: Wakanda Forever (2022)
Logan (2017)
Star Wars: Episode III – Revenge of the Sith (2005)
Captain America: The First Avenger (2011)
Oldboy (2003)
The Northman (2022)
Léon: The Professional (1994)
Star Wars: The Rise of Skywalker (2019)
The Amazing Spider-Man (2012)
Kill Bill: Vol. 2 (2004)
Raiders of the Lost Ark (1981)
Star Wars: Episode I – The Phantom Menace (1999)
Deadpool 2 (2018)
Iron Man 3 (2013)
Eternals (2021)
Scarface (1983)
Process finished with exit code 0
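One possible variation on the working code: the fixed time.sleep(10) always costs the full ten seconds, and the Consent click raises an error if the pop-up never appears. The sketch below polls instead, so it returns as soon as titles are present and skips the click when there is no pop-up. The helper names wait_until and scrape_titles are hypothetical (they are not part of Selenium), and the sketch assumes driver is a Selenium WebDriver that has already loaded the genre page. The helper is written in plain Python so it can be tried without a browser; Selenium's own WebDriverWait serves the same purpose.

```python
import time

# XPaths copied from the working code above.
CONSENT_XPATH = "//p[text()='Consent']"
TITLE_XPATH = "//div[@id='films-browser-list-container']//li//a//span[1]"


def wait_until(condition, timeout=10.0, poll=0.5):
    """Call condition() repeatedly until it returns something truthy.

    Returns that value, or None once `timeout` seconds have passed.
    """
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result
        if time.monotonic() >= deadline:
            return None
        time.sleep(poll)


def scrape_titles(driver, timeout=10.0, poll=0.5):
    """Dismiss the Consent pop-up if it appears, then return the film titles.

    Assumption: `driver` is a Selenium WebDriver with the page already loaded.
    "xpath" is the string value of Selenium's By.XPATH.
    """
    # find_elements returns an empty list (falsy) when nothing matches,
    # so a missing pop-up does not raise; it only uses up the timeout.
    buttons = wait_until(
        lambda: driver.find_elements("xpath", CONSENT_XPATH), timeout, poll
    )
    if buttons:
        buttons[0].click()
    # Return as soon as at least one title element is present.
    spans = wait_until(
        lambda: driver.find_elements("xpath", TITLE_XPATH), timeout, poll
    )
    return [span.get_attribute("innerText") for span in spans or []]
```

With a real driver this would be called as titles = scrape_titles(driver) right after driver.get(url). Because it returns once the first titles show up, a page that fills its list gradually might yield a partial list; in that case a short extra wait before reading the spans may still be needed.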
Answered By - Shawn