Issue
I am trying to parse song titles from a website, but can't figure out how to grab the specific div that has them. I've tried about a dozen different methods but always get back an empty list.
If you go to the URL and inspect one of the YouTube videos there, you will find a div with a class of single-post-oembed-youtube-wrapper. That element also contains the artist and title of the song.
This is my first time attempting to scrape data from a webpage; can someone help me out?
import pprint
from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.webdriver.common.by import By

url = 'https://ultimateclassicrock.com/best-rock-songs-2018/'
browser = webdriver.Chrome(executable_path="/usr/bin/chromedriver")
browser.get(url)

soup = BeautifulSoup(browser.page_source, 'html.parser')
divs = soup.find_all("div", {"class": "single-post-oembed-youtube-wrapper"})

#all_songs = browser.find_elements(By.CLASS_NAME, 'single-post-oembed-youtube-wrapper')
#html = all_songs.get_attribute("outerHTML")

pprint.pprint(divs)
browser.close()
Solution
You can retrieve the data directly from the static HTML source, avoiding Selenium altogether: the artist and song pairs appear as plain text inside strong tags.
import requests
from bs4 import BeautifulSoup
import pandas as pd

url = "https://ultimateclassicrock.com/best-rock-songs-2018/"
res = requests.get(url)
soup = BeautifulSoup(res.content, "html.parser")

# Every "Artist, 'Song'" entry on the page sits in a <strong> tag
results = []
for elem in soup.find_all("strong"):
    if "," in elem.text:
        results.append(elem.text.split(", "))

df = pd.DataFrame(results, columns=["artist", "song"])
df
Output:
artist song
0 Steve Perry 'Sun Shines Gray'
1 Paul McCartney 'I Don't Know'
2 Judas Priest 'Flamethrower'
3 Ace Frehley 'Rocking With the Boys'
4 Paul Simon 'Questions for the Angels'
...
This is slightly hacky but works with your example.
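If you want the parsing step to be a little more defensive, you can split on the first comma only (so a comma inside a song title does not produce extra columns) and strip the surrounding quote marks from the title. A minimal sketch of that idea; the sample strings below are illustrative, written in the same "Artist, 'Song'" format as the page:

```python
def parse_entry(text):
    # Split on the first comma only, so commas inside a title are preserved
    artist, song = text.split(", ", 1)
    # Strip the quote marks wrapping the title; an internal apostrophe survives
    return artist, song.strip("'\u2018\u2019")

samples = [
    "Steve Perry, 'Sun Shines Gray'",
    "Paul McCartney, 'I Don't Know'",
]
print([parse_entry(s) for s in samples])
```

The resulting tuples can be fed to the same pd.DataFrame(..., columns=["artist", "song"]) call as above.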
Answered By - petezurich