Issue
I am trying to get the lyrics of a song from the Genius website with the BeautifulSoup library. I've seen different approaches online, but they all seem to be out of date. My code seems to work, but it only retrieves part of the lyrics div. It is my first time using this library, so maybe I'm missing something. The lyrics are contained in the same
This is my code:
from bs4 import BeautifulSoup
import re
import requests
song_api_path = '/Taylor-swift-cardigan-lyrics'
page_url = "http://genius.com" + song_api_path
page = requests.get(page_url)
soup = BeautifulSoup(page.text, "html.parser")
div = soup.find("div",class_=lambda value: value and re.search(r'^Lyrics__Container', value))
all_text = div.get_text(separator='\n')
print(all_text)
Which produces this output:
[Verse 1]
Vintage tee, brand new phone
High heels on cobblestones
When you are young, they assume you know nothing
Sequin smile, black lipstick
Sensual politics
When you are young, they assume you know nothing
[Chorus]
But I knew you
Dancin' in your Levi's
Drunk under a streetlight, I
I knew you
Hand under my sweatshirt
Baby, kiss it better, I
[Refrain]
And when I felt like I was an old cardigan
Under someone's bed
You put me on and said I was your favorite
[Verse 2]
A friend to all is a friend to none
Chase two girls, lose the one
When you are young, they assume you know nothing
This result is OK, but it is only half of the lyrics. I don't know why only this part is retrieved and not the rest of the text. I've checked the HTML on the Genius website but don't see anything different from the parts that are printed.
Any help is appreciated!
Solution
You used the find method, which returns only the first matching element.
You need to find all divs whose class starts with Lyrics__Container:
divs = soup.find_all("div",class_=lambda value: value and re.search(r'^Lyrics__Container', value))
all_text = '\n'.join([div.get_text(separator='\n') for div in divs])
print(all_text)
This will print all of the lyrics.
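To see the difference between find and find_all without depending on the live Genius page, here is a minimal offline sketch. The HTML snippet and its class suffixes are made up for illustration and are only assumed to resemble a page that splits lyrics across several containers. A compiled regex is passed to class_ here as an alternative to the lambda used above.

```python
import re
from bs4 import BeautifulSoup

# Hypothetical markup (not copied from Genius): two lyrics containers
# sharing a class prefix, with an unrelated div in between.
html = """
<div class="Lyrics__Container-abc">[Verse 1]<br/>first line</div>
<div class="Ad__Container-xyz">advert</div>
<div class="Lyrics__Container-abc">[Verse 2]<br/>second line</div>
"""

soup = BeautifulSoup(html, "html.parser")
pattern = re.compile(r"^Lyrics__Container")

# find() stops at the first match, so only the first container is returned.
first = soup.find("div", class_=pattern)
first_text = first.get_text(separator="\n")

# find_all() returns every match, in document order.
every = soup.find_all("div", class_=pattern)
all_text = "\n".join(div.get_text(separator="\n") for div in every)

print(first_text)
print("---")
print(all_text)
```

In this sketch, first_text holds only the first verse, while all_text holds both verses and skips the unrelated div.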
Answered By - Yuri R