Issue
I'm trying to scrape a website, but the HTML I get back does not contain the content I need to parse.
I am using Python 3.12 and the Requests-HTML library to scrape several websites. For some of them it works without problems, but for "https://www.ostseewelle.de/sendungen/H%C3%B6rercharts-id379456.html" it doesn't, even though I call the render function of Requests-HTML to execute the JavaScript on the page. From inspecting the website, I know that the information I am looking for is contained in tags with the attribute data-label="Künstler" ("artist"). But in the HTML obtained by scraping and rendering there is not a single tag with that attribute.
I don't know what to do, can someone help me and point me in the right direction?
from requests_html import HTML, HTMLSession
charts = {
    'ODC50': {
        'name': 'ODC50',
        'anz': 50,
        'url': 'https://www.mix1.de/charts/dance50.htm',
        'entry': 'div.charts-main-block',
        'date': '#mix1_content div.mybox_content'
    },
    'DDPHot50': {
        'name': 'DDP Hot50',
        'anz': 50,
        'url': 'https://www.deutsche-dj-playlist.de/hot-50/dance',
        'entry': 'div.list div.entry',
        'date': 'div.header div.title'
    },
    'Ostseewelle': {
        'name': 'Ostseewelle',
        'anz': 20,
        'url': 'https://www.ostseewelle.de/sendungen/H%C3%B6rercharts-id379456.html',
        'entry': 'section',
        'date': 'h3.text-center.titel1'
    }
}
choice = 'Ostseewelle'
chart_site = charts.get(choice).get('url')
session = HTMLSession()
r = session.get(chart_site)
r.html.render(sleep=2, keep_page=True, scrolldown=5, timeout=30)
print(r.status_code)
html = r.html
#print(html.html)
tds = html.xpath('//td[@data-label="Künstler"]')
print(f'Gefundene Einträge: {len(tds)}')
print('Programm beendet')
The HTML I get back is not what I expect: the elements I want to parse are missing.
Solution
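The chart data you see on the page is loaded from an external URL, so it is not part of the HTML of the ostseewelle.de page itself. To get the artist information, you can request that URL directly, as in the following example: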
import requests
from bs4 import BeautifulSoup

# The chart table is served from this URL, not from the ostseewelle.de page
url = "https://enricoostendorf.de/top20/top20eo.php"
soup = BeautifulSoup(requests.get(url).content, "html.parser")

for k in soup.select('[data-label="Künstler"]'):
    # Each cell is expected to hold two text nodes: artist, then title
    l1, l2 = k.get_text(strip=True, separator="|||").split("|||")
    print(l1)
    print(l2)
    print("-" * 80)
Prints:
...
--------------------------------------------------------------------------------
Loi
"Am I Enough"
--------------------------------------------------------------------------------
Nico Santos & Fast Boy
"Where You Are"
--------------------------------------------------------------------------------
Ofenbach
"Overdrive" (feat. Norma Jean Martine)
--------------------------------------------------------------------------------
Robin Schulz, Rita Ora, Tiago PZK
"I'll Be There"
--------------------------------------------------------------------------------
Tate McRae
"greedy"
--------------------------------------------------------------------------------
Dua Lipa
"Houdini"
--------------------------------------------------------------------------------
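If you need to find such an external URL yourself, one possible approach is to list the iframe and script sources in the static HTML that point to a different host than the page. The sketch below is only an illustration: it assumes the data is embedded through a tag with a src attribute (the original page may do this differently), the helper names ExternalSourceFinder and find_external_sources are made up for this example, and the sample markup is hypothetical, not the real page source.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class ExternalSourceFinder(HTMLParser):
    """Collect src URLs of <iframe>/<script> tags hosted on another domain."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.base_host = urlparse(base_url).netloc
        self.sources = []

    def handle_starttag(self, tag, attrs):
        if tag not in ("iframe", "script"):
            return
        src = dict(attrs).get("src")
        if not src:
            return
        # Resolve relative paths against the page URL before comparing hosts
        absolute = urljoin(self.base_url, src)
        if urlparse(absolute).netloc != self.base_host:
            self.sources.append((tag, absolute))


def find_external_sources(html_text, base_url):
    """Return (tag, url) pairs for embedded resources from other hosts."""
    finder = ExternalSourceFinder(base_url)
    finder.feed(html_text)
    return finder.sources


# Hypothetical sample markup (an assumption, not the real page source)
sample = """
<html><body>
<script src="/js/site.js"></script>
<iframe src="https://enricoostendorf.de/top20/top20eo.php"></iframe>
</body></html>
"""
page_url = "https://www.ostseewelle.de/sendungen/H%C3%B6rercharts-id379456.html"

for tag, src in find_external_sources(sample, page_url):
    print(tag, src)
```

With the live page you could pass requests.get(page_url).text instead of the sample string. If nothing useful shows up this way, the network tab of the browser's developer tools lists every request the page makes.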
Answered By - Andrej Kesely