Issue
I'm trying to scrape this HTML heading:
<h2 id="p89" data-pid="89"><span id="page77" class="pageNum" data-no="77" data-before-text="77"></span>Tuesday, July 30</h2>
from this website: https://wol.jw.org/en/wol/h/r1/lp-e
My code:
from bs4 import BeautifulSoup
import requests
url = requests.get('https://wol.jw.org/en/wol/h/r1/lp-e').text
soup = BeautifulSoup(url, 'lxml')
textodiario = soup.find('header')
dia = textodiario.h2.text
print(dia)
It should return today's date, but it returns a past one: Wednesday, July 24
Solution
I don't have a PC to test this on at the moment, so please double-check it for errors.
You also need the chromedriver for your platform; put it in the same folder as the script.
My idea would be to use Selenium to get the rendered HTML and then parse it:
import time
from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
url = "https://wol.jw.org/en/wol/h/r1/lp-e"
options = Options()
options.add_argument('--headless')
options.add_argument('--disable-gpu')
# 'chrome_options=' is deprecated; newer Selenium versions expect 'options='
driver = webdriver.Chrome(options=options)
driver.get(url)
time.sleep(3)
page = driver.page_source
driver.quit()
soup = BeautifulSoup(page, 'html.parser')
textodiario = soup.find('header')
dia = textodiario.h2.text
print(dia)
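A fixed time.sleep(3) can be too short on a slow connection and wasteful on a fast one. As a rough, untested sketch of an alternative, the version below waits explicitly until an h2 inside a header is present before reading the page source. It assumes Selenium 4 and that the heading matches the CSS selector "header h2"; the function names extract_heading and fetch_rendered_html, and the 10-second timeout, are made up for illustration.

```python
from bs4 import BeautifulSoup


def extract_heading(html):
    """Return the text of the first <h2> inside the first <header>, or None."""
    soup = BeautifulSoup(html, "html.parser")
    header = soup.find("header")
    if header is None or header.h2 is None:
        return None
    return header.h2.get_text(strip=True)


def fetch_rendered_html(url, timeout=10):
    """Load url in headless Chrome and return the source once 'header h2' exists."""
    # Imported here so extract_heading works even without Selenium installed.
    from selenium import webdriver
    from selenium.webdriver.chrome.options import Options
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    options = Options()
    options.add_argument("--headless")
    driver = webdriver.Chrome(options=options)
    try:
        driver.get(url)
        # Wait for the element instead of sleeping a fixed number of seconds.
        # The selector is an assumption about the page's markup.
        WebDriverWait(driver, timeout).until(
            EC.presence_of_element_located((By.CSS_SELECTOR, "header h2"))
        )
        return driver.page_source
    finally:
        # Always close the browser, even if the wait times out.
        driver.quit()
```

To use it, call print(extract_heading(fetch_rendered_html("https://wol.jw.org/en/wol/h/r1/lp-e"))). Keeping the parsing in its own function also lets you check it against a saved HTML snippet without launching a browser.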
Answered By - Pitto