Issue
I am working on a script that alerts me when something changes on a webpage. I use bs4 and Python Selenium. The problem is that bs4 never actually gets the new page source.
driver.get("some url")
while True:
    html = driver.page_source
    soup = bs4.BeautifulSoup(html, "html.parser")
    ......
    driver.refresh()
Any ideas on that? I tried driver.find_element_by_tag_name("body").text instead, and I also tried re-entering the page with driver.get(driver.current_url) instead of refreshing it, but neither worked for me.
Solution
I don't know how BeautifulSoup works, but perhaps the page source is being read before the page has fully loaded? For debugging purposes, I would try adding a sleep as the first line of the while loop and see if that changes things.
driver.get("some url")
while True:
    time.sleep(5)
    html = driver.page_source
    soup = bs4.BeautifulSoup(html, "html.parser")
    ......
    driver.refresh()
The sleep was for debugging purposes only. A better practice is to find an element on the page that you are sure loads last (e.g. a TABLE or an IMG) and add a WebDriverWait for that element.
Answered By - JeffC