Issue
I'm trying to use Selenium to open the cached version of some websites. Unfortunately, I'm not getting the same results as in Chrome.
When I go to Chrome and enter "cache:https://www.nytimes.com/", I'm immediately redirected to: "http://webcache.googleusercontent.com/search?q=cache%3Ahttps%3A%2F%2Fwww.nytimes.com%2F&rlz=1C5CHFA_enAR1034AR1034&oq=cache%3Ahttps%3A%2F%2Fwww.nytimes.com%2F&gs_lcrp=EgZjaHJvbWUyBggAEEUYOTIGCAEQRRg60gEIMzI2MWowajSoAgCwAgA&sourceid=chrome&ie=UTF-8"
This is exactly what I expect.
When I use driver.get("cache:https://www.nytimes.com/"), I'm not redirected to that page at all.
Any clues what I might be missing? Many thanks!
EDIT:
I'm trying the following:
import time

from selenium import webdriver

driver = webdriver.Chrome()
try:
    # driver.get() returns None, so there is nothing to assign
    driver.get("cache:https://www.nytimes.com/")
    time.sleep(5)
    print(driver.page_source)
except Exception as X:
    print(X)
driver.close()
I'm getting:
<html><head></head><body></body></html>
Solution
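Unfortunately, Selenium doesn't support the cache: prefix in a URL passed to driver.get(). In Chrome, typing cache: in the address bar runs a Google search with that operator, which is why you get redirected there; driver.get() skips the address bar and tries to load the string as a URL.
However, you can build the request yourself and call the Google cache endpoint with the URL as a query parameter.
The current endpoint is https://webcache.googleusercontent.com/search
and the cache query parameter is q=cache:{url}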
def open_cached_url(driver, url):
    driver.get(f"https://webcache.googleusercontent.com/search?q=cache:{url}")

open_cached_url(driver, "https://www.example.com")
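If the target URL has its own query string or other special characters, it may be safer to percent-encode it, the same way the redirect URL in the question is encoded (q=cache%3Ahttps%3A%2F%2F...). Below is a minimal sketch of that idea using only the standard library. The names build_cache_url and open_cached_url_encoded are hypothetical helpers, and the endpoint is simply the one quoted in this answer, so it is an assumption that it is still reachable when you try it.

```python
from urllib.parse import urlencode

# Endpoint quoted in the answer above; assumed, not guaranteed, to be available.
CACHE_ENDPOINT = "https://webcache.googleusercontent.com/search"


def build_cache_url(url):
    """Return the cache endpoint URL with 'cache:<url>' percent-encoded as q."""
    # urlencode escapes ':', '/', '?', '&' and so on, so a target URL that
    # carries its own query string cannot leak into the outer query.
    return f"{CACHE_ENDPOINT}?{urlencode({'q': 'cache:' + url})}"


def open_cached_url_encoded(driver, url):
    """Navigate an existing Selenium WebDriver to the encoded cache URL."""
    driver.get(build_cache_url(url))
```

With a driver already created as in the question, this would be called as open_cached_url_encoded(driver, "https://www.nytimes.com/").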
Answered By - Yaroslavm