Issue
I am trying to scrape the following site:
https://cve.mitre.org/cve/data_feeds
from bs4 import BeautifulSoup
from selenium import webdriver
driver = webdriver.Chrome()  # brew install chromedriver
driver.get("https://cve.mitre.org/cve/data_feeds")
page = driver.page_source
soup = BeautifulSoup(page, 'lxml')
cve = soup.find_all("li", {"class": "timeline-TweetList-tweet customisable-border"})
print(cve)
But print(cve) returns an empty list.
Any ideas?
Solution
The elements you are trying to access are inside an iframe, so they are not part of the main page's source that you pass to BeautifulSoup. To access them, you have to switch into that iframe first. With Selenium, this can be done as follows:
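from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
driver = webdriver.Chrome()  # brew install chromedriver
wait = WebDriverWait(driver, 20)
driver.get("https://cve.mitre.org/cve/data_feeds")
# The tweets live inside the Twitter widget iframe, so switch into it first
wait.until(EC.frame_to_be_available_and_switch_to_it((By.XPATH, "//iframe[@id='twitter-widget-0']")))
# Wait until at least one tweet is visible inside the iframe
wait.until(EC.visibility_of_element_located((By.CSS_SELECTOR, "li.timeline-TweetList-tweet.customisable-border")))
cve = driver.find_elements(By.CSS_SELECTOR, "li.timeline-TweetList-tweet.customisable-border")
for tweet in cve:
    print(tweet.text)
# Return to the main page once you are done with the iframe
driver.switch_to.default_content()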
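The XPath in the answer assumes the widget iframe has the id twitter-widget-0. To confirm that the elements you want really sit in an iframe, or to find out what id the iframe has, you can list the iframe tags in the top-level page source (for example driver.page_source before switching into any frame). This is a minimal sketch: the helper name list_iframes, the sample HTML and the example.com URL are hypothetical, and "html.parser" is used so it does not need lxml.

```python
from bs4 import BeautifulSoup


def list_iframes(html):
    """Return (id, src) pairs for every <iframe> in an HTML document.

    Hypothetical helper: pass it the top-level page source, e.g.
    driver.page_source *before* switching into any frame.
    Missing attributes come back as None.
    """
    soup = BeautifulSoup(html, "html.parser")
    return [(frame.get("id"), frame.get("src")) for frame in soup.find_all("iframe")]


# Made-up sample page: only the <iframe> tag is in the parent markup,
# not the tweets, which is why searching the parent page for the <li>
# elements finds nothing.
SAMPLE_PAGE = """
<html><body>
  <h1>Data Feeds</h1>
  <iframe id="twitter-widget-0" src="https://example.com/widget"></iframe>
</body></html>
"""

if __name__ == "__main__":
    print(list_iframes(SAMPLE_PAGE))
    soup = BeautifulSoup(SAMPLE_PAGE, "html.parser")
    # The tweet selector matches nothing in the parent document
    print(soup.select("li.timeline-TweetList-tweet.customisable-border"))
```

Running it prints the single iframe's id and src, followed by an empty list, which mirrors the empty result in the question.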
I guess this could also be done with bs4, but I'm not familiar enough with bs4 to know how to switch into an iframe with it.
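BeautifulSoup itself only parses the HTML you hand it; it has no notion of browser frames, so the frame switch has to happen in Selenium. One way to combine the two, sketched here under the assumption that Selenium has already switched into the iframe (as the frame_to_be_available_and_switch_to_it call does), is to pass driver.page_source, which at that point holds the iframe's document, to BeautifulSoup. The helper name extract_tweets and the sample markup are hypothetical; the CSS selector is the one from the question and answer.

```python
from bs4 import BeautifulSoup

# Selector taken from the question and answer
TWEET_SELECTOR = "li.timeline-TweetList-tweet.customisable-border"


def extract_tweets(frame_html):
    """Return the text of each tweet <li> found in the iframe's HTML.

    Hypothetical helper: call it with driver.page_source *after* switching
    into the iframe, so the source is the frame's document, not the parent's.
    """
    soup = BeautifulSoup(frame_html, "html.parser")
    return [li.get_text(strip=True) for li in soup.select(TWEET_SELECTOR)]


# Made-up stand-in for the iframe's page source
SAMPLE_FRAME = """
<ol>
  <li class="timeline-TweetList-tweet customisable-border"><p>First tweet</p></li>
  <li class="timeline-TweetList-tweet customisable-border"><p>Second tweet</p></li>
  <li class="something-else">Not a tweet</li>
</ol>
"""

if __name__ == "__main__":
    print(extract_tweets(SAMPLE_FRAME))  # ['First tweet', 'Second tweet']
```

With a live driver, the call would be extract_tweets(driver.page_source), placed after the two wait.until(...) lines and before switching back to the default content.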
Also, don't forget to switch back to the default content (driver.switch_to.default_content()) when you have finished with the iframe content.
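To make sure the driver always returns to the main page, even if something inside the iframe raises an exception, you could wrap the switch in a small context manager around driver.switch_to.frame(...) and driver.switch_to.default_content(). This is a sketch, not part of Selenium's API; the name inside_frame is hypothetical. It needs only the standard library because it just calls methods on the driver object it is given.

```python
from contextlib import contextmanager


@contextmanager
def inside_frame(driver, frame_reference):
    """Switch `driver` into an iframe for the duration of a with-block.

    Hypothetical helper. `frame_reference` is anything that
    driver.switch_to.frame() accepts (a WebElement, a frame name/id
    string, or an index). The finally clause switches back to the
    top-level document even if the with-block raises.
    """
    driver.switch_to.frame(frame_reference)
    try:
        yield driver
    finally:
        driver.switch_to.default_content()
```

With a live driver, you would put the find_elements call inside "with inside_frame(driver, 'twitter-widget-0'):". Unlike the answer's code, this skips the explicit wait, so you may still want to wait for the iframe to be present before entering the block.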
Answered By - Prophet