Issue
I'm trying to scrape a table with the following code:
# Selenium and pandas imports
from selenium import webdriver
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.by import By
import pandas as pd

# Point Selenium at the local Firefox binary and geckodriver
options = webdriver.FirefoxOptions()
options.binary_location = r'C://Mozilla Firefox/firefox.exe'
driver = webdriver.Firefox(executable_path='C://geckodriver.exe', options=options)

url = 'https:/'
driver.get(url)

# Wait up to 20 seconds for the table to become clickable, then parse it
table = WebDriverWait(driver, 20).until(EC.element_to_be_clickable((By.ID, 'dgtopHolders')))
dfs = pd.read_html(table.get_attribute('outerHTML'))
print(dfs[0])
How can I scrape this table? Thanks for your time.
Solution
The current holdings <table> is present in the HTML DOM but is not visible on the webpage, so a wait that requires visibility, such as element_to_be_clickable(), will time out. To scrape the data, induce WebDriverWait for presence_of_element_located() on the <table> element, extract its outerHTML, and read that outerHTML with pandas read_html(). You can use the following locator strategy:
Code Block:
driver.execute("get", {'url': 'https://whalewisdom.com/filer/berkshire-hathaway-inc#tabholdings_tab_link'})
# data = driver.find_element(By.CSS_SELECTOR, "table#current_holdings_table").get_attribute("outerHTML")
data = WebDriverWait(driver, 20).until(EC.presence_of_element_located((By.CSS_SELECTOR, "table#current_holdings_table"))).get_attribute("outerHTML")
df = pd.read_html(data)
print(df)
Console Output:
[    Stock  Unnamed: 1                  Sector  ...  Source  Source Date  Date Reported
0    AAPL         NaN  INFORMATION TECHNOLOGY  ...     13F   2022-03-31     2022-05-16
1     BAC         NaN                 FINANCE  ...     13F   2022-03-31     2022-05-16
2     AXP         NaN                 FINANCE  ...     13F   2022-03-31     2022-05-16
3     CVX         NaN                  ENERGY  ...     13F   2022-03-31     2022-05-16
4      KO         NaN        CONSUMER STAPLES  ...     13F   2022-03-31     2022-05-16
5     OXY         NaN                  ENERGY  ...       4   2022-05-02     2022-05-04
6     KHC         NaN        CONSUMER STAPLES  ...     13F   2022-03-31     2022-05-16
7     MCO         NaN                 FINANCE  ...     13F   2022-03-31     2022-05-16
8     USB         NaN                 FINANCE  ...     13F   2022-03-31     2022-05-16
9    ATVI         NaN  INFORMATION TECHNOLOGY  ...     13F   2022-03-31     2022-05-16
10    HPQ         NaN  INFORMATION TECHNOLOGY  ...     13G   2022-04-30     2022-04-30
11     BK         NaN                 FINANCE  ...     13F   2022-03-31     2022-05-16
12     KR         NaN        CONSUMER STAPLES  ...     13F   2022-03-31     2022-05-16
13    DVA         NaN             HEALTH CARE  ...     13D   2022-08-01     2022-08-01
14      C         NaN                 FINANCE  ...     13F   2022-03-31     2022-05-16
15   VRSN         NaN          COMMUNICATIONS  ...     13F   2022-03-31     2022-05-16
16     GM         NaN  CONSUMER DISCRETIONARY  ...     13F   2022-03-31     2022-05-16
17   PARA         NaN          COMMUNICATIONS  ...     13F   2022-03-31     2022-05-16
18   CHTR         NaN          COMMUNICATIONS  ...     13F   2022-03-31     2022-05-16
19  LSXMK         NaN          COMMUNICATIONS  ...     13F   2022-03-31     2022-05-16
20      V         NaN                 FINANCE  ...     13F   2022-03-31     2022-05-16
21   AMZN         NaN  CONSUMER DISCRETIONARY  ...     13F   2022-03-31     2022-05-16
22    AON         NaN                 FINANCE  ...     13F   2022-03-31     2022-05-16
23     MA         NaN                 FINANCE  ...     13F   2022-03-31     2022-05-16
24   SNOW         NaN  INFORMATION TECHNOLOGY  ...     13F   2022-03-31     2022-05-16
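Note that read_html() returns a list of DataFrames, one per <table> it finds, which is why the output above is wrapped in square brackets. The sketch below shows only that parsing step, with no browser involved. It is not part of the original answer: the table_html_to_frame helper and the sample HTML (including its placeholder rows) are hypothetical and stand in for the outerHTML string that Selenium returns. It assumes pandas with an HTML parser such as lxml is installed, and it wraps the string in StringIO because recent pandas versions deprecate passing literal HTML strings to read_html().

```python
from io import StringIO

import pandas as pd

# Hypothetical stand-in for the string returned by
# element.get_attribute("outerHTML"); the rows are made-up placeholders.
SAMPLE_OUTER_HTML = """
<table id="current_holdings_table">
  <thead><tr><th>Stock</th><th>Sector</th><th>Source</th></tr></thead>
  <tbody>
    <tr><td>AAA</td><td>FINANCE</td><td>13F</td></tr>
    <tr><td>BBB</td><td>ENERGY</td><td>13F</td></tr>
  </tbody>
</table>
"""


def table_html_to_frame(outer_html: str) -> pd.DataFrame:
    """Parse the HTML of a single <table> element into one DataFrame."""
    # read_html returns a list with one DataFrame per <table> found.
    # StringIO avoids the deprecation warning for literal HTML strings.
    frames = pd.read_html(StringIO(outer_html))
    # The input holds exactly one table, so take the first entry.
    return frames[0]


holdings = table_html_to_frame(SAMPLE_OUTER_HTML)
print(holdings)
```

With the Selenium code above, the same helper could be called as table_html_to_frame(data) to get the holdings as a single DataFrame instead of a one-element list.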
Answered By - undetected Selenium