Issue
I have a problem parsing this page: https://airtable.com/appImi8PX0i84XFwj/shr1PWZhR25O6DJxK/tblJG95RoC1WrRppF/viwVAi9l6dxxBtihM. I need to get all the data from the table, but I can't scroll it! I have already tried scrolling with plain JS in the console, but it doesn't work.
P.S. I know there is a similar question (Python script to scroll down non scroll-able page), but it doesn't help in my case.
**This is my Python code:**
import requests
from bs4 import BeautifulSoup
from selenium import webdriver
import time
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver import ActionChains
from selenium.webdriver.common.by import By
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.support.wait import WebDriverWait


def get_html(url):
    service = Service(executable_path='C:\\airtable_parse\\chromedriver-win64\\chromedriver.exe')
    options = webdriver.ChromeOptions()
    driver = webdriver.Chrome(service=service, options=options)
    driver.maximize_window()
    try:
        driver.get(url=url)
        time.sleep(10)
        # Keep the element itself: send_keys() returns None, so its result
        # must not be assigned to `element`.
        element = driver.find_element(By.XPATH, '//*[@id="table"]')
        element.send_keys(Keys.PAGE_DOWN)  # Message: element not interactable
        ActionChains(driver).move_to_element(element).perform()
        # driver.execute_script("document.querySelector('#table').scrollTop = 500")
        # table = driver.find_element(By.XPATH, '//div[@id="firstContainer"]')
        # print(table)
        # action = ActionChains(driver)
        # action.move_to_element(table).perform()
        # print(action)
        # driver.execute_script('window.scrollTo(0, 300)', table)
        # while True:
        #     # After your page is loaded
        #     page_height = driver.get_window_size()['height']  # Get window height
        #     scroll_bar = driver.find_element(By.XPATH, "//div[contains(@class,'antiscroll-scrollbar-vertical')]")
        #     ActionChains(driver).drag_and_drop_by_offset(scroll_bar, 0, page_height - 160).click().perform()
        #     # driver.execute_script('window.scrollBy(0, 100);')
    except Exception as e:
        print(e)
    finally:
        driver.close()
        driver.quit()


def main():
    get_html("https://airtable.com/embed/appImi8PX0i84XFwj/shr1PWZhR25O6DJxK/tblJG95RoC1WrRppF/viwVAi9l6dxxBtihM")


if __name__ == "__main__":
    main()
As you can see, there are a lot of attempts in my code. I also tried this JS in the console:
document.querySelector("#view").scrollTop = 300
document.querySelector("#viewContainer").scrollTop = 300
document.querySelector("#table")
I think #table is the HTML element I need, but I am not sure.
Solution
Solution 1: Intercept the response (about 7 MB) of the request that loads the table data and decode it.
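One possible way to try Solution 1 from Python is to read Chrome's performance log through Selenium and ask DevTools for the response body. This is only a sketch under assumptions: `url_fragment` is a hypothetical placeholder for a piece of the data request's URL, which you would have to look up yourself in the browser's Network tab, and the helper names are made up for illustration. Very large bodies may not always be retrievable this way, and the returned text still has to be decoded separately.

```python
import json


def find_response_request_ids(perf_log_entries, url_fragment):
    """Return DevTools request IDs of responses whose URL contains url_fragment."""
    request_ids = []
    for entry in perf_log_entries:
        # Each performance-log entry wraps one DevTools event as a JSON string.
        event = json.loads(entry["message"])["message"]
        if event.get("method") != "Network.responseReceived":
            continue
        params = event["params"]
        if url_fragment in params["response"]["url"]:
            request_ids.append(params["requestId"])
    return request_ids


def fetch_response_body(url, url_fragment, wait_seconds=10):
    """Load url in Chrome; return the first matching response body, or None."""
    # Imported lazily so the log-filtering helper above works without Selenium.
    import time
    from selenium import webdriver

    options = webdriver.ChromeOptions()
    # Ask chromedriver to record DevTools network events.
    options.set_capability("goog:loggingPrefs", {"performance": "ALL"})
    driver = webdriver.Chrome(options=options)
    try:
        driver.get(url)
        time.sleep(wait_seconds)  # crude wait, as in the question's code
        ids = find_response_request_ids(driver.get_log("performance"), url_fragment)
        if not ids:
            return None
        result = driver.execute_cdp_cmd("Network.getResponseBody", {"requestId": ids[0]})
        return result["body"]
    finally:
        driver.quit()
```

Calling `fetch_response_body(embed_url, "<part of the request URL>")` would then return the raw response text, if a matching request was seen.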
Solution 2: Simulate mouse wheel events (scroll, scrollBy, and scrollTo do not work on this website).
I tried this with Puppeteer (page.mouse.wheel) and it works.
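The answer above used Puppeteer. A rough Selenium equivalent could use the wheel actions added in Selenium 4.2 (`ActionChains.scroll_from_origin`). The sketch below is untested against this particular page: `row_selector` is a hypothetical CSS selector for the rendered rows that you would need to find yourself, `#table` is taken from the question, and the function names are made up for illustration. Because the table seems to render only the visible rows, the loop collects rows after every wheel step and stops once several steps in a row bring nothing new.

```python
def collect_while_scrolling(read_visible, scroll_once, max_idle_rounds=3):
    """Alternate reading and scrolling until no new items appear."""
    seen = {}  # dict keeps insertion order and drops duplicates
    idle_rounds = 0
    while idle_rounds < max_idle_rounds:
        before = len(seen)
        for item in read_visible():
            seen.setdefault(item, None)
        # Count consecutive rounds that produced nothing new.
        idle_rounds = idle_rounds + 1 if len(seen) == before else 0
        scroll_once()
    return list(seen)


def scrape_rows(url, row_selector, container_selector="#table", step=600):
    """Open url, wheel-scroll the container, and return the row texts seen."""
    # Imported lazily so collect_while_scrolling can be used without Selenium.
    import time
    from selenium import webdriver
    from selenium.webdriver import ActionChains
    from selenium.webdriver.common.by import By
    from selenium.webdriver.common.actions.wheel_input import ScrollOrigin

    driver = webdriver.Chrome()
    try:
        driver.get(url)
        time.sleep(10)  # crude wait, as in the question's code
        container = driver.find_element(By.CSS_SELECTOR, container_selector)
        origin = ScrollOrigin.from_element(container)

        def scroll_once():
            # Sends a real wheel event with the pointer over the container.
            ActionChains(driver).scroll_from_origin(origin, 0, step).perform()
            time.sleep(0.5)

        def read_visible():
            # Read texts in one JS call to avoid stale element references.
            return driver.execute_script(
                "return Array.from(document.querySelectorAll(arguments[0]))"
                ".map(e => e.innerText);",
                row_selector,
            )

        return collect_while_scrolling(read_visible, scroll_once)
    finally:
        driver.quit()
```

Rows with identical text are merged by this sketch, so a real scraper would probably key on a row ID attribute instead, if the page exposes one.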
Answered By - LetsScrapeData