Issue
So I've been using Selenium in Chrome to go to a social media profile and scrape the usernames of its followers. However, the list is in the 100s of thousands and the page only loads a limited amount. My solution was to tell Selenium to scroll down endlessly and scrape usernames using 'driver.find_elements' as it goes, but after a few hundred usernames Chrome soon crashes with the error code "Ran out of memory".
Am I even capable of getting that entire list?
Is Selenium even the right tool to use or should I use Scrapy? Maybe both?
I'm at a loss on how to move forward from here.
Here's my code just in case
from easygui import *
import time
from selenium import webdriver
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.chrome.service import Service as ChromeService;
from webdriver_manager.chrome import ChromeDriverManager;
choice = ccbox("Run the test?","",("Run it","I'm not ready yet"));
if choice == False:
quit()
driver = webdriver.Chrome(service=ChromeService(ChromeDriverManager().install()));
time.sleep(60) #this is a wait to give me time to manually log in and go
#to followers list
SCROLL_PAUSE_TIME = 0.5
# Get scroll height
last_height = driver.execute_script("return document.body.scrollHeight")
while True:
# Scroll down to bottom
driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
# Wait to load page
time.sleep(SCROLL_PAUSE_TIME)
# Calculate new scroll height and compare with last scroll height
new_height = driver.execute_script("return document.body.scrollHeight")
if new_height == last_height:
driver.execute_script("window.scrollTo(0, 1080);")
time.sleep(1)
driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
time.sleep(2)
last_height = new_height
Solution
I figured it out! So every "follower" had an element and endlessly scrolling would store all of these elements in memory until it hit a limit. I solved this by deleting the elements with javascript after scrolling a certain amount, rinse and repeat until reaching the bottom :)
Answered By - javery1337
0 comments:
Post a Comment
Note: Only a member of this blog may post a comment.