Issue
Quick introduction: MageAI is an ETL pipeline tool similar to Airflow. I'm using it to run daily cron jobs that crawl websites with Selenium. MageAI can currently be installed via docker-compose, Kubernetes, or pip. When I install it via docker-compose or Kubernetes, the Selenium driver will not start, but when I install it with pip, Selenium works.
Scenario where it works (pip install):
I'm running an Ubuntu instance on AWS Lightsail. On this server I ran pip3 install mage-ai selenium. This works, and the Selenium driver starts. The problem with this approach is that, because Mage runs directly on the server as a Python module, it can be unstable: the server may go down.
Ideal scenario: Running this via docker-compose or Kubernetes would be more stable and scalable. But every approach I've tried so far results in the same error: the Selenium Chrome driver fails to start.
What I've done so far:
- I tried extending the MageAI Docker image and installing all the dependencies needed to run Selenium and Mage. It gave me the same error.
- I tried extending an Ubuntu Docker image and installing MageAI and Selenium via pip3 install. It still didn't work.
Does anyone know how to run Selenium successfully via docker-compose or Kubernetes?
Solution
version: '3'
services:
  magic:
    image: mageai/mageai:latest
    <mage-ai stuff here>
  selenium:
    image: selenium/standalone-chrome:latest
    environment:
      - SE_NODE_OVERRIDE_MAX_SESSIONS=true
      - SE_NODE_MAX_SESSIONS=10
      - SE_NODE_GRID_URL=https://0.0.0.0:4444
    ports:
      - 4444:4444
Use the Selenium Docker image and run it as a separate service alongside Mage, instead of installing Chrome inside the Mage container.
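Because the Selenium container can take a few seconds to come up, it may help to wait for it before opening a session. The following is only a rough sketch, not part of the original answer: it polls the server's /status endpoint and assumes the service is reachable as http://selenium:4444 (the compose service name used here). The helper names is_ready and wait_for_selenium are made up for illustration.

```python
import json
import time
import urllib.request
from urllib.error import URLError


def is_ready(status_payload: dict) -> bool:
    """Return True when a /status response body reports the server as ready."""
    value = status_payload.get("value") or {}
    return bool(value.get("ready", False))


def wait_for_selenium(base_url: str = "http://selenium:4444",
                      timeout: float = 60.0,
                      interval: float = 2.0) -> bool:
    """Poll <base_url>/status until it reports ready or the timeout expires.

    base_url is an assumption: it uses the compose service name as the host.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with urllib.request.urlopen(f"{base_url}/status", timeout=5) as resp:
                if is_ready(json.load(resp)):
                    return True
        except (URLError, OSError, ValueError):
            # Server not reachable yet, or the body was not valid JSON; retry.
            pass
        time.sleep(interval)
    return False
```

Calling wait_for_selenium() at the start of a pipeline block and failing early when it returns False should give a clearer error than a driver that fails to start.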
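Call it from Python like so (from another container in the same compose file, <ip> can be the service name, selenium):
from selenium import webdriver
chrome_options = webdriver.ChromeOptions()
selenium_server_url = "http://<ip>:4444/wd/hub"
driver = webdriver.Remote(command_executor=selenium_server_url, options=chrome_options)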
Make sure to call this after crawling to free up sessions on the Selenium node:
driver.quit()
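If the crawl raises an exception, driver.quit() may never run and the session stays occupied. One way to guard against that is a small context manager. This is a sketch under assumptions, not the answerer's code: hub_url, make_chrome and crawl_session are hypothetical helper names, and the default host "selenium" assumes the compose service name.

```python
from contextlib import contextmanager


def hub_url(host: str = "selenium", port: int = 4444) -> str:
    """Build the Remote WebDriver URL; the default host is the assumed service name."""
    return f"http://{host}:{port}/wd/hub"


def make_chrome(url: str):
    """Open a Chrome session on the remote Selenium server at url."""
    # Imported here so the other helpers can be used without selenium installed.
    from selenium import webdriver

    options = webdriver.ChromeOptions()
    return webdriver.Remote(command_executor=url, options=options)


@contextmanager
def crawl_session(factory):
    """Yield the driver returned by factory() and always quit it afterwards."""
    driver = factory()
    try:
        yield driver
    finally:
        # Runs even if the crawl raised, so the session slot is released.
        driver.quit()


# Example use inside a pipeline block (the target URL is a placeholder):
# with crawl_session(lambda: make_chrome(hub_url())) as driver:
#     driver.get("https://example.com")
#     print(driver.title)
```

Passing a factory instead of a driver keeps session creation inside the managed scope, so a session is only opened when the with block is entered.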
Answered By - Kamarul Adha