Wednesday, May 11, 2022

[FIXED] Could this selenium code be recreated using scrapy?

May 11, 2022 python, python-3.x, scrapy No comments

Issue

I'm interested in getting a better idea of what scrapy can do. Here is a very simple selenium code that interacts with a website, fills in some boxes, clicks some elements and downloads a file. Could this code be replicated using scrapy?, so that a code is written using scrapy that does the exact same thing.

from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC

options=Options()
options.add_argument("--window-size=1920,1080")

driver=webdriver.Chrome(options=options)
   
driver.get("https://www.ons.gov.uk/")
WebDriverWait(driver, 20).until(EC.element_to_be_clickable((By.NAME, "q"))).send_keys("Education and childcare")
click_button=driver.find_element_by_xpath('//*[@id="nav-search-submit"]').click()
click_button=driver.find_element_by_xpath('//*[@id="results"]/div[1]/div[2]/div[1]/h3/a/span').click()
click_button=driver.find_element_by_xpath('//*[@id="main"]/div[2]/div[1]/section/div/div[1]/div/div[2]/h3/a/span').click()
click_button=driver.find_element_by_xpath('//*[@id="main"]/div[2]/div/div[1]/div[2]/p[2]/a').click()

Solution

"selenium code be recreated using scrapy" is also working fine with SeleniuRequest which is superfast than general selenium. You need scrapy project.It works as headless mode but always get screenshot for each step.

script:

import scrapy
from scrapy_selenium import SeleniumRequest
from selenium import webdriver

from selenium.webdriver.chrome.options import Options
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC




class TestSpider(scrapy.Spider):
    name = 'test'

    def start_requests(self):
        yield SeleniumRequest(
            url='https://www.ons.gov.uk',
            callback=self.parse,
            wait_time = 3,
            screenshot = True
        )

    def parse(self, response):
        driver = response.meta['driver']
        driver.save_screenshot('screenshot.png')

        WebDriverWait(driver, 20).until(EC.element_to_be_clickable((By.NAME, "q"))).send_keys("Education and childcare")
        driver.save_screenshot('screenshot_1.png')
        click_button=driver.find_element_by_xpath('//*[@id="nav-search-submit"]').click()
        driver.save_screenshot('screenshot_2.png')
        click_button=driver.find_element_by_xpath('//*[@id="results"]/div[1]/div[2]/div[1]/h3/a/span').click()
        click_button=driver.find_element_by_xpath('//*[@id="main"]/div[2]/div[1]/section/div/div[1]/div/div[2]/h3/a/span').click()
        click_button=driver.find_element_by_xpath('//*[@id="main"]/div[2]/div/div[1]/div[2]/p[2]/a').click()

Screenshot

settings.py file:

You have to add the following options in settings.py file

# Middleware

DOWNLOADER_MIDDLEWARES = {
    'scrapy_selenium.SeleniumMiddleware': 800
}


# Selenium
from shutil import which
SELENIUM_DRIVER_NAME = 'chrome'
SELENIUM_DRIVER_EXECUTABLE_PATH = which('chromedriver')
SELENIUM_DRIVER_ARGUMENTS = ['--headless']

SeleniumRequest

Output:

'downloader/response_status_count/200'

screenshot of the project looks like

How to download pdf using scrapy

screenshot

Answered By - F.Hoque

This Answer collected from stackoverflow and tested by PythonFixing community admins, is licensed under cc by-sa 2.5 , cc by-sa 3.0 and cc by-sa 4.0

Wednesday, May 11, 2022

[FIXED] Could this selenium code be recreated using scrapy?

Issue

Solution

0 comments:

Post a Comment

Popular Posts

Labels