Tuesday, November 30, 2021

[FIXED] How to pass website address to SpiderClass from another python script

Labels: inheritance, python, python-class, scrapy, superclass

Issue

I need to pass a login URL from one class to a Scrapy spider class and scrape that URL.

import quotes as q
import scrapy
from scrapy.crawler import CrawlerProcess
class ValidateURL:

    def checkURL(self, urls):
        try:
            if urls:
                for key, value in urls.items():
                    if value['login_details']:
                        self.runScrap(value)
        except Exception:
            return False

    def runScrap(self, data):
        if data:
            process = CrawlerProcess()
            # here I'm passing a URL (mail.google.com)
            process.crawl(q.QuotesSpider, passed_url=data['url'])
            process.start()

# quotes.py
# -*- coding: utf-8 -*-
from scrapy import Spider
from scrapy.http import FormRequest
from scrapy.utils.response import open_in_browser
import sys
import logging
from bs4 import BeautifulSoup
# import scrapy
# from scrapy.crawler import CrawlerProcess

logging.basicConfig(filename='app.log',level=logging.INFO)

class QuotesSpider(Spider):
    name = 'quotes'
    # I need to update this with passed variable
    start_urls = ('https://quotes.toscrape.com/login',)

    def parse(self, response):
        pass

    def scrape_pages(self, response):
        pass

I need to update the spider's class-level start_urls attribute with the passed parameter. How can I implement this? I tried using self.passed_url, but it is only accessible inside the spider's methods, and start_urls is not updated.

Solution

The name of the argument you pass must match the spider's start_urls attribute.

According to the docs, if you don't override the spider's __init__ method, all keyword arguments passed to the spider class are set as attributes on the spider. So, to override the start_urls attribute, you need to pass an argument with that exact name, and wrap the URL in a list, because start_urls is expected to be a collection of URLs rather than a single string.

Something like this:

    def runScrap(self,data):       
        if data:
            process = CrawlerProcess()
            process.crawl(q.QuotesSpider, start_urls=[data['url']])
            process.start()
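
To see why passed_url left start_urls untouched while start_urls=[...] works, here is a minimal sketch that does not use Scrapy at all. MiniSpider is a hypothetical, simplified stand-in for scrapy.Spider that only mimics the keyword-argument-to-attribute mapping described above, and the example.com URL is a placeholder, so treat it as an illustration rather than Scrapy's actual implementation.

```python
class MiniSpider:
    """Hypothetical stand-in for scrapy.Spider (not Scrapy's real code)."""

    start_urls = ()

    def __init__(self, name=None, **kwargs):
        if name is not None:
            self.name = name
        # Assumption: mirrors the documented behavior, where every extra
        # keyword argument becomes an instance attribute of the same name.
        self.__dict__.update(kwargs)


class QuotesSpider(MiniSpider):
    name = "quotes"
    start_urls = ("https://quotes.toscrape.com/login",)


# A non-matching argument name just creates a new attribute;
# start_urls still resolves to the class-level default.
unmatched = QuotesSpider(passed_url="https://example.com/login")

# A matching argument name shadows the class-level default on the instance.
matched = QuotesSpider(start_urls=["https://example.com/login"])

print(unmatched.passed_url, unmatched.start_urls)
print(matched.start_urls)
```

In the question's code, the spider therefore did receive passed_url as an attribute, but nothing read it, so crawling still started from the hard-coded start_urls.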

Hope it helps.

Answered By - asimhashmi

This answer was collected from Stack Overflow and tested by PythonFixing community admins. It is licensed under CC BY-SA 2.5, CC BY-SA 3.0, and CC BY-SA 4.0.
