Issue
I need to pass a login URL from one class to a spider class and scrape that URL.
import quotes as q
import scrapy
from scrapy.crawler import CrawlerProcess


class ValidateURL:
    def checkURL(self, urls):
        try:
            if urls:
                for key, value in urls.items():
                    if value['login_details']:
                        self.runScrap(value)
        except:
            return False

    def runScrap(self, data):
        if data:
            process = CrawlerProcess()
            # here I'm passing a URL (mail.google.com)
            process.crawl(q.QuotesSpider, passed_url=data['url'])
            process.start()

# -*- coding: utf-8 -*-
from scrapy import Spider
from scrapy.http import FormRequest
from scrapy.utils.response import open_in_browser
import sys
import logging
from bs4 import BeautifulSoup
# import scrapy
# from scrapy.crawler import CrawlerProcess
logging.basicConfig(filename='app.log', level=logging.INFO)


class QuotesSpider(Spider):
    name = 'quotes'
    # I need to update this with passed variable
    start_urls = ('https://quotes.toscrape.com/login',)

    def parse(self, response):
        pass

    def scrape_pages(self, response):
        pass

My code should be self-explanatory: I need to update the spider's start_urls class variable with the passed parameter. How can I implement this? I tried using self.passed_url, but it is only accessible inside a method, and start_urls does not get updated.
Solution
You need to pass the argument under the same name as the spider's start_urls attribute.
According to the docs, if you don't override the spider's __init__ method, every keyword argument passed to the spider class is set as a spider attribute of the same name. So, to override the start_urls attribute, you need to send the exact argument name, and its value should be a list of URLs rather than a single string.
Something like this:
    def runScrap(self, data):
        if data:
            process = CrawlerProcess()
            process.crawl(q.QuotesSpider, start_urls=[data['url']])
            process.start()
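
To see why the keyword name matters, here is a minimal sketch that runs without Scrapy installed. MiniSpider is a hypothetical, simplified stand-in for scrapy.Spider (it is not the real class), and https://example.com/login is a placeholder URL. It only imitates the default behaviour described above, where keyword arguments are copied onto the instance as attributes.

```python
class MiniSpider:
    """Simplified stand-in for scrapy.Spider (hypothetical, for illustration)."""

    name = "mini"
    # Class-level default, like start_urls in QuotesSpider.
    start_urls = ["https://quotes.toscrape.com/login"]

    def __init__(self, **kwargs):
        # Imitates the default behaviour: each keyword argument becomes
        # an instance attribute with the same name.
        self.__dict__.update(kwargs)


# A keyword that matches the attribute name shadows the class default.
matched = MiniSpider(start_urls=["https://example.com/login"])

# A different keyword only adds a new attribute; start_urls keeps its default.
unmatched = MiniSpider(passed_url="https://example.com/login")

print(matched.start_urls)
print(unmatched.start_urls)
print(unmatched.passed_url)
```

With passed_url, the value is stored on the spider but nothing reads it, so the crawl still begins from the default start_urls. That is likely why the original attempt appeared to have no effect.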
Hope it helps.
Answered By - asimhashmi