Issue
I'm writing a script to scrape my website using Scrapy, but I'm currently running 8 different scripts, one per collection, each saving to its own CSV file (collection 1 saves to collection-1.csv, etc.).
Is there a way to run multiple spiders from one script and save each spider's scraped data to its own file?
The current script is below.
import scrapy
from scrapy.crawler import CrawlerProcess
import csv

# Open the output CSV and write the header row once, before the crawl starts
cs = open('results/collection-1-results.csv', 'w', newline='', encoding='utf-8')
header_names = ['stk', 'name', 'price', 'url']
csv_writer = csv.DictWriter(cs, fieldnames=header_names)
csv_writer.writeheader()

class XXX(scrapy.Spider):
    name = 'XXX'
    start_urls = [
        'website-url.com'
    ]

    def parse(self, response):
        # Follow every product link on the collection page
        product_urls = response.css('div.grid-uniform a.product-grid-item::attr(href)').extract()
        for product_url in product_urls:
            yield scrapy.Request(url='website-url.com' + product_url, callback=self.next_parse_two)

        # Follow the "Next »" pagination link, if there is one
        next_url = response.css('ul.pagination-custom li a[title="Next »"]::attr(href)').get()
        if next_url is not None:
            yield scrapy.Request(url='website-url.com' + next_url, callback=self.parse)

    def next_parse_two(self, response):
        # Extract the product details and append a row to the CSV
        item = dict()
        item['stk'] = response.css('script#swym-snippet::text').get().split('stk:')[1].split(',')[0]
        item['name'] = response.css('h1.h2::text').get()
        item['price'] = response.css('span#productPrice-product-template span.visually-hidden::text').get()
        item['url'] = response.url
        csv_writer.writerow(item)
        cs.flush()

process = CrawlerProcess({
    'USER_AGENT': 'Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1)'
})
process.crawl(XXX)
process.start()
Solution
Yes, you can do that by passing each spider class to its own process.crawl() call in the same script, then starting the process once. You register one spider and add as many more as you need, as follows:
process.crawl(X)
process.crawl(Xx)
process.crawl(Xxx)
process.start()
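If you also want each spider to write to its own CSV, one option is to drop the module-level csv writer and let Scrapy's feed exports create the files: each spider declares its own output path in custom_settings, so every crawl registered on the same CrawlerProcess ends up in a separate file. The sketch below is illustrative only: the class names, spider names, URLs, and file paths are placeholders, and it assumes a Scrapy release recent enough to support the FEEDS setting and its per-feed fields option, with each spider yielding plain dicts instead of writing rows itself.

import scrapy
from scrapy.crawler import CrawlerProcess

class CollectionOneSpider(scrapy.Spider):
    name = 'collection-1'
    start_urls = ['https://website-url.com/collections/collection-1']
    # Feed exports write this spider's items to its own CSV file
    custom_settings = {
        'FEEDS': {
            'results/collection-1-results.csv': {
                'format': 'csv',
                'fields': ['stk', 'name', 'price', 'url'],
            },
        },
    }

    def parse(self, response):
        # ... same parsing/pagination logic as in the question, but
        # yield the item dict instead of calling csv_writer.writerow()
        yield {'stk': None, 'name': None, 'price': None, 'url': response.url}

class CollectionTwoSpider(CollectionOneSpider):
    # Inherits the parsing logic; only the name, start URL and feed differ
    name = 'collection-2'
    start_urls = ['https://website-url.com/collections/collection-2']
    custom_settings = {
        'FEEDS': {
            'results/collection-2-results.csv': {
                'format': 'csv',
                'fields': ['stk', 'name', 'price', 'url'],
            },
        },
    }

process = CrawlerProcess({
    'USER_AGENT': 'Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1)',
})
process.crawl(CollectionOneSpider)
process.crawl(CollectionTwoSpider)
process.start()  # blocks until every registered crawl has finished

With this layout the spiders run concurrently in the same reactor, and Scrapy opens and closes each feed file itself, so no shared file handle or manual flush() is needed.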
Answered By - Fazlul