Issue
I'm writing a script to scrape my website using Scrapy, but I'm currently running 8 different scripts, one per collection, each saving to its own CSV file (collection 1 saves to collection-1.csv, etc.).
Is there a way to run multiple spiders from one script and save each spider's scraped data to its own file?
The current script is below.
import scrapy
from scrapy.crawler import CrawlerProcess
import csv

# Open the output CSV and write the header row once, before the crawl starts
cs = open('results/collection-1-results.csv', 'w', newline='', encoding='utf-8')
header_names = ['stk', 'name', 'price', 'url']
csv_writer = csv.DictWriter(cs, fieldnames=header_names)
csv_writer.writeheader()

class XXX(scrapy.Spider):
    name = 'XXX'
    start_urls = [
        'website-url.com'
    ]

    def parse(self, response):
        # Follow every product link on the collection page
        product_urls = response.css('div.grid-uniform a.product-grid-item::attr(href)').extract()
        for product_url in product_urls:
            yield scrapy.Request(url='website-url.com' + product_url, callback=self.next_parse_two)

        # Follow the "Next »" pagination link, if there is one
        next_url = response.css('ul.pagination-custom li a[title="Next »"]::attr(href)').get()
        if next_url is not None:
            yield scrapy.Request(url='website-url.com' + next_url, callback=self.parse)

    def next_parse_two(self, response):
        # Extract the product details and append a row to the CSV
        item = dict()
        item['stk'] = response.css('script#swym-snippet::text').get().split('stk:')[1].split(',')[0]
        item['name'] = response.css('h1.h2::text').get()
        item['price'] = response.css('span#productPrice-product-template span.visually-hidden::text').get()
        item['url'] = response.url
        csv_writer.writerow(item)
        cs.flush()

process = CrawlerProcess({
    'USER_AGENT': 'Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1)'
})
process.crawl(XXX)
process.start()
Solution
Yes, you can do that by passing each spider class to its own process.crawl() call in the same script, then starting the process once. You register one spider and add as many more as you need, as follows:
process.crawl(X)
process.crawl(Xx)
process.crawl(Xxx)
process.start()
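If you also want each spider to write to its own CSV, one option is to drop the module-level csv writer and let Scrapy's feed exports create the files: each spider declares its own output path in custom_settings, so every crawl registered on the same CrawlerProcess ends up in a separate file. The sketch below is illustrative only: the class names, spider names, URLs, and file paths are placeholders, and it assumes a Scrapy release recent enough to support the FEEDS setting and its per-feed fields option, with each spider yielding plain dicts instead of writing rows itself.

import scrapy
from scrapy.crawler import CrawlerProcess

class CollectionOneSpider(scrapy.Spider):
    name = 'collection-1'
    start_urls = ['https://website-url.com/collections/collection-1']
    # Feed exports write this spider's items to its own CSV file
    custom_settings = {
        'FEEDS': {
            'results/collection-1-results.csv': {
                'format': 'csv',
                'fields': ['stk', 'name', 'price', 'url'],
            },
        },
    }

    def parse(self, response):
        # ... same parsing/pagination logic as in the question, but
        # yield the item dict instead of calling csv_writer.writerow()
        yield {'stk': None, 'name': None, 'price': None, 'url': response.url}

class CollectionTwoSpider(CollectionOneSpider):
    # Inherits the parsing logic; only the name, start URL and feed differ
    name = 'collection-2'
    start_urls = ['https://website-url.com/collections/collection-2']
    custom_settings = {
        'FEEDS': {
            'results/collection-2-results.csv': {
                'format': 'csv',
                'fields': ['stk', 'name', 'price', 'url'],
            },
        },
    }

process = CrawlerProcess({
    'USER_AGENT': 'Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1)',
})
process.crawl(CollectionOneSpider)
process.crawl(CollectionTwoSpider)
process.start()  # blocks until every registered crawl has finished

With this layout the spiders run concurrently in the same reactor, and Scrapy opens and closes each feed file itself, so no shared file handle or manual flush() is needed.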
Answered By - Fazlul