Issue
I created a Scrapy spider while following some online videos. It scrapes profile URLs from a website. I want to extend it to also scrape data such as the address, name, phone, and website URL from each profile URL it finds.
I was thinking of creating two separate scrapers: one to scrape the profile URLs, and a second to scrape the data from each of those URLs.
Is there any other solution?
Here is my spider that scrapes the profile URLs.
# -*- coding: utf-8 -*-
import scrapy
from ..items import ...scraperItem

class SpiderSpider(scrapy.Spider):
    name = 'spider'
    start_urls = ['https:// ...']
    page_number = 15

    def parse(self, response):
        items = ...scraperItem()
        ..._url = response.css('a.header-5.text-unbold ::attr(href)').extract_first()
        items['..._url'] = ..._url
        yield items

        next_page = 'https:/...' + str(...SpiderSpider.page_number)
        if ...SpiderSpider.page_number <= 150:
            ...SpiderSpider.page_number += 15
            yield response.follow(next_page, callback=self.parse)
Solution
You can add another parse method (e.g. parse_profile) to scrape the additional data. For example:
    def parse(self, response):
        url = response.css('a.header-5.text-unbold ::attr(href)').extract_first()
        yield response.follow(url, callback=self.parse_profile)

        # next_page = ...
        if self.page_number <= 150:
            self.page_number += 15
            yield response.follow(next_page, callback=self.parse)

    def parse_profile(self, response):
        item = HouzzscraperItem()
        item['houzz_url'] = response.url
        # item['address'] = ...
        # item['name'] = ...
        # item['phone'] = ...
        yield item
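To see why a second scraper is unnecessary, it helps to look at what Scrapy does with the requests a callback yields. The sketch below is a framework-free illustration of that chained-callback idea: a listing parser yields follow-up "requests", each of which is dispatched to a detail parser, and everything runs in one crawl. All names here (parse_listing, parse_profile, crawl, the dict-based "responses") are illustrative stand-ins, not part of the question's code or Scrapy's API.

```python
def parse_listing(response):
    """Yield one follow-up 'request' per profile URL on a listing page."""
    for url in response["profile_urls"]:
        yield {"url": url, "callback": parse_profile}

def parse_profile(response):
    """Yield the final item scraped from a single profile page."""
    yield {"profile_url": response["url"], "name": response.get("name")}

def crawl(start_response, fetch):
    """Tiny scheduler: run callbacks, follow their requests, collect items.

    This mimics (very loosely) what Scrapy's engine does with the
    requests and items your spider callbacks yield.
    """
    queue = [(parse_listing, start_response)]
    items = []
    while queue:
        callback, response = queue.pop(0)
        for result in callback(response):
            if "callback" in result:  # a follow-up request to another parser
                queue.append((result["callback"], fetch(result["url"])))
            else:                     # a finished item
                items.append(result)
    return items
```

For example, with two fake profile pages, a single `crawl` call walks the listing and both profiles:

```python
fake_pages = {"p1": {"url": "p1", "name": "Alice"},
              "p2": {"url": "p2", "name": "Bob"}}
listing = {"profile_urls": ["p1", "p2"]}
crawl(listing, fake_pages.get)
# → [{'profile_url': 'p1', 'name': 'Alice'}, {'profile_url': 'p2', 'name': 'Bob'}]
```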
Answered By - Thiago Curvelo