Issue
I am using Scrapy to scrape a website. I am looping over items, and each item has a link that I want to follow on every iteration of the loop.
    import scrapy

    class MyDomainSpider(scrapy.Spider):
        name = 'My_Domain'
        allowed_domains = ['MyDomain.com']
        start_urls = ['https://example.com']

        def parse(self, response):
            Colums = response.xpath('//*[@id="tab-5"]/ul/li')
            for colom in Colums:
                title = colom.xpath('//*[@class="lng_cont_name"]/text()').extract_first()
                address = colom.xpath('//*[@class="adWidth cont_sw_addr"]/text()').extract_first()
                con_address = address[9:-9]
                url = colom.xpath('//*[@id="tab-5"]/ul/li/@data-href').extract_first()
                print(url)
                print('*********************')
                yield scrapy.Request(url, callback=self.parse_dir_contents)

        def parse_dir_contents(self, response):
            print('000000000000000000')
            a = response.xpath('//*[@class="fn"]/text()').extract_first()
            print(a)
I have tried something like this, but the zeros print only once while the stars print 10 times. I want the second function to run every time the loop runs.
Solution
You are probably doing something that is not intended. With

    url = colom.xpath('//*[@id="tab-5"]/ul/li/@data-href').extract_first()

inside the loop, url always results in the same value, because an XPath expression starting with // searches the whole document rather than the current colom node. By default, Scrapy filters out duplicate requests (see the Scrapy documentation on duplicate request filtering). If you really want to scrape the same URL multiple times, you can disable the filtering per request by passing the dont_filter=True argument to the scrapy.Request constructor. However, I think that what you really want is to go like this (only the relevant part of the code is shown):
    def parse(self, response):
        Colums = response.xpath('//*[@id="tab-5"]/ul/li')
        for colom in Colums:
            # './@data-href' is relative to the current <li> node,
            # so each iteration yields a different URL.
            url = colom.xpath('./@data-href').extract_first()
            yield scrapy.Request(url, callback=self.parse_dir_contents)
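The absolute-vs-relative distinction can be sketched without Scrapy at all, using only the standard library's xml.etree (the markup below is a made-up stand-in for the real page): a document-wide lookup repeated inside the loop returns the first match every time, while a lookup scoped to the current node yields one value per item.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup standing in for the scraped page.
html = """
<ul id="tab-5">
  <li data-href="https://example.com/a"/>
  <li data-href="https://example.com/b"/>
  <li data-href="https://example.com/c"/>
</ul>
"""
root = ET.fromstring(html)
items = root.findall("li")

# Document-wide search repeated per iteration: same first value each time.
absolute = [root.find("li").get("data-href") for _ in items]

# Search scoped to the current node: one distinct value per item.
relative = [li.get("data-href") for li in items]

print(absolute)  # the "/a" URL three times
print(relative)  # the "/a", "/b", and "/c" URLs
```

This mirrors what happens in the spider: the original `//*[@id="tab-5"]/ul/li/@data-href` behaves like the `absolute` list, which is why all ten requests targeted the same URL and Scrapy's duplicate filter dropped nine of them.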
Answered By - Tomáš Linhart