Issue
I am requesting address information from a web service to cross-check whether the address I already have is in the same format as the one the web service returns.
For this I have the following item with the following input_processor:
import scrapy
from scrapy.loader.processors import MapCompose, TakeFirst


def MyFunction(scraped_addition):
    # Problem: `addition` is not defined here; the processor only receives the scraped value.
    if scraped_addition == addition:
        return scraped_addition
    else:
        return None


class AdresItem(scrapy.Item):
    postal_code = scrapy.Field()
    house_number = scrapy.Field()
    addition = scrapy.Field()
    scraped_addition = scrapy.Field(
        input_processor=MapCompose(MyFunction),
        output_processor=TakeFirst(),
    )
Of course, I can't access the original addition this way. What would be a good way to use another field of the item in the input processor?
Solution
Set the value in the item loader's context, then read it inside the function through the loader_context argument.
Example:
import scrapy
from scrapy.loader import ItemLoader
# Newer Scrapy releases expose this as itemloaders.processors.MapCompose.
from scrapy.loader.processors import MapCompose


def MyFunction(scraped_addition, loader_context):
    # MapCompose passes the loader's context to any function that
    # declares a parameter named loader_context.
    addition = loader_context.get('addition')
    if scraped_addition == addition:
        return scraped_addition
    else:
        # Returning None makes MapCompose drop this value.
        return None


class ExampleItem(scrapy.Item):
    scraped_addition = scrapy.Field(input_processor=MapCompose(MyFunction))


class ExampleSpider(scrapy.Spider):
    name = 'exampleSpider'
    start_urls = ['https://scrapingclub.com/exercise/detail_basic/']

    def parse(self, response):
        l = ItemLoader(item=ExampleItem(), response=response)
        # Store the reference value in the context before adding values.
        l.context['addition'] = 'Long-sleeved Jersey Top'
        l.add_xpath('scraped_addition', '//h3/text()')
        yield l.load_item()
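To see why declaring a loader_context parameter is enough, here is a minimal sketch of the mechanism using only the standard library. It is a simplified stand-in, not Scrapy's actual MapCompose (for instance, it does not flatten iterable results), and the names map_compose, strip_whitespace and keep_if_matches are hypothetical, chosen only for this illustration.

```python
import inspect
from functools import partial


def map_compose(*functions):
    """Simplified stand-in for MapCompose; NOT Scrapy's implementation."""

    def process(values, loader_context=None):
        context = loader_context or {}
        for func in functions:
            # Bind the context only if the function asks for it by name.
            if "loader_context" in inspect.signature(func).parameters:
                func = partial(func, loader_context=context)
            # Apply the function to each value and drop None results.
            results = (func(value) for value in values)
            values = [r for r in results if r is not None]
        return values

    return process


def strip_whitespace(value):
    # Plain processor: does not declare loader_context, so it never sees it.
    return value.strip()


def keep_if_matches(scraped_addition, loader_context):
    # 'addition' is the key the caller stored in the context.
    if scraped_addition == loader_context.get("addition"):
        return scraped_addition
    return None


processor = map_compose(strip_whitespace, keep_if_matches)
matched = processor([" A ", "B"], loader_context={"addition": "A"})
unmatched = processor(["B"], loader_context={"addition": "A"})
print(matched)
print(unmatched)
```

In this sketch the matching value survives and the non-matching one is dropped, which mirrors what happens to scraped_addition in the spider above when the context value differs from the scraped text.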
Answered By - SuperUser