Issue
I'm starting to work with item loaders in Scrapy, and the basic functionality is working fine, as in:
l.add_xpath('course_title', '//*[@class="course-header-ng__main-info__name__title"]//text()')
But if I want to apply a function to this item, where do I define the function?
On this question there is an example:
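from scrapy.loader import ItemLoader
# Newer Scrapy releases deprecate this path in favour of itemloaders.processors
from scrapy.loader.processors import Compose, MapCompose, Join, TakeFirst

# Strip every extracted string, then join the results into one string
clean_text = Compose(MapCompose(lambda v: v.strip()), Join())
# Take the first extracted value and convert it to an integer
to_int = Compose(TakeFirst(), int)

class MyItemLoader(ItemLoader):
    default_item_class = MyItem
    full_name_out = clean_text
    bio_out = clean_text
    age_out = to_int
    weight_out = to_int
    height_out = to_int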
Does this go in place of the custom item template?
import scrapy

class MoocsItem(scrapy.Item):
    # define the fields for your item here like:
    description = scrapy.Field()
    course_title = scrapy.Field()
Can I use one-liner functions such as this one?
clean_text = Compose(MapCompose(lambda v: v.strip()), Join())
Solution
There are two ways to apply a processor such as clean_text to a field.
Approach 1
You can change your Item class like below:
class MoocsItem(scrapy.Item):
    # define the fields for your item here like:
    description = scrapy.Field()
    # clean_text must be defined or imported before this class
    course_title = scrapy.Field(output_processor=clean_text)
Then use it like below:
from scrapy.loader import ItemLoader
l = ItemLoader(item=MoocsItem(), response=response)
l.add_xpath('course_title', '//*[@class="course-header-ng__main-info__name__title"]//text()')
item = l.load_item()
This would of course be in a callback.
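On the question of where the function lives: as far as Scrapy is concerned, a processor is just a callable that receives the list of extracted values, so it can be a plain function defined at module level, for example in items.py next to the item. The sketch below is a rough plain-Python equivalent of the clean_text one-liner. The name clean_text_fn and the sample strings are hypothetical, not part of the original answer.

```python
def clean_text_fn(values):
    """Strip each extracted string and join the results with a single space."""
    # Roughly what MapCompose(str.strip) followed by Join() does;
    # Join() uses a single space as its default separator.
    return " ".join(v.strip() for v in values)


# An XPath ending in //text() usually yields several padded fragments.
fragments = ["  Machine Learning ", "\n for Beginners  "]
print(clean_text_fn(fragments))
```

A function like this could be passed as scrapy.Field(output_processor=clean_text_fn) in place of clean_text.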
Approach 2
Another way is to create your own loader:
class MoocsItemLoader(ItemLoader):
    default_item_class = MoocsItem
    # The attribute name must be <field_name>_out to apply to that field
    course_title_out = clean_text
Then you will need to use the loader in a callback like below:
from scrapy.loader import ItemLoader
l = MoocsItemLoader(response=response)
l.add_xpath('course_title', '//*[@class="course-header-ng__main-info__name__title"]//text()')
item = l.load_item()
As you can see, in this approach you don't need to pass in a created item, because the loader builds one from default_item_class.
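The loader picks the output processor for a field by attribute name, following the pattern <field_name>_out. The toy class below is only a simplified illustration of that naming convention, not Scrapy's actual implementation. ToyLoader, ToyMoocsLoader and output_for are hypothetical names, and the real loader also checks the field's own output_processor metadata before falling back to the default.

```python
class ToyLoader:
    """Toy stand-in for ItemLoader; only the processor lookup is modelled."""

    @staticmethod
    def default_output_processor(values):
        # With no field-specific processor, the collected list is returned as is.
        return values

    def output_for(self, field_name, values):
        # Look for an attribute called "<field_name>_out", else use the default.
        processor = getattr(self, f"{field_name}_out", None)
        if processor is None:
            processor = self.default_output_processor
        return processor(values)


class ToyMoocsLoader(ToyLoader):
    # staticmethod keeps the lambda from being bound to the instance.
    course_title_out = staticmethod(
        lambda values: " ".join(v.strip() for v in values)
    )


loader = ToyMoocsLoader()
print(loader.output_for("course_title", [" Intro ", " to Scrapy "]))
print(loader.output_for("description", [" untouched "]))
```

In this sketch only course_title is cleaned; description has no matching attribute, so its values pass through unchanged.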
Answered By - Tarun Lalwani