Issue
ItemLoader objects
class scrapy.loader.ItemLoader(item=None, selector=None, response=None, parent=None, **context)
A user-friendly abstraction to populate an item with data by applying field processors to scraped data. When instantiated with a selector or a response, it supports data extraction from web pages using selectors.

Parameters:

item (scrapy.item.Item) – The item instance to populate using subsequent calls to add_xpath(), add_css(), or add_value().

selector (Selector object) – The selector to extract data from, when using the add_xpath(), add_css(), replace_xpath(), or replace_css() method.

response (Response object) – The response used to construct the selector using the default_selector_class, unless the selector argument is given, in which case this argument is ignored.
I have read the official documentation of Scrapy, but I could not understand when I should use the selector property of the ItemLoader object.

I understand item and response, but isn't the selector usually enough to use loader.add_xpath() etc. inside the defined parse method?
Solution
It's like it says in the description: it is the selector to extract the data from when using add_css(), add_xpath(), and other methods.

For example, let's say the response is rather large and you would like to nest your selectors in order to narrow down the search field before extracting your data. The selector argument is where you would place the specific selector that your xpath expression applies to.
Example
def parse(self, response):
    for selector in response.xpath('....'):
        # response can be omitted here: it is ignored when selector is given
        itemloader = ItemLoader(item=MyItem(), selector=selector)
        itemloader.add_xpath(...)
        yield itemloader.load_item()
In the example above, the item loader now knows not to evaluate the xpath expression from the document root; it will instead treat it as a relative xpath expression starting from the given selector.
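The same absolute-vs-relative distinction can be seen without Scrapy at all, using the standard library's xml.etree (the markup and tag names below are made up for illustration): a path called on a sub-element with ".//" is evaluated relative to that element only, which is exactly how an expression passed to add_xpath() behaves once the loader holds a narrowed selector.

```python
import xml.etree.ElementTree as ET

# A tiny stand-in for a "rather large" response body.
html = (
    "<html><body>"
    "<div class='product'><span>Book</span></div>"
    "<div class='product'><span>Pen</span></div>"
    "</body></html>"
)

root = ET.fromstring(html)

# Narrow the search field first, like looping over response.xpath('....')
# in the spider above...
for product in root.iter("div"):
    # ...then query relative to that element only, not from the root.
    name = product.find(".//span").text
    print(name)
```

Running this prints Book and then Pen: each lookup only sees the subtree of the div it was issued against, which is the behaviour the selector argument gives you in an ItemLoader.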
Answered By - Alexander