Issue
I'm currently struggling to find a way to get access to the DupeFilter object from within my Spider.
If I could access it then I could just add another fingerprint to the fingerprints set.
Solution
So, it looks like you have to dig pretty deep to get to the DupeFilter: self.crawler.engine.slot.scheduler.df
So adding a fingerprint would look like this:
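from scrapy import Request

def parse_page(self, response):
    # ...
    # Reach the scheduler's dupe filter through the running crawler.
    dupe_filter = self.crawler.engine.slot.scheduler.df
    # Build a throwaway request only to compute its fingerprint.
    dummy_request = Request('http://example.com/thing/9964')
    fingerprint = dupe_filter.request_fingerprint(dummy_request)
    # Mark the URL as already seen so it gets filtered out later.
    dupe_filter.fingerprints.add(fingerprint)
    # ...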
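To see why adding a fingerprint by hand is enough, here is a minimal standard-library sketch of the idea. It is not Scrapy's code: `SimpleDupeFilter` and this `request_fingerprint` helper are hypothetical stand-ins, and Scrapy's own fingerprinting is more involved than hashing the method and URL. The point is only that a filter backed by a set treats anything already in the set as seen.

```python
import hashlib


def request_fingerprint(method, url):
    """Hypothetical stand-in: hash the method and URL into a hex digest."""
    data = f"{method.upper()} {url}".encode("utf-8")
    return hashlib.sha1(data).hexdigest()


class SimpleDupeFilter:
    """Toy dupe filter that keeps seen fingerprints in a set."""

    def __init__(self):
        self.fingerprints = set()

    def request_seen(self, method, url):
        # Report a duplicate if the fingerprint is already stored;
        # otherwise remember it and let the request through.
        fp = request_fingerprint(method, url)
        if fp in self.fingerprints:
            return True
        self.fingerprints.add(fp)
        return False


dupe_filter = SimpleDupeFilter()
# Pre-seed the set by hand, as the answer does with the real filter.
dupe_filter.fingerprints.add(
    request_fingerprint("GET", "http://example.com/thing/9964")
)
# The pre-seeded URL is now reported as a duplicate on first sight.
already_seen = dupe_filter.request_seen("GET", "http://example.com/thing/9964")
```

Since the filter only ever checks set membership, it cannot tell a fingerprint added by hand from one recorded for a request that was actually scheduled.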
Answered By - Acorn