Crawly v0.3.0
Crawly.Pipelines.DuplicatesFilter
Filters out duplicate items, which helps avoid storing the same item more than once.
This pipeline keeps the ids of items it has already seen in the state of the Crawly.DataStorageWorker process. For now, these ids are stored only in memory.
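The idea of remembering seen ids in process state can be illustrated with a minimal sketch. This is not Crawly's source: the module name DedupSketch, the :duplicates_filter state key, and the explicit id_field argument are hypothetical. The sketch assumes a pipeline step that returns {item, state} to keep an item and {false, state} to drop it.

```elixir
defmodule DedupSketch do
  # Hypothetical illustration of id-based deduplication; not Crawly's code.
  # Seen ids are kept in a MapSet stored under :duplicates_filter (assumed key).

  @spec run(map(), map(), atom()) :: {map() | false, map()}
  def run(item, state, id_field) do
    case Map.get(item, id_field) do
      nil ->
        # The item has no id field, so it cannot be checked; pass it through.
        {item, state}

      id ->
        seen = Map.get(state, :duplicates_filter, MapSet.new())

        if MapSet.member?(seen, id) do
          # Id already seen: drop the item and leave the state unchanged.
          {false, state}
        else
          # First occurrence: keep the item and remember its id.
          {item, Map.put(state, :duplicates_filter, MapSet.put(seen, id))}
        end
    end
  end
end
```

Because the set lives only in the state map, it disappears together with the process that holds it, which matches the in-memory limitation described above.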
The item field used to detect duplicates is specified by the :item_id setting of the :crawly application.
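A configuration sketch, assuming the pipeline is enabled through the :pipelines list of the :crawly application config and that items carry a :title field (both assumptions for illustration; other settings are omitted):

```elixir
# config/config.exs (fragment)
config :crawly,
  # Items with the same :title value are treated as duplicates.
  item_id: :title,
  pipelines: [
    Crawly.Pipelines.DuplicatesFilter
  ]
```

With this setting, the second item whose :title matches one already seen during the crawl would be discarded.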