talan v0.1.0 Talan.Stream
Summary
Functions
Returns a probabilistically unique stream.
Functions
uniq(enum, bloom_filter)
uniq(Enumerable.t(), Talan.BloomFilter.t()) :: Enumerable.t()
Returns a probabilistically unique stream.
Its main advantage is that it doesn't store the elements emitted by the stream. Instead, it uses a Bloom filter for membership checks.
The stream never emits the same element twice. However, depending on the Bloom filter it uses, it can occasionally mistake a new element for a duplicate. Such false positives are rejected, so a small fraction of genuinely unique elements may be missing from the output.
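To make the trade-off concrete, here is a minimal sketch of the general technique. It is not Talan's implementation: the module name `ProbabilisticUniqSketch`, the three seeded hashes, and the default of 1024 bits are assumptions chosen for illustration, and the "filter" is a plain `MapSet` of bit positions standing in for a real bit array, so it does not deliver the memory savings of an actual Bloom filter.

```elixir
defmodule ProbabilisticUniqSketch do
  # Hypothetical illustration only; not part of Talan.
  # An element is treated as "seen" when all of its hashed bit positions
  # are already set. Distinct elements can collide on every position,
  # which is what produces false-positive duplicates.
  def uniq(enum, bits \\ 1024) do
    Stream.transform(enum, MapSet.new(), fn elem, set_bits ->
      # Three bit positions per element, derived from seeded hashes.
      positions = for seed <- 1..3, do: :erlang.phash2({seed, elem}, bits)

      if Enum.all?(positions, &MapSet.member?(set_bits, &1)) do
        # Probably seen before: drop the element, keep the state unchanged.
        {[], set_bits}
      else
        # Definitely new: emit the element and mark its positions.
        {[elem], Enum.reduce(positions, set_bits, &MapSet.put(&2, &1))}
      end
    end)
  end
end
```

Calling `ProbabilisticUniqSketch.uniq(["a", "b", "c", "a", "b"]) |> Enum.to_list()` yields a list without repeats. Because an emitted element always marks all of its positions, a second occurrence is always dropped; only new elements can be lost, and that becomes more likely as the number of bits shrinks relative to the number of distinct elements.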
Examples
iex> list = ["a", "b", "c", "a", "b"]
iex> bloom_filter = Talan.BloomFilter.new(100_000, false_positive_probability: 0.001)
iex> Talan.Stream.uniq(list, bloom_filter) |> Enum.to_list()
["a", "b", "c"]