prolly v0.1.0 Prolly.BloomFilter

Use a Bloom filter when you want to keep track of whether you have seen a given value or not.

For example, the question “Have I seen the string foo so far in the stream?” is a reasonable question for a Bloom filter.

Specifically, a Bloom filter can tell you two things:

  1. When a value may be in a set.
  2. When a value is definitely not in a set.

Note carefully that a Bloom filter can only tell you that a value might be in a set or that a value is definitely not in a set. It cannot tell you that a value is definitely in a set. In other words, a Bloom filter can return false positives but never false negatives.
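To see why the answer is one-sided, here is a minimal, hypothetical Elixir sketch. The TinyBloom module and its functions are illustrative and are not part of Prolly. It models the bit array as a MapSet of the positions that are set to 1 and uses :erlang.phash2/2 with k different seeds in place of k independent hash functions. Adding a value sets all of its positions and nothing ever clears them, so a value that was added always passes the check. A value that was never added can still pass if other values happened to set all of its positions.

```elixir
defmodule TinyBloom do
  # Hypothetical sketch, not Prolly.BloomFilter's implementation.
  # The bit array is a MapSet of the positions set to 1, and the k hash
  # functions are :erlang.phash2/2 applied to {seed, value} for seeds 1..k.
  defstruct size: 0, k: 0, bits: MapSet.new()

  def new(size, k) when size > 0 and k > 0, do: %TinyBloom{size: size, k: k}

  # The k bit positions (each in 0..size-1) that a value maps to.
  defp positions(%TinyBloom{size: size, k: k}, value) do
    for seed <- 1..k, do: :erlang.phash2({seed, value}, size)
  end

  # Adding a value sets all k of its positions; bits are never cleared.
  def add(%TinyBloom{} = bf, value) do
    %TinyBloom{bf | bits: Enum.into(positions(bf, value), bf.bits)}
  end

  # true: every position is set, so the value MAY have been added.
  # false: at least one position is unset, so it was DEFINITELY never added.
  def maybe_member?(%TinyBloom{} = bf, value) do
    Enum.all?(positions(bf, value), &MapSet.member?(bf.bits, &1))
  end
end

bf = TinyBloom.new(1000, 3) |> TinyBloom.add("foo")
TinyBloom.maybe_member?(bf, "foo")
# => true (a value that was added is never reported as missing)
```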

Summary

Functions

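false_positive_rate(filter_size, input_size, number_of_hashes)
Find the false positive rate for a given filter size, expected input size, and number of hash functions

new(filter_size, hashes)
Create a Bloom filter

optimal_number_of_hashes(filter_size, input_size)
Find the optimal number of hash functions for a given filter size and expected input size

possible_member?(bloom_filter, value)
Test if a value might be in a Bloom filter

update(bloom_filter, value)
Add a value to a Bloom filter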

Types

t()
t() :: Prolly.BloomFilter

Functions

false_positive_rate(filter_size, input_size, number_of_hashes)

Find the false positive rate for a filter of filter_size bits that is expected to hold input_size values and uses number_of_hashes hash functions.

Examples

iex> alias Prolly.BloomFilter
iex> BloomFilter.false_positive_rate(10000, 3000, 3) |> Float.round(2)
0.21
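The value in this example agrees with the standard approximation p ≈ (1 − e^(−kn/m))^k, where m is the filter size in bits, n the expected input size, and k the number of hash functions. The sketch below assumes that formula; this page does not say which formula false_positive_rate/3 uses, and FprSketch is a hypothetical module name.

```elixir
defmodule FprSketch do
  # Hypothetical helper (not Prolly's code): standard approximation
  # p = (1 - e^(-k*n/m))^k, with m = filter_size bits, n = input_size values,
  # and k = number_of_hashes.
  def false_positive_rate(filter_size, input_size, number_of_hashes) do
    # Approximate probability that a given bit is 1 after n insertions.
    bit_set = 1 - :math.exp(-number_of_hashes * input_size / filter_size)
    # A false positive needs all k probed bits to be 1.
    :math.pow(bit_set, number_of_hashes)
  end
end

FprSketch.false_positive_rate(10000, 3000, 3) |> Float.round(2)
# => 0.21
```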

new(filter_size, hashes)

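Create a Bloom filter with filter_size bits, all initially 0, that uses the given hash functions. The hashes argument is stored as given, so it can be a list or a MapSet.

Examples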

iex> alias Prolly.BloomFilter
iex> BloomFilter.new(20, [:md5, :sha, :sha256]).filter |> Enum.to_list
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]

iex> alias Prolly.BloomFilter
iex> BloomFilter.new(20, [:md5, :sha, :sha256]).hashes
[:md5, :sha, :sha256]

iex> alias Prolly.BloomFilter
iex> BloomFilter.new(20, MapSet.new([:md5, :sha, :sha256])).hashes
#MapSet<[:md5, :sha, :sha256]>

optimal_number_of_hashes(filter_size, input_size)

Find the optimal number of hash functions for a given filter size and expected input size

Examples

iex> alias Prolly.BloomFilter
iex> BloomFilter.optimal_number_of_hashes(10000, 1000) |> round()
7
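For a fixed filter size m and input size n, the approximate false positive rate is minimized at k = (m/n) ln 2. For this example that is 10 · ln 2 ≈ 6.93, which rounds to the 7 shown above. A sketch assuming optimal_number_of_hashes/2 uses this formula (OptimalHashesSketch is a hypothetical name, not part of Prolly):

```elixir
defmodule OptimalHashesSketch do
  # Hypothetical helper (not Prolly's code): k = (m / n) * ln 2.
  # Returns a float; round it to get a whole number of hash functions.
  def optimal_number_of_hashes(filter_size, input_size) do
    filter_size / input_size * :math.log(2)
  end
end

OptimalHashesSketch.optimal_number_of_hashes(10000, 1000)
# => about 6.93, which rounds to 7
```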

possible_member?(bloom_filter, value)

Test whether a value might be in a Bloom filter. Returns true if the value may have been added, and false if it definitely has not been added.

Examples

iex> alias Prolly.BloomFilter
iex> bf = BloomFilter.new(20, [:md5, :sha, :sha256])
iex> bf = BloomFilter.update(bf, "hi")
iex> BloomFilter.possible_member?(bf, "hi")
true

iex> alias Prolly.BloomFilter
iex> bf = BloomFilter.new(20, [:md5, :sha, :sha256])
iex> bf = BloomFilter.update(bf, "hi")
iex> BloomFilter.possible_member?(bf, "this is not hi!")
false
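A Bloom filter membership test checks every bit position the value hashes to. The sketch below is hypothetical: MemberSketch is not part of Prolly, and it assumes each hash name is a :crypto algorithm whose digest is reduced modulo the filter size, that values are binaries, and that the filter is a list of 0/1 values like the one new/2 shows. It is not guaranteed to pick the same bit positions Prolly does.

```elixir
defmodule MemberSketch do
  # Hypothetical sketch, not Prolly's implementation. Assumes each hash name
  # is a :crypto algorithm, values are binaries, and the filter is a 0/1 list.

  # One bit position per hash: the digest read as an unsigned integer,
  # reduced modulo the filter size.
  def positions(hashes, filter_size, value) do
    for algo <- hashes do
      :crypto.hash(algo, value) |> :binary.decode_unsigned() |> rem(filter_size)
    end
  end

  # true only if every probed bit is 1; a single 0 proves the value was
  # never added, which is why the answer can be "definitely not".
  def possible_member?(filter, hashes, value) do
    filter_size = length(filter)
    Enum.all?(positions(hashes, filter_size, value), &(Enum.at(filter, &1) == 1))
  end
end

hashes = [:md5, :sha, :sha256]
ps = MemberSketch.positions(hashes, 20, "hi")
# Build a 20-bit filter with exactly the positions for "hi" set.
filter = for i <- 0..19, do: if(i in ps, do: 1, else: 0)
MemberSketch.possible_member?(filter, hashes, "hi")
# => true
```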

update(bloom_filter, value)

Add a value to a Bloom filter, returning the updated filter.

This operation runs in time proportional to the number of hash functions.

Examples

iex> alias Prolly.BloomFilter
iex> bf = BloomFilter.new(20, [:md5, :sha, :sha256])
iex> BloomFilter.update(bf, "hi").filter |> Enum.to_list
[0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
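The cost statement follows from the usual update: compute one hash per hash function and set the corresponding bit. In this hypothetical sketch (UpdateSketch is not part of Prolly), the bit array is an Erlang :array, each hash name is assumed to be a :crypto algorithm, and values are assumed to be binaries; the bit positions it picks may differ from Prolly's.

```elixir
defmodule UpdateSketch do
  # Hypothetical sketch, not Prolly's implementation. The bit array is a
  # fixed-size Erlang :array of 0/1 values; hash names are :crypto algorithms.
  def new(filter_size, hashes) do
    %{filter: :array.new(filter_size, default: 0), size: filter_size, hashes: hashes}
  end

  # One digest per hash function plus one :array.set/3 per position,
  # so the work grows linearly with the number of hash functions.
  def update(%{filter: filter, size: size, hashes: hashes} = bf, value) do
    filter =
      Enum.reduce(hashes, filter, fn algo, acc ->
        index = :crypto.hash(algo, value) |> :binary.decode_unsigned() |> rem(size)
        :array.set(index, 1, acc)
      end)

    %{bf | filter: filter}
  end
end

bf = UpdateSketch.new(20, [:md5, :sha, :sha256]) |> UpdateSketch.update("hi")
:array.to_list(bf.filter)
```

Two hash functions can map a value to the same position, so a value can set fewer bits than there are hash functions; in the example above, three hash functions set only two bits.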