prolly v0.2.0 Prolly.BloomFilter

Use a Bloom filter when you want to keep track of whether you have seen a given value or not.

For example, the question “have I seen the string foo so far in the stream?” is a reasonable question for a Bloom filter.

Specifically, a Bloom filter can tell you two things:

  1. That a value may be in the set.
  2. That a value is definitely not in the set.

Note carefully that a Bloom filter can only tell you that a value might be in a set or that a value is definitely not in a set. It cannot tell you that a value is definitely in a set, because the bits a value maps to may all have been set by other values (a false positive).

Summary

Functions

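false_positive_rate(filter_size, input_size, number_of_hashes)
Find the false positive rate for a given filter size, expected input size, and number of hash functions.

new(filter_size, hash_fns)
Create a Bloom filter.

optimal_number_of_hashes(filter_size, input_size)
Find the optimal number of hash functions for a given filter size and expected input size.

possible_member?(bloom_filter, value)
Test whether a value might be in a Bloom filter.

update(bloom_filter, value)
Add a value to a Bloom filter.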

Functions

false_positive_rate(filter_size, input_size, number_of_hashes)
false_positive_rate(pos_integer, pos_integer, pos_integer) :: float

Find the false positive rate for a given filter size, expected input size, and number of hash functions.
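
As a rough sketch, the estimate can be computed with the standard approximation (1 - e^(-kn/m))^k, where m is filter_size, n is input_size and k is number_of_hashes. That prolly uses exactly this formula is an assumption, although it does reproduce the 0.21 in the example; the module name BloomFprSketch is hypothetical.

```elixir
defmodule BloomFprSketch do
  # Hypothetical module. Standard approximation of the false positive rate:
  #   (1 - e^(-k * n / m))^k
  # m = filter_size (bits), n = input_size, k = number_of_hashes.
  def false_positive_rate(filter_size, input_size, number_of_hashes) do
    k = number_of_hashes
    # `/` always returns a float in Elixir, so this is -k * n / m as a float.
    exponent = -k * input_size / filter_size
    :math.pow(1 - :math.exp(exponent), k)
  end
end

BloomFprSketch.false_positive_rate(10000, 3000, 3) |> Float.round(2)
# => 0.21
```

Because every added value sets up to k more bits, the rate rises as input_size grows relative to filter_size.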

Examples

iex> alias Prolly.BloomFilter
iex> BloomFilter.false_positive_rate(10000, 3000, 3) |> Float.round(2)
0.21

new(filter_size, hash_fns)
new(pos_integer, [(String.t -> integer)]) :: t

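Create an empty Bloom filter with filter_size bits that uses the hash functions in hash_fns. Each hash function takes a string and returns an integer. Every bit starts at 0.

Examples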

iex> alias Prolly.BloomFilter
iex> BloomFilter.new(20,
...> [fn(value) -> :crypto.hash(:sha, value) |> :crypto.bytes_to_integer() end,
...>  fn(value) -> :crypto.hash(:md5, value) |> :crypto.bytes_to_integer() end,
...>  fn(value) -> :crypto.hash(:sha256, value) |> :crypto.bytes_to_integer() end]).filter
...> |> Enum.to_list
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
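
The three hash functions in the example differ only in the :crypto digest algorithm, so one way to build the list is with a comprehension. This is a sketch: hash_fns is just a local name, and any function from a string to a non-negative integer should fit the (String.t -> integer) type in the spec.

```elixir
# One hash function per :crypto digest algorithm. :crypto.bytes_to_integer/1
# reads the digest bytes as an unsigned integer, so every result is >= 0.
hash_fns =
  for algorithm <- [:sha, :md5, :sha256] do
    fn value -> :crypto.hash(algorithm, value) |> :crypto.bytes_to_integer() end
  end

# With the prolly package available, the list can be passed straight to new/2:
#   Prolly.BloomFilter.new(20, hash_fns)
```

Adding another algorithm to the list (for example :sha512) adds another hash function without repeating the fn body.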

optimal_number_of_hashes(filter_size, input_size)
optimal_number_of_hashes(pos_integer, pos_integer) :: pos_integer

Find the optimal number of hash functions for a given filter size and expected input size.
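
A common rule of thumb picks k = (m / n) ln 2 hash functions, which minimizes the approximate false positive rate for m bits and n expected inputs. The sketch below assumes prolly rounds that value to the nearest integer (with a floor of 1); the module BloomHashesSketch is hypothetical. For the example's 10000 bits and 1000 inputs, 10 * ln 2 ≈ 6.93, which rounds to 7.

```elixir
defmodule BloomHashesSketch do
  # Hypothetical module. k = (m / n) * ln(2), rounded to the nearest
  # integer and never less than 1 (a filter needs at least one hash).
  def optimal_number_of_hashes(filter_size, input_size) do
    k = filter_size / input_size * :math.log(2)
    max(1, round(k))
  end
end

BloomHashesSketch.optimal_number_of_hashes(10000, 1000)
# => 7
```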

Examples

iex> alias Prolly.BloomFilter
iex> BloomFilter.optimal_number_of_hashes(10000, 1000)
7

possible_member?(bloom_filter, value)
possible_member?(t, String.Chars) :: boolean

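Test whether a value might be in a Bloom filter. A false result means the value is definitely not in the filter; a true result means it may have been added, or may be a false positive.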
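
Conceptually, a membership test hashes the value with every hash function and checks that all of the resulting bits are set. The sketch below is hypothetical: BloomMembershipSketch stores set positions in a MapSet rather than prolly's own bit representation, and it assumes values are converted with to_string/1 and that each hash is reduced to a position with rem/2.

```elixir
defmodule BloomMembershipSketch do
  # Hypothetical representation, not prolly's internals: the number of bits,
  # a MapSet of the positions that are set, and the hash functions.
  defstruct size: 20, bits: MapSet.new(), hash_fns: []

  # Assumption: to_string/1 lets integers be hashed like strings, and rem/2
  # maps each (non-negative) hash onto a position in 0..size-1.
  def positions(%__MODULE__{size: size, hash_fns: hash_fns}, value) do
    string = to_string(value)
    Enum.map(hash_fns, fn hash_fn -> rem(hash_fn.(string), size) end)
  end

  # May be present only if every position is set; one unset position
  # proves the value was never added.
  def possible_member?(%__MODULE__{bits: bits} = filter, value) do
    filter
    |> positions(value)
    |> Enum.all?(&MapSet.member?(bits, &1))
  end
end

hash_fns =
  for algorithm <- [:sha, :md5, :sha256] do
    fn value -> :crypto.hash(algorithm, value) |> :crypto.bytes_to_integer() end
  end

empty = %BloomMembershipSketch{size: 20, hash_fns: hash_fns}
# Set the positions for "hi" directly, standing in for update/2.
filter = %{empty | bits: MapSet.new(BloomMembershipSketch.positions(empty, "hi"))}

BloomMembershipSketch.possible_member?(filter, "hi")
# => true
BloomMembershipSketch.possible_member?(empty, "hi")
# => false
```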

Examples

iex> alias Prolly.BloomFilter
iex> bf = BloomFilter.new(20,
...> [fn(value) -> :crypto.hash(:sha, value) |> :crypto.bytes_to_integer() end,
...>  fn(value) -> :crypto.hash(:md5, value) |> :crypto.bytes_to_integer() end,
...>  fn(value) -> :crypto.hash(:sha256, value) |> :crypto.bytes_to_integer() end])
iex> bf = BloomFilter.update(bf, "hi")
iex> BloomFilter.possible_member?(bf, "hi")
true

iex> alias Prolly.BloomFilter
iex> bf = BloomFilter.new(20,
...> [fn(value) -> :crypto.hash(:sha, value) |> :crypto.bytes_to_integer() end,
...>  fn(value) -> :crypto.hash(:md5, value) |> :crypto.bytes_to_integer() end,
...>  fn(value) -> :crypto.hash(:sha256, value) |> :crypto.bytes_to_integer() end])
iex> bf = BloomFilter.update(bf, "hi")
iex> BloomFilter.possible_member?(bf, "this is not hi!")
false

iex> alias Prolly.BloomFilter
iex> bf = BloomFilter.new(20,
...> [fn(value) -> :crypto.hash(:sha, value) |> :crypto.bytes_to_integer() end,
...>  fn(value) -> :crypto.hash(:md5, value) |> :crypto.bytes_to_integer() end,
...>  fn(value) -> :crypto.hash(:sha256, value) |> :crypto.bytes_to_integer() end])
iex> bf = BloomFilter.update(bf, 7777777)
iex> BloomFilter.possible_member?(bf, 7777777)
true

update(bloom_filter, value)
update(t, String.Chars) :: t

Add a value to a Bloom filter and return the updated filter.

This operation runs in time proportional to the number of hash functions.
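
The cost claim follows from how an insert works: each hash function picks one bit and that bit is set to 1, so there is exactly one write per hash function. A minimal sketch, assuming an Erlang :array of 0/1 bits, to_string/1 conversion, and rem/2 to choose positions; BloomUpdateSketch is hypothetical and is not expected to reproduce the exact bit patterns shown in the examples.

```elixir
defmodule BloomUpdateSketch do
  # Hypothetical representation, not prolly's internals: a fixed-size
  # Erlang :array of 0/1 bits plus the list of hash functions.
  def new(size, hash_fns) do
    %{bits: :array.new(size, default: 0), hash_fns: hash_fns}
  end

  # One :array.set/3 per hash function, so the work grows with the number
  # of hash functions, not with how many values were added before.
  def update(%{bits: bits, hash_fns: hash_fns} = filter, value) do
    string = to_string(value)
    size = :array.size(bits)

    bits =
      Enum.reduce(hash_fns, bits, fn hash_fn, acc ->
        :array.set(rem(hash_fn.(string), size), 1, acc)
      end)

    %{filter | bits: bits}
  end

  # The bits as a plain list, like `.filter |> Enum.to_list` in the examples.
  def bits(filter), do: :array.to_list(filter.bits)
end

hash_fns =
  for algorithm <- [:sha, :md5, :sha256] do
    fn value -> :crypto.hash(algorithm, value) |> :crypto.bytes_to_integer() end
  end

filter = BloomUpdateSketch.new(20, hash_fns) |> BloomUpdateSketch.update(12345)
BloomUpdateSketch.bits(filter)
```

Setting a bit that is already 1 changes nothing, so adding the same value twice leaves the filter unchanged.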

Examples

iex> alias Prolly.BloomFilter
iex> bf = BloomFilter.new(20,
...> [fn(value) -> :crypto.hash(:sha, value) |> :crypto.bytes_to_integer() end,
...>  fn(value) -> :crypto.hash(:md5, value) |> :crypto.bytes_to_integer() end,
...>  fn(value) -> :crypto.hash(:sha256, value) |> :crypto.bytes_to_integer() end])
iex> BloomFilter.update(bf, "hi").filter |> Enum.to_list
[0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0]

iex> alias Prolly.BloomFilter
iex> bf = BloomFilter.new(20,
...> [fn(value) -> :crypto.hash(:sha, value) |> :crypto.bytes_to_integer() end,
...>  fn(value) -> :crypto.hash(:md5, value) |> :crypto.bytes_to_integer() end,
...>  fn(value) -> :crypto.hash(:sha256, value) |> :crypto.bytes_to_integer() end])
iex> BloomFilter.update(bf, 12345).filter |> Enum.to_list
[0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0]