prolly v0.2.0 Prolly.BloomFilter
Use a Bloom filter when you want to keep track of whether you have seen a given value or not.
For example, “Have I seen the string foo so far in the stream?” is a reasonable question for a Bloom filter.
Specifically, a Bloom filter can tell you two things:
- When a value may be in a set.
- When a value is definitely not in a set.
Carefully note that a Bloom filter can only tell you that a value might be in a set or that a value is definitely not in a set. It cannot tell you that a value is definitely in a set.
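The "maybe present / definitely absent" behaviour follows from how a Bloom filter stores values: each hash function marks one bit position, and a lookup checks whether all of those positions are marked. The following is a minimal, self-contained sketch of that idea. `SketchBloom`, its fields, and the use of `:erlang.phash2/1` are hypothetical choices made for illustration; they are not Prolly's implementation.

```elixir
defmodule SketchBloom do
  # Hypothetical stand-in for illustration only; not Prolly.BloomFilter.
  defstruct [:size, :hash_fns, :bits]

  # Build an empty filter with `size` bit positions and a list of hash functions.
  def new(size, hash_fns) when is_integer(size) and size > 0 do
    %__MODULE__{size: size, hash_fns: hash_fns, bits: MapSet.new()}
  end

  # Mark one bit position per hash function.
  def update(%__MODULE__{} = bf, value) do
    %{bf | bits: Enum.reduce(indices(bf, value), bf.bits, &MapSet.put(&2, &1))}
  end

  # true means "possibly present"; false means "definitely absent".
  def possible_member?(%__MODULE__{} = bf, value) do
    Enum.all?(indices(bf, value), &MapSet.member?(bf.bits, &1))
  end

  # Map each hash output onto a bit position in 0..size-1.
  defp indices(bf, value) do
    Enum.map(bf.hash_fns, fn hash -> rem(hash.(to_string(value)), bf.size) end)
  end
end

# Two illustrative hash functions derived from :erlang.phash2/1 (an assumption).
hash_fns = [
  fn v -> :erlang.phash2({:a, v}) end,
  fn v -> :erlang.phash2({:b, v}) end
]

bf = SketchBloom.new(64, hash_fns) |> SketchBloom.update("foo")

# A value that was added always reports true (no false negatives).
IO.inspect(SketchBloom.possible_member?(bf, "foo"))
# An empty filter has no marked bits, so every lookup reports false.
IO.inspect(SketchBloom.possible_member?(SketchBloom.new(64, hash_fns), "foo"))
```

A `true` result for a value that was never added is possible whenever other values happen to have marked the same positions, which is why the filter can only answer "might be in the set".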
Summary
Functions
false_positive_rate/3: Find the false positive rate for a given filter size, expected input size, and number of hash functions.
new/2: Create a Bloom filter.
optimal_number_of_hashes/2: Find the optimal number of hash functions for a given filter size and expected input size.
possible_member?/2: Test if something might be in a Bloom filter.
update/2: Add a value to a Bloom filter.
Types
Functions
false_positive_rate(pos_integer, pos_integer, pos_integer) :: float
Find the false positive rate for a given filter size, expected input size, and number of hash functions.
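For reference, the standard approximation for a Bloom filter's false positive rate is shown below, where m is the filter size, n the expected input size, and k the number of hash functions. That Prolly computes exactly this expression is an assumption, although it agrees with the documented example.

```latex
p \approx \left(1 - e^{-kn/m}\right)^{k}
\qquad\text{e.g. } m = 10000,\; n = 3000,\; k = 3:\quad
p \approx \left(1 - e^{-0.9}\right)^{3} \approx 0.209
```

Rounded to two decimal places, this gives 0.21.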
Examples
iex> alias Prolly.BloomFilter
iex> BloomFilter.false_positive_rate(10000, 3000, 3) |> (fn(n) -> :erlang.round(n * 100) / 100 end).()
0.21
Create a Bloom filter.
Examples
iex> alias Prolly.BloomFilter
iex> BloomFilter.new(20,
...> [fn(value) -> :crypto.hash(:sha, value) |> :crypto.bytes_to_integer() end,
...> fn(value) -> :crypto.hash(:md5, value) |> :crypto.bytes_to_integer() end,
...> fn(value) -> :crypto.hash(:sha256, value) |> :crypto.bytes_to_integer() end]).filter
...> |> Enum.to_list
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
optimal_number_of_hashes(pos_integer, pos_integer) :: pos_integer
Find the optimal number of hash functions for a given filter size and expected input size.
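For reference, the textbook result for the number of hash functions that minimizes the false positive rate is shown below, where m is the filter size and n the expected input size. How Prolly rounds the result to an integer is an assumption; in the documented example, rounding to the nearest integer and rounding up give the same answer.

```latex
k_{\text{opt}} = \frac{m}{n}\,\ln 2
\qquad\text{e.g. } m = 10000,\; n = 1000:\quad
k_{\text{opt}} = 10 \ln 2 \approx 6.93
```

The nearest integer is 7.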
Examples
iex> alias Prolly.BloomFilter
iex> BloomFilter.optimal_number_of_hashes(10000, 1000)
7
possible_member?(t, String.Chars) :: boolean
Test if something might be in a Bloom filter.
Examples
iex> alias Prolly.BloomFilter
iex> bf = BloomFilter.new(20,
...> [fn(value) -> :crypto.hash(:sha, value) |> :crypto.bytes_to_integer() end,
...> fn(value) -> :crypto.hash(:md5, value) |> :crypto.bytes_to_integer() end,
...> fn(value) -> :crypto.hash(:sha256, value) |> :crypto.bytes_to_integer() end])
iex> bf = BloomFilter.update(bf, "hi")
iex> BloomFilter.possible_member?(bf, "hi")
true
iex> alias Prolly.BloomFilter
iex> bf = BloomFilter.new(20,
...> [fn(value) -> :crypto.hash(:sha, value) |> :crypto.bytes_to_integer() end,
...> fn(value) -> :crypto.hash(:md5, value) |> :crypto.bytes_to_integer() end,
...> fn(value) -> :crypto.hash(:sha256, value) |> :crypto.bytes_to_integer() end])
iex> bf = BloomFilter.update(bf, "hi")
iex> BloomFilter.possible_member?(bf, "this is not hi!")
false
iex> alias Prolly.BloomFilter
iex> bf = BloomFilter.new(20,
...> [fn(value) -> :crypto.hash(:sha, value) |> :crypto.bytes_to_integer() end,
...> fn(value) -> :crypto.hash(:md5, value) |> :crypto.bytes_to_integer() end,
...> fn(value) -> :crypto.hash(:sha256, value) |> :crypto.bytes_to_integer() end])
iex> bf = BloomFilter.update(bf, 7777777)
iex> BloomFilter.possible_member?(bf, 7777777)
true
Add a value to a Bloom filter.
This operation runs in time proportional to the number of hash functions.
Examples
iex> alias Prolly.BloomFilter
iex> bf = BloomFilter.new(20,
...> [fn(value) -> :crypto.hash(:sha, value) |> :crypto.bytes_to_integer() end,
...> fn(value) -> :crypto.hash(:md5, value) |> :crypto.bytes_to_integer() end,
...> fn(value) -> :crypto.hash(:sha256, value) |> :crypto.bytes_to_integer() end])
iex> BloomFilter.update(bf, "hi").filter |> Enum.to_list
[0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0]
iex> alias Prolly.BloomFilter
iex> bf = BloomFilter.new(20,
...> [fn(value) -> :crypto.hash(:sha, value) |> :crypto.bytes_to_integer() end,
...> fn(value) -> :crypto.hash(:md5, value) |> :crypto.bytes_to_integer() end,
...> fn(value) -> :crypto.hash(:sha256, value) |> :crypto.bytes_to_integer() end])
iex> BloomFilter.update(bf, 12345).filter |> Enum.to_list
[0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0]