Bloomex

This module implements plain and scalable Bloom filters.

Examples

iex> bf = Bloomex.scalable(1000, 0.1, 0.1, 2)
%Bloomex.ScalableBloom...

iex> bf = Bloomex.add(bf, 5)
%Bloomex.ScalableBloom...

iex> Bloomex.member?(bf, 5)
true

iex> bf = Bloomex.add(bf, 100)
%Bloomex.ScalableBloom...

iex> Bloomex.member?(bf, 100)
true

iex> Bloomex.member?(bf, 105)
false

iex> Bloomex.member?(bf, 101) # false positive
true

Summary

Functions

add(bloom, e)

Returns a bloom filter with the element e added

capacity(arg1)

Returns the capacity of the bloom filter

member?(bloom, e)

Returns true if the element e exists in the bloom filter, otherwise returns false

plain(capacity, error, hash_func \\ fn x -> :erlang.phash2(x, 1 <<< 32) end)

Returns a plain Bloom filter based on the provided arguments:

  • capacity, used to calculate the size of each bitvector slice
  • error, the error probability
  • hash_func, a hashing function

scalable(capacity, error, error_ratio, growth, hash_func \\ fn x -> :erlang.phash2(x, 1 <<< 32) end)

Returns a scalable Bloom filter based on the provided arguments:

  • capacity, the initial capacity before expanding
  • error, the error probability
  • error_ratio, the error probability ratio
  • growth, the growth ratio when full
  • hash_func, a hashing function

size(arg1)

Returns the number of elements currently in the bloom filter

Functions

add(bloom, e)

Specs

add(Bloomex.t, any) :: Bloomex.t

Returns a bloom filter with the element e added.

capacity(arg1)

Specs

capacity(Bloomex.t) :: pos_integer | :infinity

Returns the capacity of the bloom filter.

A plain bloom filter always has a fixed capacity, while a scalable one has a theoretically infinite capacity, reported as :infinity.
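As a brief illustration of this distinction, the following sketch builds one filter of each kind and inspects its capacity. It assumes Bloomex is already available as a dependency of the project, and the argument values are illustrative only.

```elixir
# Assumes the :bloomex package is already a dependency of the project.
plain = Bloomex.plain(1000, 0.1)
scalable = Bloomex.scalable(1000, 0.1, 0.1, 2)

# A plain filter reports a fixed positive integer. The exact value is not
# asserted here, because the capacity argument is documented only as an
# input to the sizing of each bitvector slice.
plain_capacity = Bloomex.capacity(plain)

# A scalable filter expands when full, so its capacity is :infinity.
scalable_capacity = Bloomex.capacity(scalable)

IO.inspect({plain_capacity, scalable_capacity})
```

Since the return type is `pos_integer | :infinity`, callers that do arithmetic on the result may want to match on `:infinity` first.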

member?(bloom, e)

Specs

member?(Bloomex.t, any) :: boolean

Returns true if the element e exists in the bloom filter, otherwise returns false.

Keep in mind that you may get false positives, but never false negatives.
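The asymmetry between false positives and false negatives can be checked with a short sketch. It assumes Bloomex is already available as a dependency of the project; the ranges and filter arguments are arbitrary choices for illustration.

```elixir
# Assumes the :bloomex package is already a dependency of the project.
items = Enum.to_list(1..100)

# Fold every item into a scalable filter (argument values are illustrative).
bf =
  Enum.reduce(items, Bloomex.scalable(1000, 0.1, 0.1, 2), fn item, acc ->
    Bloomex.add(acc, item)
  end)

# No false negatives: everything that was added is reported as a member.
all_present = Enum.all?(items, fn item -> Bloomex.member?(bf, item) end)

# False positives are possible: values that were never added may still be
# reported as members. How many depends on the hash function and sizing.
false_positives = Enum.count(101..1_000, fn item -> Bloomex.member?(bf, item) end)

IO.inspect({all_present, false_positives})
```

A `true` result from `member?/2` is therefore best read as "possibly present", while `false` means "definitely not added".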

plain(capacity, error, hash_func \\ fn x -> :erlang.phash2(x, 1 <<< 32) end)

Specs

plain(pos_integer, float, (term -> pos_integer)) :: Bloomex.Bloom.t

Returns a plain Bloom filter based on the provided arguments:

  • capacity, used to calculate the size of each bitvector slice
  • error, the error probability
  • hash_func, a hashing function

If a hash function is not provided then :erlang.phash2/2 will be used with the maximum range possible (2^32).

Restrictions:

  • capacity must be a positive integer
  • error must be a float between 0 and 1
  • hash_func must be a function of type term -> pos_integer

Because the filter relies on double hashing, a rule of thumb applies: capacity >= 4 / error must hold.
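The restrictions above can be sketched as a small pre-check. `PlainArgs` is a hypothetical helper, not part of Bloomex, and it assumes the bounds on error are exclusive (an error of 0 would make 4 / error undefined).

```elixir
defmodule PlainArgs do
  # Hypothetical helper, not part of Bloomex. It mirrors the documented
  # restrictions on capacity and error, including capacity >= 4 / error.
  # Assumption: "between 0 and 1" is treated as exclusive on both ends.
  def valid?(capacity, error) do
    is_integer(capacity) and capacity > 0 and
      is_float(error) and error > 0 and error < 1 and
      capacity >= 4 / error
  end

  # Smallest integer capacity that satisfies capacity >= 4 / error.
  def min_capacity(error) when is_float(error) and error > 0 and error < 1 do
    ceil(4 / error)
  end
end

IO.inspect(PlainArgs.valid?(1000, 0.1))    # true, since 1000 >= 40
IO.inspect(PlainArgs.valid?(10, 0.1))      # false, since 10 < 40
IO.inspect(PlainArgs.min_capacity(0.25))   # 16
```

The practical consequence is that a lower error probability demands a larger minimum capacity.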

scalable(capacity, error, error_ratio, growth, hash_func \\ fn x -> :erlang.phash2(x, 1 <<< 32) end)

Specs

scalable(pos_integer, float, float, 1 | 2 | 3, (term -> pos_integer)) :: Bloomex.ScalableBloom.t

Returns a scalable Bloom filter based on the provided arguments:

  • capacity, the initial capacity before expanding
  • error, the error probability
  • error_ratio, the error probability ratio
  • growth, the growth ratio when full
  • hash_func, a hashing function

If a hash function is not provided then :erlang.phash2/2 will be used with the maximum range possible (2^32).

Restrictions:

  • capacity must be a positive integer
  • error must be a float between 0 and 1
  • error_ratio must be a float between 0 and 1
  • growth must be a positive integer between 1 and 3
  • hash_func must be a function of type term -> pos_integer

Because the filter relies on double hashing, a rule of thumb applies: capacity >= 4 / (error * (1 - error_ratio)) must hold.
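As a worked check of this rule, take the arguments used in the Examples section (capacity 1000, error 0.1, error_ratio 0.1):

```latex
\[
\frac{4}{\text{error} \cdot (1 - \text{error\_ratio})}
  = \frac{4}{0.1 \cdot (1 - 0.1)}
  = \frac{4}{0.09}
  \approx 44.4,
\qquad 1000 \ge 44.4 .
\]
```

The rule holds for those arguments. With the same error and error_ratio, the smallest integer capacity that satisfies it is 45.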

size(arg1)

Specs

size(t) :: pos_integer

Returns the number of elements currently in the bloom filter.