Bloomex
This module implements a Scalable Bloom Filter.
Examples
iex> bf = Bloomex.scalable(1000, 0.1, 0.1, 2)
%Bloomex.ScalableBloom...
iex> bf = Bloomex.add(bf, 5)
%Bloomex.ScalableBloom...
iex> Bloomex.member?(bf, 5)
true
iex> bf = Bloomex.add(bf, 100)
%Bloomex.ScalableBloom...
iex> Bloomex.member?(bf, 100)
true
iex> Bloomex.member?(bf, 105)
false
iex> Bloomex.member?(bf, 101) # false positive
true
Summary
add(bloom, e) | Returns a bloom filter with the element e added |
capacity(bloom) | Returns the capacity of the bloom filter |
member?(bloom, e) | Returns true if the element e exists in the bloom filter, otherwise returns false |
plain(capacity, error, hash_func \\ fn x -> :erlang.phash2(x, :erlang.bsl(1, 32)) end) | Returns a plain Bloom filter based on the provided arguments |
scalable(capacity, error, error_ratio, growth, hash_func \\ fn x -> :erlang.phash2(x, :erlang.bsl(1, 32)) end) | Returns a scalable Bloom filter based on the provided arguments |
size(bloom) | Returns the number of elements currently in the bloom filter |
Functions
Specs:
Returns a bloom filter with the element e added.
Specs:
- capacity(Bloomex.t) :: pos_integer | :infinity
Returns the capacity of the bloom filter.
A plain bloom filter always has a fixed capacity, while a scalable one has a theoretically infinite capacity.
Specs:
- member?(Bloomex.t, any) :: boolean
Returns true if the element e exists in the bloom filter, otherwise returns false.
Keep in mind that you may get false positives, but never false negatives.
Specs:
- plain(pos_integer, float, (term -> pos_integer)) :: Bloomex.Bloom.t
Returns a plain Bloom filter based on the provided arguments:
- capacity, used to calculate the size of each bitvector slice
- error, the error probability
- hash_func, a hashing function
If a hash function is not provided then :erlang.phash2/2 will be used with the maximum range possible (2^32).
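As a rough sketch of what a compatible hash_func can look like, the snippet below spells out the default from the function signature next to a salted variant. The names default_hash and salted_hash and the :my_salt atom are illustrative assumptions, not part of Bloomex.

```elixir
# The default from the signature: phash2 with the maximum range, 2^32.
max_range = :erlang.bsl(1, 32)
default_hash = fn x -> :erlang.phash2(x, max_range) end

# Hypothetical custom hash: mix a salt into the term before hashing.
salted_hash = fn x -> :erlang.phash2({:my_salt, x}, max_range) end

# Both accept any term and return an integer below 2^32.
IO.inspect(default_hash.("hello") < max_range)
IO.inspect(salted_hash.({:user, 42}) < max_range)
```

A function shaped like salted_hash could then be passed as the third argument in place of the default.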
Restrictions:
- capacity must be a positive integer
- error must be a float between 0 and 1
- hash_func must be a function of type term -> pos_integer
The function follows a rule of thumb due to double hashing where capacity >= 4 / error must hold true.
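The restrictions above can be checked ahead of time with a small helper. This is a minimal sketch: PlainArgsSketch is a hypothetical module that is not part of Bloomex, and it assumes that "between 0 and 1" excludes both endpoints.

```elixir
defmodule PlainArgsSketch do
  # Hypothetical helper, not part of Bloomex: checks the documented
  # restrictions on capacity and error, then the rule of thumb.
  # Assumption: the bounds 0 and 1 on error are exclusive.
  def valid?(capacity, error)
      when is_integer(capacity) and capacity > 0 and
             is_float(error) and error > 0 and error < 1 do
    capacity >= 4 / error
  end

  # Anything that fails the type or range checks is rejected.
  def valid?(_capacity, _error), do: false
end

IO.inspect(PlainArgsSketch.valid?(1000, 0.1)) # true, since 1000 >= 40
IO.inspect(PlainArgsSketch.valid?(30, 0.1))   # false, since 30 < 40
```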
Specs:
- scalable(pos_integer, float, float, 1 | 2 | 3, (term -> pos_integer)) :: Bloomex.ScalableBloom.t
Returns a scalable Bloom filter based on the provided arguments:
- capacity, the initial capacity before expanding
- error, the error probability
- error_ratio, the error probability ratio
- growth, the growth ratio when full
- hash_func, a hashing function
If a hash function is not provided then :erlang.phash2/2 will be used with the maximum range possible (2^32).
Restrictions:
- capacity must be a positive integer
- error must be a float between 0 and 1
- error_ratio must be a float between 0 and 1
- growth must be a positive integer between 1 and 3
- hash_func must be a function of type term -> pos_integer
The function follows a rule of thumb due to double hashing where capacity >= 4 / (error * (1 - error_ratio)) must hold true.
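To make the rule of thumb concrete, the sketch below computes the smallest initial capacity that satisfies it for a given error and error_ratio. ScalableArgsSketch and min_capacity/2 are hypothetical names introduced for illustration and are not part of Bloomex.

```elixir
defmodule ScalableArgsSketch do
  # Hypothetical helper, not part of Bloomex: smallest integer capacity
  # satisfying capacity >= 4 / (error * (1 - error_ratio)).
  def min_capacity(error, error_ratio) do
    ceil(4 / (error * (1 - error_ratio)))
  end
end

# With error = 0.1 and error_ratio = 0.1 the bound is 4 / 0.09, about 44.4.
IO.inspect(ScalableArgsSketch.min_capacity(0.1, 0.1)) # 45
```

The call Bloomex.scalable(1000, 0.1, 0.1, 2) from the examples section uses a capacity of 1000, which is comfortably above this bound.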