gauzy/bloom_filter
This module provides an implementation of a Bloom filter, a space-efficient probabilistic data structure that is used to test whether an element is a member of a set. False positive matches are possible, but false negatives are not – in other words, a query returns either “possibly in set” or “definitely not in set”.
Bloom filters are useful in situations where the size of the set would require an impractically large amount of memory to store, or where the cost of a false positive is acceptable compared to the cost of a more precise data structure.
The module provides functions for creating, inserting into, querying, and resetting Bloom filters.
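The four operations listed above can be sketched end to end as follows. This is a minimal illustration rather than an official example: poly_hash, hash_a, hash_b, and example are hypothetical names, and the polynomial hashes are simple stand-ins chosen only to keep the block self-contained, not recommended hash functions.

```gleam
import gauzy/bloom_filter
import gleam/list
import gleam/result
import gleam/string

// Hypothetical helper: a small polynomial rolling hash over codepoints.
fn poly_hash(text: String, base: Int) -> Int {
  string.to_utf_codepoints(text)
  |> list.fold(0, fn(acc, codepoint) {
    { acc * base + string.utf_codepoint_to_int(codepoint) } % 1_000_000_007
  })
}

fn hash_a(text: String) -> Int {
  poly_hash(text, 31)
}

fn hash_b(text: String) -> Int {
  poly_hash(text, 131)
}

pub fn example() -> Result(#(Bool, Bool), bloom_filter.BloomFilterError) {
  // Both constructors return a Result, so chain them with result.try.
  use pair <- result.try(bloom_filter.new_hash_fn_pair(hash_a, hash_b))
  use empty <- result.try(bloom_filter.new(
    capacity: 1000,
    target_error_rate: 0.01,
    hash_function_pair: pair,
  ))

  // insert returns a new filter; the original value is left unchanged.
  let filter =
    empty
    |> bloom_filter.insert("alice")
    |> bloom_filter.insert("bob")

  // An inserted item is always reported as possibly present.
  let before_reset = bloom_filter.might_contain(filter, "alice")

  // After a reset the filter is empty again.
  let after_reset =
    filter
    |> bloom_filter.reset
    |> bloom_filter.might_contain("alice")

  Ok(#(before_reset, after_reset))
}
```

Because false negatives are not possible, the first element of the returned tuple should be True; the second is expected to be False, since an empty filter has nothing to match.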
Types
A space-efficient data structure to probabilistically check set membership.
pub opaque type BloomFilter(item)
Represents errors that can occur during Bloom filter operations.
pub type BloomFilterError {
EqualHashFunctions
InvalidCapacity
InvalidTargetErrorRate
}
Constructors
- EqualHashFunctions: The provided hash functions are equal, which is not allowed.
- InvalidCapacity: The specified capacity is invalid (it must be greater than 0).
- InvalidTargetErrorRate: The specified target error rate is invalid (it must be strictly between 0.0 and 1.0).
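Since BloomFilterError is a public type, its constructors can be matched on directly. The sketch below turns each variant into a message; describe_error and the message strings are hypothetical and not part of the module.

```gleam
import gauzy/bloom_filter.{
  type BloomFilterError, EqualHashFunctions, InvalidCapacity,
  InvalidTargetErrorRate,
}

// Hypothetical helper: maps each documented error to a readable message.
pub fn describe_error(error: BloomFilterError) -> String {
  case error {
    EqualHashFunctions -> "the two hash functions must be different"
    InvalidCapacity -> "capacity must be greater than 0"
    InvalidTargetErrorRate ->
      "target error rate must be strictly between 0.0 and 1.0"
  }
}
```

The case expression is exhaustive, so the compiler will flag this helper if a later version of the module adds another error variant.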
A pair of hash functions used by the Bloom filter.
item is the type for which the hash functions provide an Int digest.
pub opaque type HashFunctionPair(item)
Values
pub fn bit_size(of filter: BloomFilter(a)) -> Int
Returns the size of the BloomFilter’s underlying bit array.
- filter: The BloomFilter from which to get the size.
pub fn estimate_cardinality(in filter: BloomFilter(a)) -> Int
Returns an approximation of the number of unique items inserted into the BloomFilter.
This estimate can differ substantially from the true count, especially in smaller filters.
- filter: The BloomFilter for which to estimate the cardinality.
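For intuition on why small filters give poor estimates, a commonly used estimator for Bloom filter cardinality is shown below. Whether this module computes its estimate in exactly this way is an assumption; the symbols m (bit array size), k (number of hash functions), and X (number of bits currently set) are introduced here for illustration.

```latex
\hat{n} = -\frac{m}{k}\,\ln\!\left(1 - \frac{X}{m}\right)
```

As X approaches m the logarithm grows without bound, so in a small or nearly full filter a change of a few bits shifts the estimate considerably.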
pub fn false_positive_rate(of filter: BloomFilter(a)) -> Float
Returns the BloomFilter’s actual false positive rate.
- filter: The BloomFilter from which to get the error rate.
pub fn hash_fn_count(of filter: BloomFilter(a)) -> Int
Returns the number of hash functions the BloomFilter uses.
- filter: The BloomFilter from which to get the hash function count.
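The inspection functions bit_size, hash_fn_count, estimate_cardinality, and false_positive_rate can be combined into a one-line summary, for example for logging. In this sketch, summarize and its output format are hypothetical.

```gleam
import gauzy/bloom_filter.{type BloomFilter}
import gleam/float
import gleam/int

// Hypothetical helper: renders the filter's reported properties as text.
pub fn summarize(filter: BloomFilter(a)) -> String {
  "bits="
  <> int.to_string(bloom_filter.bit_size(filter))
  <> " hash_fns="
  <> int.to_string(bloom_filter.hash_fn_count(filter))
  <> " est_items="
  <> int.to_string(bloom_filter.estimate_cardinality(filter))
  <> " fp_rate="
  <> float.to_string(bloom_filter.false_positive_rate(filter))
}
```

The helper works for any item type because it only reads properties of the filter and never hashes an item.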
pub fn insert(
in filter: BloomFilter(a),
insert item: a,
) -> BloomFilter(a)
Inserts an item into the BloomFilter.
- filter: The BloomFilter to insert into.
- item: The item to insert.
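Because insert returns a new BloomFilter instead of mutating its argument, adding many items is naturally expressed as a fold. A minimal sketch, with insert_all as a hypothetical helper name:

```gleam
import gauzy/bloom_filter.{type BloomFilter}
import gleam/list

// Hypothetical helper: threads the filter through one insert per item.
pub fn insert_all(filter: BloomFilter(a), items: List(a)) -> BloomFilter(a) {
  list.fold(items, filter, fn(acc, item) { bloom_filter.insert(acc, item) })
}
```

The filter passed in is not modified; callers must keep the returned value to see the inserted items.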
pub fn might_contain(
in filter: BloomFilter(a),
search item: a,
) -> Bool
Checks if the BloomFilter might contain the given item.
- filter: The BloomFilter to check.
- item: The item to check for.
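A typical use of might_contain is as a cheap pre-check in front of a slower, authoritative lookup. In the sketch below, a Dict stands in for that slower store, lookup is a hypothetical helper, and the filter is assumed to contain every key that was added to the store.

```gleam
import gauzy/bloom_filter.{type BloomFilter}
import gleam/dict.{type Dict}

// Hypothetical helper: consults the filter before the real lookup.
pub fn lookup(
  filter: BloomFilter(String),
  store: Dict(String, value),
  key: String,
) -> Result(value, Nil) {
  case bloom_filter.might_contain(filter, key) {
    // "Definitely not in set": skip the slower lookup entirely.
    False -> Error(Nil)
    // "Possibly in set": the store gives the authoritative answer.
    True -> dict.get(store, key)
  }
}
```

A false positive only costs one unnecessary store lookup, which still returns the correct result.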
pub fn new(
capacity capacity: Int,
target_error_rate target_error_rate: Float,
hash_function_pair hash_function_pair: HashFunctionPair(a),
) -> Result(BloomFilter(a), BloomFilterError)
Creates a new BloomFilter.
- capacity: The number of items the BloomFilter is expected to hold (must be greater than 0).
- target_error_rate: The desired false positive rate (strictly between 0.0 and 1.0).
- hash_function_pair: The hash functions used to generate indices.
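To give a feel for how capacity and target_error_rate translate into memory, the standard sizing formulas for a Bloom filter are shown below. That this module derives its bit size and hash function count exactly this way is an assumption; n stands for the capacity, p for the target error rate, m for the number of bits, and k for the number of hash functions.

```latex
m = \left\lceil -\frac{n \ln p}{(\ln 2)^2} \right\rceil,
\qquad
k = \operatorname{round}\!\left(\frac{m}{n}\,\ln 2\right)
```

For n = 1000 and p = 0.01 this gives roughly 9,586 bits (about 1.2 kB) and 7 hash functions. The values actually chosen can be read back with bit_size and hash_fn_count.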
pub fn new_hash_fn_pair(
hash_fn_1: fn(a) -> Int,
hash_fn_2: fn(a) -> Int,
) -> Result(HashFunctionPair(a), BloomFilterError)
Creates a new pair of hash functions for the BloomFilter.
The hash functions must not be equal! For optimal performance, the hash functions should be random, uniform, and pairwise independent.
- hash_fn_1: The first hash function.
- hash_fn_2: The second hash function.
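As an illustration of supplying two distinct hash functions for String items, the sketch below uses the classic djb2 and sdbm string hashes, reduced modulo 2^32. These are stand-ins chosen for brevity, not a recommendation: they are not truly pairwise independent, and codes and string_hash_pair are hypothetical helper names.

```gleam
import gauzy/bloom_filter.{type BloomFilterError, type HashFunctionPair}
import gleam/list
import gleam/string

// Keeps digests below 2^32 so intermediate products stay small.
const modulus = 4_294_967_296

// Hypothetical helper: the integer codepoints of a string.
fn codes(text: String) -> List(Int) {
  string.to_utf_codepoints(text)
  |> list.map(string.utf_codepoint_to_int)
}

// djb2: hash = hash * 33 + code, starting from 5381.
pub fn djb2(text: String) -> Int {
  list.fold(codes(text), 5381, fn(acc, code) { { acc * 33 + code } % modulus })
}

// sdbm: hash = hash * 65599 + code, starting from 0.
pub fn sdbm(text: String) -> Int {
  list.fold(codes(text), 0, fn(acc, code) { { acc * 65_599 + code } % modulus })
}

pub fn string_hash_pair() -> Result(HashFunctionPair(String), BloomFilterError) {
  bloom_filter.new_hash_fn_pair(djb2, sdbm)
}
```

Passing the same function for both arguments is what the EqualHashFunctions error is meant to reject, so the two hashes should always be separate functions.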
pub fn reset(filter filter: BloomFilter(a)) -> BloomFilter(a)
Returns an empty BloomFilter with the same characteristics as the input filter.
- filter: The BloomFilter to reset.
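One way to read "same characteristics" is that the reset filter keeps the bit array size and hash function count of the original. The sketch below checks that interpretation, which is an assumption and not stated by the module; reset_keeps_shape is a hypothetical helper name.

```gleam
import gauzy/bloom_filter.{type BloomFilter}

// Hypothetical check: compares the reported shape before and after reset.
pub fn reset_keeps_shape(filter: BloomFilter(a)) -> Bool {
  let fresh = bloom_filter.reset(filter)

  bloom_filter.bit_size(fresh) == bloom_filter.bit_size(filter)
  && bloom_filter.hash_fn_count(fresh) == bloom_filter.hash_fn_count(filter)
}
```

Like insert, reset returns a new value, so the filter passed in still holds its items afterwards.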