gauzy/bloom_filter

This module provides an implementation of a Bloom filter, a space-efficient probabilistic data structure that is used to test whether an element is a member of a set. False positive matches are possible, but false negatives are not – in other words, a query returns either “possibly in set” or “definitely not in set”.

Bloom filters are useful in situations where the size of the set would require an impractically large amount of memory to store, or where the cost of a false positive is acceptable compared to the cost of a more precise data structure.

The module provides functions for creating, inserting into, querying, and resetting Bloom filters.
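
The workflow described above can be sketched end to end as follows. This is a minimal sketch that uses only the signatures documented on this page; the two hash functions `hash_a` and `hash_b` are hypothetical stand-ins written for illustration, not part of the module, and a production filter would want stronger hashes.

```gleam
import gauzy/bloom_filter
import gleam/bool
import gleam/io
import gleam/list
import gleam/string

// Hypothetical hash function (not part of gauzy): a polynomial rolling
// hash over the string's code points.
fn hash_a(text: String) -> Int {
  string.to_utf_codepoints(text)
  |> list.fold(7, fn(acc, codepoint) {
    { acc * 31 + string.utf_codepoint_to_int(codepoint) } % 1_000_000_007
  })
}

// A second hypothetical hash with a different seed, multiplier, and modulus,
// so that the two functions are not equal.
fn hash_b(text: String) -> Int {
  string.to_utf_codepoints(text)
  |> list.fold(11, fn(acc, codepoint) {
    { acc * 131 + string.utf_codepoint_to_int(codepoint) } % 998_244_353
  })
}

pub fn main() {
  // Both constructors return a Result; `let assert` crashes on Error,
  // which is acceptable in a sketch but not in production code.
  let assert Ok(pair) = bloom_filter.new_hash_fn_pair(hash_a, hash_b)
  let assert Ok(filter) =
    bloom_filter.new(
      capacity: 1000,
      target_error_rate: 0.01,
      hash_function_pair: pair,
    )

  // Filters are immutable values: insert returns an updated filter.
  let filter = bloom_filter.insert(in: filter, insert: "apple")

  // An inserted item is always reported as possibly present.
  io.println(
    bool.to_string(bloom_filter.might_contain(in: filter, search: "apple")),
  )

  // An item that was never inserted is usually, but not always, rejected.
  io.println(
    bool.to_string(bloom_filter.might_contain(in: filter, search: "durian")),
  )
}
```

Because `insert` returns a new `BloomFilter` rather than mutating its argument, the result has to be rebound (or threaded through a fold) for the insertion to take effect.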

Types

A space-efficient data structure to probabilistically check set membership.

pub opaque type BloomFilter(item)

Represents errors that can occur during Bloom filter operations.

pub type BloomFilterError {
  EqualHashFunctions
  InvalidCapacity
  InvalidTargetErrorRate
}

Constructors

  • EqualHashFunctions

    The provided hash functions are equal, which is not allowed.

  • InvalidCapacity

    The specified capacity is invalid (must be greater than 0).

  • InvalidTargetErrorRate

    The specified target error rate is invalid (must be strictly between 0.0 and 1.0).
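
Since `BloomFilterError` has exactly three constructors, callers can match on it exhaustively. The helper below is a hypothetical sketch (the function name `describe_error` and the message strings are illustrative, not part of the module) showing one way to turn an error into a message.

```gleam
import gauzy/bloom_filter.{
  type BloomFilterError, EqualHashFunctions, InvalidCapacity,
  InvalidTargetErrorRate,
}

// Hypothetical helper: maps each documented error to a readable message.
// The compiler checks that all three constructors are handled.
pub fn describe_error(error: BloomFilterError) -> String {
  case error {
    EqualHashFunctions -> "the two hash functions must differ"
    InvalidCapacity -> "capacity must be greater than 0"
    InvalidTargetErrorRate ->
      "target error rate must be strictly between 0.0 and 1.0"
  }
}
```

A caller would typically apply such a helper in the `Error` branch of a `case` over the result of `new` or `new_hash_fn_pair`.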

A pair of hash functions used by the Bloom filter.

item is the type for which the hash functions provide an Int digest.

pub opaque type HashFunctionPair(item)

Values

pub fn bit_size(of filter: BloomFilter(a)) -> Int

Returns the size of the BloomFilter’s underlying bit array.

  • filter: The BloomFilter from which to get the size.

pub fn estimate_cardinality(in filter: BloomFilter(a)) -> Int

Returns an approximation of the number of unique items inserted into the BloomFilter. This can differ substantially from reality, especially in smaller filters.

  • filter: The BloomFilter for which to estimate.
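
For intuition, a commonly used estimator for the number of distinct items in a Bloom filter is shown below. Whether this module uses exactly this formula is an assumption; the symbols m (bit array size), k (number of hash functions), and X (number of bits currently set) are introduced here for illustration.

```latex
\hat{n} = -\frac{m}{k} \, \ln\!\left(1 - \frac{X}{m}\right)

% Worked example with m = 1000, k = 5, X = 200:
\hat{n} = -\frac{1000}{5} \, \ln(0.8) \approx 200 \times 0.2231 \approx 44.6
```

In a small filter, a single additional set bit shifts X/m noticeably, which is one reason such estimates are less reliable for small filters.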

pub fn false_positive_rate(of filter: BloomFilter(a)) -> Float

Returns the BloomFilter’s actual false positive rate.

  • filter: The BloomFilter from which to get the error rate.

pub fn hash_fn_count(of filter: BloomFilter(a)) -> Int

Returns the number of hash functions the BloomFilter uses.

  • filter: The BloomFilter from which to get the hash function count.

pub fn insert(
  in filter: BloomFilter(a),
  insert item: a,
) -> BloomFilter(a)

Inserts an item into the BloomFilter.

  • filter: The BloomFilter to insert into.
  • item: The item to insert.

pub fn might_contain(
  in filter: BloomFilter(a),
  search item: a,
) -> Bool

Checks if the BloomFilter might contain the given item.

  • filter: The BloomFilter to check.
  • item: The item to check for.

pub fn new(
  capacity capacity: Int,
  target_error_rate target_error_rate: Float,
  hash_function_pair hash_function_pair: HashFunctionPair(a),
) -> Result(BloomFilter(a), BloomFilterError)

Creates a new BloomFilter.

  • capacity: The number of items the BloomFilter is expected to hold.
  • target_error_rate: The desired false positive rate (strictly between 0.0 and 1.0).
  • hash_function_pair: The hash functions used to generate indices.
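
To see how capacity and target error rate relate to the values reported by bit_size and hash_fn_count, the textbook Bloom filter sizing formulas are a useful guide. That this module uses exactly these formulas, or rounds in this way, is an assumption; n (capacity), p (target error rate), m (bits), and k (hash function count) are symbols introduced here.

```latex
m = -\frac{n \ln p}{(\ln 2)^2}, \qquad k = \frac{m}{n} \ln 2

% Worked example with n = 1000 and p = 0.01:
m = \frac{1000 \times 4.6052}{0.4805} \approx 9585 \text{ bits}, \qquad
k \approx 9.585 \times 0.6931 \approx 6.64 \;\Rightarrow\; 7 \text{ hash functions}
```

Under these formulas, halving the target error rate adds roughly 1.44 bits per expected item.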

pub fn new_hash_fn_pair(
  hash_fn_1: fn(a) -> Int,
  hash_fn_2: fn(a) -> Int,
) -> Result(HashFunctionPair(a), BloomFilterError)

Creates a new pair of hash functions for the BloomFilter.

The two hash functions must not be equal; equal functions are rejected with EqualHashFunctions. For optimal performance, the hash functions should be random, uniform, and pairwise independent.

  • hash_fn_1: The first hash function.
  • hash_fn_2: The second hash function.
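
A filter that is built from two hash functions but reports a larger hash function count typically derives its bit indices by double hashing, where the i-th index is (h1 + i * h2) mod m. Whether this module does so is an assumption; the function `derive_indices` below is a hypothetical illustration of the technique, not part of the module.

```gleam
import gleam/list

/// Derives `count` bit indices for a bit array of `size` bits from two
/// digests, using index_i = (digest_1 + i * digest_2) % size.
/// Assumes non-negative digests, `count >= 0`, and `size > 0`.
pub fn derive_indices(
  digest_1: Int,
  digest_2: Int,
  count: Int,
  size: Int,
) -> List(Int) {
  derive_loop(digest_1, digest_2, count, size, 0, [])
}

// Tail-recursive helper: accumulates indices in reverse, then flips them.
fn derive_loop(
  digest_1: Int,
  digest_2: Int,
  count: Int,
  size: Int,
  i: Int,
  acc: List(Int),
) -> List(Int) {
  case i >= count {
    True -> list.reverse(acc)
    False -> {
      let index = { digest_1 + i * digest_2 } % size
      derive_loop(digest_1, digest_2, count, size, i + 1, [index, ..acc])
    }
  }
}
```

For example, `derive_indices(10, 3, 4, 16)` evaluates to `[10, 13, 0, 3]`. Under this scheme, two identical hash functions would make every index a multiple of the same digest, which illustrates why distinct, independent functions matter.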

pub fn reset(filter filter: BloomFilter(a)) -> BloomFilter(a)

Returns an empty BloomFilter with the same characteristics as the input filter.

  • filter: The BloomFilter to reset.
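
As a small sketch of what "same characteristics" means in practice, the hypothetical helper below (the name `same_shape_after_reset` is illustrative, not part of the module) checks that a reset filter keeps its bit size and hash function count. It assumes these two values are among the characteristics that reset preserves.

```gleam
import gauzy/bloom_filter.{type BloomFilter}

// Hypothetical check: resetting should clear the contents but keep the
// filter's dimensions, so both getters should agree before and after.
pub fn same_shape_after_reset(filter: BloomFilter(a)) -> Bool {
  let empty = bloom_filter.reset(filter: filter)

  bloom_filter.bit_size(of: empty) == bloom_filter.bit_size(of: filter)
  && bloom_filter.hash_fn_count(of: empty)
  == bloom_filter.hash_fn_count(of: filter)
}
```

Like insert, reset returns a new value, so the original filter remains usable after the call.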