gauzy/bloom_filter

This module provides an implementation of a Bloom filter, a space-efficient probabilistic data structure used to test whether an element is a member of a set. False positive matches are possible, but false negatives are not: a query returns either “possibly in set” or “definitely not in set”.

Bloom filters are useful when storing the set itself would require an impractically large amount of memory, or when the cost of an occasional false positive is lower than the cost of a more precise data structure.

The module provides functions for creating, inserting into, querying, and resetting Bloom filters.
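
The typical workflow (build a hash function pair, create a filter, insert, query) can be sketched as follows. This is a minimal sketch that assumes the module is importable as gauzy/bloom_filter and uses only the functions documented on this page. The hash functions hash_a and hash_b are hypothetical toy hashes written for illustration; they are not part of gauzy and are too weak for production use.

```gleam
import gauzy/bloom_filter
import gleam/list
import gleam/string

// Hypothetical toy hash (not part of gauzy): polynomial hash over codepoints.
fn hash_a(s: String) -> Int {
  use h, cp <- list.fold(string.to_utf_codepoints(s), 7)
  { h * 31 + string.utf_codepoint_to_int(cp) } % 1_000_000_007
}

// Hypothetical toy hash (not part of gauzy): a different multiplier and seed.
fn hash_b(s: String) -> Int {
  use h, cp <- list.fold(string.to_utf_codepoints(s), 5381)
  { h * 33 + string.utf_codepoint_to_int(cp) } % 2_147_483_647
}

/// Inserts "apple", then queries "apple" and "pear".
pub fn demo() -> #(Bool, Bool) {
  // Pair two distinct hash functions; this fails only if they are equal.
  let assert Ok(hashes) = bloom_filter.new_hash_fn_pair(hash_a, hash_b)
  // Size the filter for about 1000 items at a 1% target false positive rate.
  let assert Ok(filter) =
    bloom_filter.new(capacity: 1000, target_error_rate: 0.01, with_hashes: hashes)
  // Insertion returns a new filter value; the previous value is unchanged.
  let assert Ok(filter) = bloom_filter.try_insert(filter, "apple")
  #(
    // An inserted item is always reported as possibly present.
    bloom_filter.might_contain(filter, "apple"),
    // A never-inserted item is usually False, but may be a false positive.
    bloom_filter.might_contain(filter, "pear"),
  )
}
```

Only the first element of the result is guaranteed; the second depends on the hash functions and filter size.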

Types

A Bloom filter data structure.

a is the type of item that can be stored in the filter.

pub opaque type BloomFilter(a)

Represents errors that can occur during Bloom filter operations.

pub type BloomFilterError {
  EqualHashFunctions
  InsertionError
  InvalidCapacity
  InvalidTargetErrorRate
}

Constructors

  • EqualHashFunctions

    The provided hash functions are equal, which is not allowed.

  • InsertionError

    An error occurred during insertion, likely an out-of-bounds index.

  • InvalidCapacity

    The specified capacity is invalid (must be greater than 0).

  • InvalidTargetErrorRate

    The specified target error rate is invalid (must be strictly between 0.0 and 1.0).
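
Because BloomFilterError is a plain custom type, callers can handle every case with an exhaustive case expression. The sketch below assumes the module path gauzy/bloom_filter; describe_error and its messages are hypothetical, written here for illustration.

```gleam
import gauzy/bloom_filter.{
  type BloomFilterError, EqualHashFunctions, InsertionError, InvalidCapacity,
  InvalidTargetErrorRate,
}

/// Hypothetical helper: turns a BloomFilterError into a readable message.
pub fn describe_error(error: BloomFilterError) -> String {
  // The case is exhaustive, so the compiler flags any constructor
  // that a later version might add to BloomFilterError.
  case error {
    EqualHashFunctions -> "the two hash functions must differ"
    InsertionError -> "the item could not be inserted"
    InvalidCapacity -> "capacity must be greater than 0"
    InvalidTargetErrorRate ->
      "target error rate must be strictly between 0.0 and 1.0"
  }
}
```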

A pair of hash functions used by the Bloom filter.

a is the type of item that the hash functions operate on.

pub opaque type HashFunctionPair(a)

Functions

pub fn bit_size(filter filter: BloomFilter(a)) -> Int

Returns the size of the BloomFilter’s underlying bit array.

  • filter: The BloomFilter to get the size from.

pub fn error_rate(filter filter: BloomFilter(a)) -> Float

Returns the BloomFilter’s actual false positive rate.

  • filter: The BloomFilter to get the error rate from.
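
For reference, the standard approximation for the false positive rate of a Bloom filter with m bits and k hash functions after n insertions is shown below, followed by a worked value for m = 9586, k = 7, n = 1000. Whether error_rate evaluates exactly this expression, and for which n, is an assumption not confirmed by this page.

```latex
p \approx \left(1 - e^{-kn/m}\right)^{k},
\qquad
\left(1 - e^{-7 \cdot 1000 / 9586}\right)^{7} \approx 0.0100
```
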
pub fn hash_fn_count(filter filter: BloomFilter(a)) -> Int

Returns the number of hash functions the BloomFilter uses.

  • filter: The BloomFilter to get the hash function count from.
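
The bit size and hash function count are usually derived from the expected capacity n and target error rate p. The sketch below computes the textbook choices, m = ⌈−n·ln p / (ln 2)²⌉ and k = round((m / n)·ln 2), using only the Gleam standard library (it assumes a gleam_stdlib version that provides float.logarithm). textbook_params is a hypothetical helper; gauzy's own bit_size and hash_fn_count may differ, since this page does not document how the library chooses its parameters.

```gleam
import gleam/float
import gleam/int

/// Hypothetical helper (not part of gauzy): textbook bit count m and hash
/// function count k for n expected items (assumed > 0) at false positive
/// rate p. Returns Error(Nil) when p is not positive.
pub fn textbook_params(n: Int, p: Float) -> Result(#(Int, Int), Nil) {
  // ln 2, written out as a constant.
  let ln2 = 0.6931471805599453
  let nf = int.to_float(n)
  case float.logarithm(p) {
    Error(Nil) -> Error(Nil)
    Ok(ln_p) -> {
      // m = ceil(-n * ln(p) / (ln 2)^2)
      let m = float.truncate(float.ceiling(0.0 -. nf *. ln_p /. { ln2 *. ln2 }))
      // k = round((m / n) * ln 2), but never fewer than one hash function
      let k = int.max(1, float.round(int.to_float(m) /. nf *. ln2))
      Ok(#(m, k))
    }
  }
}
```

For n = 1000 and p = 0.01 this yields m = 9586 bits and k = 7 hash functions.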
pub fn might_contain(filter: BloomFilter(a), item: a) -> Bool

Checks if the BloomFilter might contain the given item.

  • filter: The BloomFilter to check.
  • item: The item to check for.
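
The Boolean returned by might_contain is asymmetric: False is always correct, while True may be a false positive. The sketch below makes that reading explicit. It assumes the module path gauzy/bloom_filter; membership, demo, hash_a, and hash_b are hypothetical names written for illustration, and the toy hashes are not suitable for production.

```gleam
import gauzy/bloom_filter.{type BloomFilter}
import gleam/list
import gleam/string

// Hypothetical toy hashes (not part of gauzy).
fn hash_a(s: String) -> Int {
  use h, cp <- list.fold(string.to_utf_codepoints(s), 7)
  { h * 31 + string.utf_codepoint_to_int(cp) } % 1_000_000_007
}

fn hash_b(s: String) -> Int {
  use h, cp <- list.fold(string.to_utf_codepoints(s), 5381)
  { h * 33 + string.utf_codepoint_to_int(cp) } % 2_147_483_647
}

/// Hypothetical helper: reads a might_contain answer as described above.
pub fn membership(filter: BloomFilter(a), item: a) -> String {
  case bloom_filter.might_contain(filter, item) {
    // True can be a false positive, so it only means "maybe".
    True -> "possibly in set"
    // False is never wrong: Bloom filters have no false negatives.
    False -> "definitely not in set"
  }
}

/// Queries "apple" before and after inserting it.
pub fn demo() -> #(String, String) {
  let assert Ok(hashes) = bloom_filter.new_hash_fn_pair(hash_a, hash_b)
  let assert Ok(empty) =
    bloom_filter.new(capacity: 100, target_error_rate: 0.01, with_hashes: hashes)
  let assert Ok(filled) = bloom_filter.try_insert(empty, "apple")
  #(membership(empty, "apple"), membership(filled, "apple"))
}
```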
pub fn new(
  capacity capacity: Int,
  target_error_rate target_error_rate: Float,
  with_hashes hash_function_pair: HashFunctionPair(a),
) -> Result(BloomFilter(a), BloomFilterError)

Creates a new BloomFilter.

  • capacity: The number of items the BloomFilter is expected to hold.
  • target_error_rate: The desired false positive rate (strictly between 0.0 and 1.0).
  • with_hashes (hash_function_pair): The pair of hash functions used to generate bit indices.
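
The argument checks described under BloomFilterError can be exercised directly. This sketch assumes the module path gauzy/bloom_filter and that each invalid argument on its own yields the matching error constructor, as the constructor descriptions state; validation_demo, hash_a, and hash_b are hypothetical names written for illustration.

```gleam
import gauzy/bloom_filter
import gleam/list
import gleam/string

// Hypothetical toy hashes (not part of gauzy).
fn hash_a(s: String) -> Int {
  use h, cp <- list.fold(string.to_utf_codepoints(s), 7)
  { h * 31 + string.utf_codepoint_to_int(cp) } % 1_000_000_007
}

fn hash_b(s: String) -> Int {
  use h, cp <- list.fold(string.to_utf_codepoints(s), 5381)
  { h * 33 + string.utf_codepoint_to_int(cp) } % 2_147_483_647
}

pub fn validation_demo() {
  let assert Ok(hashes) = bloom_filter.new_hash_fn_pair(hash_a, hash_b)
  // Capacity must be greater than 0.
  let assert Error(bloom_filter.InvalidCapacity) =
    bloom_filter.new(capacity: 0, target_error_rate: 0.01, with_hashes: hashes)
  // The target error rate must lie strictly between 0.0 and 1.0.
  let assert Error(bloom_filter.InvalidTargetErrorRate) =
    bloom_filter.new(capacity: 100, target_error_rate: 1.0, with_hashes: hashes)
  // Valid arguments produce a filter.
  bloom_filter.new(capacity: 100, target_error_rate: 0.05, with_hashes: hashes)
}
```
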
pub fn new_hash_fn_pair(
  hash_fn_1: fn(a) -> Int,
  hash_fn_2: fn(a) -> Int,
) -> Result(HashFunctionPair(a), BloomFilterError)

Creates a new pair of hash functions for the BloomFilter.

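The two hash functions must not be equal; passing the same function twice returns Error(EqualHashFunctions). For the best false positive behaviour, the hash functions should produce uniformly distributed, mutually independent outputs.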

  • hash_fn_1: The first hash function.
  • hash_fn_2: The second hash function.
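
The sketch below shows both outcomes of new_hash_fn_pair: two different functions are accepted, while the same function value passed twice is rejected. It assumes the module path gauzy/bloom_filter; pair_demo, hash_a, and hash_b are hypothetical names, and the toy hashes are far from the random, independent functions recommended above.

```gleam
import gauzy/bloom_filter
import gleam/list
import gleam/string

// Hypothetical toy hashes (not part of gauzy).
fn hash_a(s: String) -> Int {
  use h, cp <- list.fold(string.to_utf_codepoints(s), 7)
  { h * 31 + string.utf_codepoint_to_int(cp) } % 1_000_000_007
}

fn hash_b(s: String) -> Int {
  use h, cp <- list.fold(string.to_utf_codepoints(s), 5381)
  { h * 33 + string.utf_codepoint_to_int(cp) } % 2_147_483_647
}

pub fn pair_demo() {
  // Bind one function value so exactly the same value is passed twice.
  let same = hash_a
  #(
    // Two different functions are accepted.
    bloom_filter.new_hash_fn_pair(hash_a, hash_b),
    // Identical functions are rejected with Error(EqualHashFunctions).
    bloom_filter.new_hash_fn_pair(same, same),
  )
}
```
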
pub fn reset(filter: BloomFilter(a)) -> BloomFilter(a)

Returns an empty BloomFilter with the same characteristics as the input filter: all bits are cleared, while the bit size and hash functions are kept.

  • filter: The BloomFilter to reset.
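
A quick way to see what reset keeps and what it clears is to compare the filter before and after. This sketch assumes the module path gauzy/bloom_filter and that a cleared filter reports False for every item (which holds for any Bloom filter with all bits unset); reset_demo, hash_a, and hash_b are hypothetical names.

```gleam
import gauzy/bloom_filter
import gleam/list
import gleam/string

// Hypothetical toy hashes (not part of gauzy).
fn hash_a(s: String) -> Int {
  use h, cp <- list.fold(string.to_utf_codepoints(s), 7)
  { h * 31 + string.utf_codepoint_to_int(cp) } % 1_000_000_007
}

fn hash_b(s: String) -> Int {
  use h, cp <- list.fold(string.to_utf_codepoints(s), 5381)
  { h * 33 + string.utf_codepoint_to_int(cp) } % 2_147_483_647
}

pub fn reset_demo() -> #(Bool, Bool, Bool) {
  let assert Ok(hashes) = bloom_filter.new_hash_fn_pair(hash_a, hash_b)
  let assert Ok(filter) =
    bloom_filter.new(capacity: 100, target_error_rate: 0.01, with_hashes: hashes)
  let assert Ok(filter) = bloom_filter.try_insert(filter, "apple")
  let cleared = bloom_filter.reset(filter)
  #(
    // The bit array keeps its size...
    bloom_filter.bit_size(cleared) == bloom_filter.bit_size(filter),
    // ...and the same number of hash functions is used...
    bloom_filter.hash_fn_count(cleared) == bloom_filter.hash_fn_count(filter),
    // ...but with every bit cleared, the old item is no longer reported.
    bloom_filter.might_contain(cleared, "apple"),
  )
}
```
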
pub fn try_insert(
  filter: BloomFilter(a),
  item: a,
) -> Result(BloomFilter(a), BloomFilterError)

Tries to insert an item into the BloomFilter, returning the updated filter on success or Error(InsertionError) if the insertion fails.

  • filter: The BloomFilter to insert into.
  • item: The item to insert.
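
Since try_insert returns a Result, inserting many items is a natural fit for list.try_fold, which stops at the first error. This is a minimal sketch assuming the module path gauzy/bloom_filter; insert_all, bulk_demo, hash_a, and hash_b are hypothetical names written for illustration.

```gleam
import gauzy/bloom_filter.{type BloomFilter, type BloomFilterError}
import gleam/list
import gleam/string

// Hypothetical toy hashes (not part of gauzy).
fn hash_a(s: String) -> Int {
  use h, cp <- list.fold(string.to_utf_codepoints(s), 7)
  { h * 31 + string.utf_codepoint_to_int(cp) } % 1_000_000_007
}

fn hash_b(s: String) -> Int {
  use h, cp <- list.fold(string.to_utf_codepoints(s), 5381)
  { h * 33 + string.utf_codepoint_to_int(cp) } % 2_147_483_647
}

/// Hypothetical helper: inserts every item, stopping at the first error.
pub fn insert_all(
  filter: BloomFilter(a),
  items: List(a),
) -> Result(BloomFilter(a), BloomFilterError) {
  list.try_fold(items, filter, bloom_filter.try_insert)
}

pub fn bulk_demo() -> Bool {
  let words = ["apple", "banana", "cherry"]
  let assert Ok(hashes) = bloom_filter.new_hash_fn_pair(hash_a, hash_b)
  let assert Ok(filter) =
    bloom_filter.new(capacity: 100, target_error_rate: 0.01, with_hashes: hashes)
  let assert Ok(filter) = insert_all(filter, words)
  // Every inserted word must be reported as possibly present.
  list.all(words, fn(w) { bloom_filter.might_contain(filter, w) })
}
```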