cuckoo_filter (cuckoo_filter v0.3.1)

High-performance, concurrent, and mutable Cuckoo Filter implemented using atomics for Erlang and Elixir.

Summary

Functions

add/3: Adds an element to a filter.
add_hash/3: Adds an element to a filter by its hash.
capacity/1: Returns the maximum capacity of a filter.
contains/2: Checks if an element is in a filter.
contains_fingerprint/3: Checks whether a filter contains a fingerprint at the given index or its alternative index.
contains_hash/2: Checks if an element is in a filter by its hash.
delete/3: Deletes an element from a filter.
delete_hash/3: Deletes an element from a filter by its hash.
export/1: Exports a filter as a binary.
hash/2: Returns the hash value of an element using the hash function of the filter.
import/2: Imports filter data from a binary created using export/1.
new/2: Creates a new cuckoo filter with the given capacity and options.
size/1: Returns the number of items in a filter.
whereis/1: Retrieves a cuckoo_filter from persistent_term by its name.

Types

Specs

cuckoo_filter() :: #cuckoo_filter{}.

Specs

filter_name() :: term().

Specs

fingerprint() :: pos_integer().

Specs

hash() :: non_neg_integer().

Specs

index() :: non_neg_integer().

Specs

option() ::
    {name, filter_name()} |
    {fingerprint_size, 4 | 8 | 16 | 32 | 64} |
    {bucket_size, pos_integer()} |
    {max_evictions, non_neg_integer()} |
    {hash_function, fun((binary()) -> hash())}.

Specs

options() :: [option()].

Functions

Specs

add(cuckoo_filter() | filter_name(), term()) -> ok | {error, not_enough_space}.

Equivalent to add(Filter, Element, infinity).

add(Filter, Element, LockTimeout)

Specs

add(cuckoo_filter() | filter_name(), term(), timeout()) ->
       ok | {error, not_enough_space | timeout};
   (cuckoo_filter() | filter_name(), term(), force) ->
       ok | {ok, Evicted :: {index(), fingerprint()}}.

Adds an element to a filter.

Returns ok if the insertion was successful, or {error, not_enough_space} when the filter is nearing its capacity.

It returns {error, timeout} if it cannot acquire the lock within LockTimeout milliseconds.

If force is given as the third argument and there is no room for the element, a randomly chosen existing element is removed to make room, and the removed entry is returned as {ok, {Index, Fingerprint}}. In this case, elements are not relocated, and no lock is acquired.

Forced insertion can only be used with max_evictions set to 0.
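A minimal sketch of both calling modes of add/3, assuming the cuckoo_filter library and its default xxh3 dependency are available. The module name cf_add_example, the 50 ms timeout, and the capacity of 8 are illustrative choices, not values taken from the library.

```erlang
-module(cf_add_example).
-export([add_with_timeout/2, fill_with_force/0]).

%% Waits at most 50 ms for the filter lock (an illustrative value) and
%% maps each documented return value of add/3 to an atom.
add_with_timeout(Filter, Element) ->
    case cuckoo_filter:add(Filter, Element, 50) of
        ok -> inserted;
        {error, timeout} -> busy;
        {error, not_enough_space} -> full
    end.

%% Forced insertion requires max_evictions to be 0, so the tiny filter
%% below sets it explicitly. Integers are added until a forced add has
%% to evict an existing entry, which is then returned.
fill_with_force() ->
    Filter = cuckoo_filter:new(8, [{max_evictions, 0}]),
    fill_with_force(Filter, 0).

fill_with_force(Filter, N) ->
    case cuckoo_filter:add(Filter, N, force) of
        ok ->
            fill_with_force(Filter, N + 1);
        {ok, {Index, Fingerprint}} ->
            %% N is now stored; {Index, Fingerprint} identifies the dropped entry.
            {evicted, N, Index, Fingerprint}
    end.
```

The {Index, Fingerprint} pair returned by a forced insertion has the same shape as the arguments accepted by contains_fingerprint/3.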

Specs

add_hash(cuckoo_filter() | filter_name(), hash()) -> ok | {error, not_enough_space}.

Equivalent to add_hash(Filter, Hash, infinity).

add_hash(Filter, Hash, LockTimeout)

Specs

add_hash(cuckoo_filter() | filter_name(), hash(), timeout()) ->
            ok | {error, not_enough_space | timeout};
        (cuckoo_filter() | filter_name(), hash(), force) ->
            ok | {ok, Evicted :: {index(), fingerprint()}}.

Adds an element to a filter by its hash.

Same as add/3 except that it accepts the hash of the element instead of the element.
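The hash-based variants let a caller hash an element once with hash/2 and reuse the result. A minimal sketch, assuming the default hash function; the module name and the element <<"alice">> are illustrative.

```erlang
-module(cf_hash_example).
-export([run/0]).

%% Hash once with the filter's own hash function, then pass the hash to
%% the *_hash variants instead of rehashing the element on every call.
run() ->
    Filter = cuckoo_filter:new(1000),
    Hash = cuckoo_filter:hash(Filter, <<"alice">>),
    ok = cuckoo_filter:add_hash(Filter, Hash),
    true = cuckoo_filter:contains_hash(Filter, Hash),
    %% The element-based lookup uses the same hash function, so it agrees.
    true = cuckoo_filter:contains(Filter, <<"alice">>),
    ok.
```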

Specs

capacity(cuckoo_filter() | filter_name()) -> pos_integer().
Returns the maximum capacity of a filter.

contains(Filter, Element)

Specs

contains(cuckoo_filter() | filter_name(), term()) -> boolean().
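Checks if an element is in a filter. As with any cuckoo filter, a true result may be a false positive: an element that was never added can share a fingerprint and bucket with one that was.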

contains_fingerprint(Filter, Index, Fingerprint)

Specs

contains_fingerprint(cuckoo_filter() | filter_name(), index(), fingerprint()) -> boolean().
Checks whether a filter contains a fingerprint at the given index or its alternative index.
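The "alternative index" is the second candidate bucket of an element. In the standard partial-key cuckoo hashing scheme, it is derived from the current index and the fingerprint alone, so it can be computed without the original element. The sketch below shows that standard scheme; whether cuckoo_filter uses exactly this formula is an assumption, and erlang:phash2/1 is a stand-in for whatever fingerprint hash the library applies.

```erlang
-module(cf_alt_index_sketch).
-export([alt_index/3]).

%% Standard partial-key cuckoo hashing (assumed, not confirmed for this library):
%% AltIndex = (Index XOR hash(Fingerprint)) mod NumBuckets.
%% NumBuckets must be a power of 2, so the modulo becomes a bit mask, and
%% applying the function twice returns the original index.
alt_index(Index, Fingerprint, NumBuckets) ->
    Mask = NumBuckets - 1,
    (Index bxor erlang:phash2(Fingerprint)) band Mask.
```

Because the mapping is its own inverse, an index and a fingerprint are enough to locate both candidate buckets of an entry.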

contains_hash(Filter, Hash)

Specs

contains_hash(cuckoo_filter() | filter_name(), hash()) -> boolean().
Checks if an element is in a filter by its hash.

Specs

delete(cuckoo_filter() | filter_name(), term()) -> ok | {error, not_found}.

Equivalent to delete(Filter, Element, infinity).

delete(Filter, Element, LockTimeout)

Specs

delete(cuckoo_filter() | filter_name(), term(), timeout()) -> ok | {error, not_found | timeout}.

Deletes an element from a filter.

Returns ok if the deletion was successful, and returns {error, not_found} if the element could not be found in the filter.

It returns {error, timeout} if it cannot acquire the lock within LockTimeout milliseconds.

Note: A cuckoo filter can only safely delete items that are known to have been inserted before. Deleting an item that was never inserted might remove a different element that has the same fingerprint in one of the same buckets.
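A minimal sketch of the safe pattern: only delete elements this code itself added. The module name and the element <<"session-1">> are illustrative.

```erlang
-module(cf_delete_example).
-export([run/0]).

run() ->
    Filter = cuckoo_filter:new(1000),
    ok = cuckoo_filter:add(Filter, <<"session-1">>),
    %% Safe: <<"session-1">> is known to have been inserted above.
    ok = cuckoo_filter:delete(Filter, <<"session-1">>),
    %% Harmless only because the filter is now empty. On a populated filter,
    %% deleting an absent element could remove an entry with the same fingerprint.
    {error, not_found} = cuckoo_filter:delete(Filter, <<"session-1">>),
    ok.
```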

delete_hash(Filter, Hash)

Specs

delete_hash(cuckoo_filter() | filter_name(), hash()) -> ok | {error, not_found}.

Equivalent to delete_hash(Filter, Hash, infinity).

delete_hash(Filter, Hash, LockTimeout)

Specs

delete_hash(cuckoo_filter() | filter_name(), hash(), timeout()) ->
               ok | {error, not_found | timeout}.

Deletes an element from a filter by its hash.

Same as delete/3 except that it uses the hash of the element instead of the element.

Specs

export(cuckoo_filter() | filter_name()) -> binary().

Exports a filter as a binary.

The returned binary can be used to reconstruct the filter with import/2.

hash(Cuckoo_filter, Element)

Specs

hash(cuckoo_filter() | filter_name(), term()) -> hash().
Returns the hash value of an element using the hash function of the filter.

Specs

import(cuckoo_filter() | filter_name(), binary()) -> ok | {error, invalid_data_size}.

Imports filter data from a binary created using export/1.

Returns ok if the import was successful, or {error, invalid_data_size} if the size of the given binary does not match the size of the filter.
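A sketch of copying a filter with export/1 and import/2, assuming both filters are created with the same capacity and options (including the hash function, so that the copy hashes elements the same way). The capacities 1000 and 8 are illustrative; the second import is expected to fail only because the two filters have different sizes.

```erlang
-module(cf_export_example).
-export([run/0]).

run() ->
    Source = cuckoo_filter:new(1000),
    ok = cuckoo_filter:add(Source, <<"alice">>),
    Data = cuckoo_filter:export(Source),
    %% Same capacity and default options, so the binary size matches.
    Target = cuckoo_filter:new(1000),
    ok = cuckoo_filter:import(Target, Data),
    true = cuckoo_filter:contains(Target, <<"alice">>),
    %% A much smaller filter has a different size, so the import is rejected.
    Small = cuckoo_filter:new(8),
    {error, invalid_data_size} = cuckoo_filter:import(Small, Data),
    ok.
```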

Specs

new(pos_integer()) -> cuckoo_filter().

Equivalent to new(Capacity, []).

Specs

new(pos_integer(), options()) -> cuckoo_filter().

Creates a new cuckoo filter with the given capacity and options.

Note that the actual capacity might be higher than the requested capacity, because the number of buckets in a cuckoo filter must internally be a power of 2.
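To make the rounding concrete, the sketch below computes an actual capacity under an assumed sizing rule: the requested capacity is divided by the bucket size (rounding up), the bucket count is rounded up to the next power of 2, and the result is multiplied back by the bucket size. This rule is a plausible reading of the note above, not confirmed from the library source; capacity/1 on a real filter is the authoritative answer.

```erlang
-module(cf_capacity_sketch).
-export([actual_capacity/2]).

%% Assumed sizing rule (hypothetical, not taken from the library source):
%% buckets = next power of 2 >= ceil(Capacity / BucketSize),
%% actual capacity = buckets * BucketSize.
actual_capacity(Capacity, BucketSize) when Capacity > 0, BucketSize > 0 ->
    MinBuckets = (Capacity + BucketSize - 1) div BucketSize,
    next_pow2(MinBuckets, 1) * BucketSize.

%% Smallest power of 2 that is >= N, starting the search from P.
next_pow2(N, P) when P >= N -> P;
next_pow2(N, P) -> next_pow2(N, P * 2).
```

Under this rule, a requested capacity of 1000 with the default bucket size of 4 needs 250 buckets, which rounds up to 256 buckets, or 1024 slots.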

Possible options are:
  • {name, Name}

    If a name is given, the created filter is stored in persistent_term and can later be accessed by its name.

  • {fingerprint_size, FingerprintSize}

    FingerprintSize can be 4, 8, 16, 32, or 64 bits. The default is 16 bits.

  • {bucket_size, BucketSize}

    BucketSize must be a positive integer; the default is 4. Larger bucket sizes can considerably reduce insertion time, since they reduce the number of relocations of existing fingerprints in occupied buckets, but they increase lookup time and the false positive rate.

  • {max_evictions, MaxEvictions}

    MaxEvictions is the maximum number of relocation attempts before giving up when inserting a new element.

  • {hash_function, HashFunction}

    You can specify a custom hash function that accepts a binary and returns its hash as a non-negative integer. By default, xxh3 is used, and you need to add xxh3 to your project dependencies manually.
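A sketch of new/2 with the tunable options spelled out. The name users and the option values are illustrative, not recommendations; since no hash_function is given, xxh3 must be a project dependency.

```erlang
-module(cf_new_example).
-export([run/0]).

run() ->
    Filter = cuckoo_filter:new(10000, [
        {name, users},           % also stored in persistent_term under this name
        {fingerprint_size, 32},  % larger than the 16-bit default: fewer false positives, more memory
        {bucket_size, 4},        % the default bucket size
        {max_evictions, 100}     % relocation attempts before an insertion gives up
    ]),
    %% The actual capacity is never below the requested one.
    true = cuckoo_filter:capacity(Filter) >= 10000,
    Filter.
```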

Specs

size(cuckoo_filter() | filter_name()) -> non_neg_integer().
Returns the number of items in a filter.

Specs

whereis(filter_name()) -> cuckoo_filter().
Retrieves a cuckoo_filter from persistent_term by its name.
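Named filters can be passed by name to any function that accepts cuckoo_filter() | filter_name(), or fetched once with whereis/1. A minimal sketch; the name sessions and the element are illustrative.

```erlang
-module(cf_named_example).
-export([run/0]).

run() ->
    %% Creating the filter with {name, sessions} stores it in persistent_term.
    _ = cuckoo_filter:new(1000, [{name, sessions}]),
    %% Functions accept the name in place of the filter record.
    ok = cuckoo_filter:add(sessions, <<"session-1">>),
    %% whereis/1 returns the filter record itself.
    Filter = cuckoo_filter:whereis(sessions),
    true = cuckoo_filter:contains(Filter, <<"session-1">>),
    ok.
```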