ExDataSketch.Hash (ExDataSketch v0.8.0)

Stable 64-bit hash interface for ExDataSketch.

All sketch algorithms require a deterministic hash function that maps arbitrary Elixir terms to 64-bit unsigned integers. This module provides that interface with automatic backend selection and a pure-Elixir fallback.

Hash Properties

  • Output range: 0..2^64-1 (unsigned 64-bit integer).
  • Deterministic: the same input always produces the same output within the same runtime configuration (same hash backend and OTP release).
  • Uniform distribution: output bits are well-distributed for sketch accuracy.

Auto-detection

When no custom :hash_fn is provided, hash64/2 automatically selects the best available hash implementation:

  • XXHash3 (NIF): When the Rust NIF is loaded, hash64/2 uses XXHash3, which produces native 64-bit hashes with zero Elixir-side overhead. XXHash3 output is stable across platforms.

  • phash2 + mix64 (pure): When the NIF is not available, hash64/2 falls back to :erlang.phash2/2 with a fixnum-safe 64-bit mixer. The mixer uses 16-bit partial products to avoid bigint heap allocations while preserving full 64-bit output quality.
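The general idea of widening :erlang.phash2/2 output to 64 bits can be sketched as follows. This is an illustration only, not the library's mixer: the module name Mix64Sketch, the two-call scheme, the :lo salt, and the way the seed is folded in are all assumptions, and the mixer shown is the well-known splitmix64 finalizer. Unlike the fixnum-safe mixer described above, its multiplications create bigint intermediates before masking.

```elixir
defmodule Mix64Sketch do
  # Hypothetical illustration; NOT ExDataSketch's actual mixer.
  import Bitwise

  @mask64 0xFFFFFFFFFFFFFFFF
  # Largest range accepted by :erlang.phash2/2 (2^32).
  @range32 4_294_967_296

  # splitmix64 finalizer. Each step is invertible modulo 2^64,
  # so distinct 64-bit inputs map to distinct outputs.
  def mix(z) do
    z = band(bxor(z, z >>> 30) * 0xBF58476D1CE4E5B9, @mask64)
    z = band(bxor(z, z >>> 27) * 0x94D049BB133111EB, @mask64)
    bxor(z, z >>> 31)
  end

  # Build a 64-bit base value from two 32-bit phash2 calls
  # (assumption: the second call salts the term with :lo),
  # fold in the seed with xor, then mix.
  def hash64(term, seed \\ 0) do
    hi = :erlang.phash2(term, @range32)
    lo = :erlang.phash2({term, :lo}, @range32)
    base = (hi <<< 32) ||| lo
    mix(bxor(base, band(seed, @mask64)))
  end
end
```

Because the sketch still relies on phash2, its output carries the same OTP-version caveat described under Stability.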

The NIF availability check is performed once and cached in :persistent_term, so subsequent lookups are constant-time reads.
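A minimal sketch of this caching pattern, assuming a hypothetical NIF module named HypotheticalNif and a hypothetical probe; the library's real check may differ:

```elixir
defmodule NifProbeSketch do
  # Hypothetical illustration of "check once, cache in :persistent_term".
  @key {__MODULE__, :nif_available}

  def available? do
    case :persistent_term.get(@key, :unset) do
      :unset ->
        # First call: run the probe and store the result.
        result = probe()
        :persistent_term.put(@key, result)
        result

      cached ->
        # Later calls: a cheap read, no probing.
        cached
    end
  end

  # Assumption: the NIF counts as available when its module loads and
  # exports the expected function. HypotheticalNif does not exist,
  # so this sketch always reports false.
  defp probe do
    Code.ensure_loaded?(HypotheticalNif) and
      function_exported?(HypotheticalNif, :xxhash3_64, 1)
  end
end
```

Writes to :persistent_term are comparatively expensive, which is why the pattern suits a value that is written once and read many times.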

Pluggable Hash

Pass hash_fn: fn term -> non_neg_integer end to override the default. The custom function must return values in 0..2^64-1.
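As an illustration, a custom function that satisfies this contract could take the first 8 bytes of an MD5 digest. The module name CustomHashSketch is hypothetical, and MD5 is used here only for bit dispersion, not for security:

```elixir
defmodule CustomHashSketch do
  # Hypothetical custom hash: first 64 bits of MD5 over the term's
  # external binary encoding. The result is always in 0..2^64-1.
  def hash(term) do
    <<h::unsigned-integer-size(64), _rest::binary>> =
      :erlang.md5(:erlang.term_to_binary(term))

    h
  end
end
```

With the library loaded, this would be passed as hash_fn: &CustomHashSketch.hash/1. The output is only as stable as the :erlang.term_to_binary/1 encoding of the terms being hashed.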

Stability

:erlang.phash2/2 output is not guaranteed stable across OTP major versions. XXHash3 output is stable across platforms. For cross-version stability, use the NIF build (XXHash3) or supply a custom :hash_fn.

Summary

Types

algorithm_info()

Static description of a hash algorithm. Returned by algorithm_info/1.

Functions

algorithm_info(other)

Returns the static descriptor for a hash algorithm.

default_algorithm()

Returns the default hash algorithm for new sketches.

default_hash_strategy()

Returns the default hash strategy based on NIF availability.

hash64(term, opts \\ [])

Hashes an arbitrary Elixir term to a 64-bit unsigned integer.

hash64_binary(binary, opts \\ [])

Hashes a raw binary to a 64-bit unsigned integer.

nif_available?()

Returns whether the NIF is available for hashing.

resolve_strategy(opts)

Resolves the effective hash strategy for a sketch given user options.

supported_algorithms()

Returns the list of hash algorithm identifiers supported by this build.

validate_merge_hash_compat!(opts_a, opts_b, sketch_type)

Validates that two sets of sketch options have compatible hashing configuration.

xxhash3_64(data)

Hashes a binary using XXHash3 (64-bit) via Rust NIF.

xxhash3_64(data, seed)

Hashes a binary using XXHash3 (64-bit) with a seed via Rust NIF.

Types

algorithm_info()

@type algorithm_info() :: %{
  id: hash_strategy(),
  name: String.t(),
  output_bits: 64,
  has_seed: boolean(),
  available?: boolean(),
  stability: :stable | :otp_dependent | :runtime_dependent
}

Static description of a hash algorithm. Returned by algorithm_info/1.

hash64()

@type hash64() :: non_neg_integer()

hash_opt()

@type hash_opt() ::
  {:seed, non_neg_integer()}
  | {:hash_fn, (term() -> hash64())}
  | {:hash_strategy, hash_strategy()}

hash_strategy()

@type hash_strategy() :: :phash2 | :xxhash3 | :murmur3 | :custom

opts()

@type opts() :: [hash_opt()]

Functions

algorithm_info(other)

@spec algorithm_info(hash_strategy()) :: algorithm_info()

Returns the static descriptor for a hash algorithm.

See the algorithm_info/0 type for the returned map shape.

Examples

iex> info = ExDataSketch.Hash.algorithm_info(:xxhash3)
iex> info.id
:xxhash3
iex> info.output_bits
64

iex> info = ExDataSketch.Hash.algorithm_info(:murmur3)
iex> info.has_seed
true
iex> info.stability
:stable

iex> info = ExDataSketch.Hash.algorithm_info(:phash2)
iex> info.stability
:otp_dependent

default_algorithm()

@spec default_algorithm() :: :xxhash3 | :phash2

Returns the default hash algorithm for new sketches.

This is the v0.8.0 successor to default_hash_strategy/0 and uses the same selection logic. Prefer this name in new code; the old name is retained for backward compatibility.

Examples

iex> ExDataSketch.Hash.default_algorithm() in [:xxhash3, :phash2]
true

default_hash_strategy()

@spec default_hash_strategy() :: :xxhash3 | :phash2

Returns the default hash strategy based on NIF availability.

Returns :xxhash3 when the NIF is loaded, :phash2 otherwise.

hash64(term, opts \\ [])

@spec hash64(term(), opts()) :: hash64()

Hashes an arbitrary Elixir term to a 64-bit unsigned integer.

When no :hash_fn is provided, automatically uses XXHash3 via NIF if available, otherwise falls back to phash2 with fixnum-safe bit mixing.

Options

  • :seed - seed value for the hash (default: 0). Combined with the base hash.
  • :hash_fn - custom hash function (term -> 0..2^64-1). When provided, :seed is ignored and the function is called directly.

Examples

iex> h = ExDataSketch.Hash.hash64("hello")
iex> is_integer(h) and h >= 0
true

iex> ExDataSketch.Hash.hash64("hello") == ExDataSketch.Hash.hash64("hello")
true

iex> ExDataSketch.Hash.hash64("hello") != ExDataSketch.Hash.hash64("world")
true

iex> ExDataSketch.Hash.hash64("test", seed: 42) != ExDataSketch.Hash.hash64("test", seed: 0)
true

hash64_binary(binary, opts \\ [])

@spec hash64_binary(binary(), opts()) :: hash64()

Hashes a raw binary to a 64-bit unsigned integer.

Operates directly on binary bytes without term encoding overhead. Useful when the input is already binary data (e.g., from external sources).

When no :hash_fn is provided, automatically uses XXHash3 via NIF if available, otherwise falls back to phash2 with fixnum-safe bit mixing.

Options

Same as hash64/2.

Examples

iex> h = ExDataSketch.Hash.hash64_binary(<<1, 2, 3>>)
iex> is_integer(h) and h >= 0
true

iex> ExDataSketch.Hash.hash64_binary(<<"abc">>) == ExDataSketch.Hash.hash64_binary(<<"abc">>)
true

nif_available?()

@spec nif_available?() :: boolean()

Returns whether the NIF is available for hashing.

The result is computed once and cached in :persistent_term.

resolve_strategy(opts)

@spec resolve_strategy(keyword()) :: hash_strategy()

Resolves the effective hash strategy for a sketch given user options.

Resolution precedence:

  1. If :hash_fn is set → :custom (closure-based, never merge-compatible).
  2. If the caller passed :hash_strategy, that value is honored. Unknown values are rejected with ArgumentError.
  3. Otherwise default_algorithm/0 is used.

This is the single source of truth for sketch constructors. It lets callers select :murmur3 (Apache DataSketches interop) or :phash2 (BEAM-only fallback) at sketch creation time without changing how the default is chosen.
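The precedence rules above can be sketched in a few lines. ResolveSketch is a hypothetical stand-in, not the library's code: it treats the mere presence of the :hash_fn key as "set", takes the default as an argument instead of calling default_algorithm/0, and assumes only the three built-in identifiers are valid values for :hash_strategy.

```elixir
defmodule ResolveSketch do
  # Hypothetical illustration of the documented precedence.
  @known [:phash2, :xxhash3, :murmur3]

  def resolve(opts, default \\ :phash2) do
    cond do
      # 1. A custom closure always wins.
      Keyword.has_key?(opts, :hash_fn) ->
        :custom

      # 2. An explicit strategy is honored if it is known.
      Keyword.has_key?(opts, :hash_strategy) ->
        strategy = Keyword.fetch!(opts, :hash_strategy)

        if strategy in @known do
          strategy
        else
          raise ArgumentError, "unknown hash strategy: #{inspect(strategy)}"
        end

      # 3. Otherwise fall back to the default.
      true ->
        default
    end
  end
end
```

Note that in this sketch a :hash_fn takes priority even when :hash_strategy is also given, which matches rule 1 being listed first.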

Examples

iex> ExDataSketch.Hash.resolve_strategy([])
ExDataSketch.Hash.default_algorithm()

iex> ExDataSketch.Hash.resolve_strategy(hash_strategy: :murmur3)
:murmur3

iex> ExDataSketch.Hash.resolve_strategy(hash_fn: fn _ -> 0 end)
:custom

iex> ExDataSketch.Hash.resolve_strategy(hash_strategy: :phash2)
:phash2

iex> try do
...>   ExDataSketch.Hash.resolve_strategy(hash_strategy: :sha256)
...> rescue
...>   ArgumentError -> :raised
...> end
:raised

supported_algorithms()

@spec supported_algorithms() :: [hash_strategy()]

Returns the list of hash algorithm identifiers supported by this build.

:custom is included to indicate that user-supplied :hash_fn closures are an accepted hash strategy. It is never returned by default_algorithm/0, and sketches that use it are never merge-compatible with other sketches.

Examples

iex> algos = ExDataSketch.Hash.supported_algorithms()
iex> Enum.all?([:phash2, :xxhash3, :murmur3, :custom], &(&1 in algos))
true

validate_merge_hash_compat!(opts_a, opts_b, sketch_type)

@spec validate_merge_hash_compat!(Keyword.t(), Keyword.t(), String.t()) :: :ok

Validates that two sets of sketch options have compatible hashing configuration.

Raises ExDataSketch.Errors.IncompatibleSketchesError if:

  • Either sketch uses a custom :hash_fn (closures cannot be compared)
  • Hash strategies differ (e.g. :xxhash3 vs :phash2)
  • Seeds differ (default is 0)

This is a backward-compatible shim over ExDataSketch.Hash.Validation.validate_options!/3. Prefer the new module in new code; this function remains stable for all v0.x sketches.
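The three checks can be sketched as follows. MergeCompatSketch is hypothetical: it returns tagged tuples instead of raising ExDataSketch.Errors.IncompatibleSketchesError, omits the sketch_type argument, and takes the default strategy as a parameter.

```elixir
defmodule MergeCompatSketch do
  # Hypothetical illustration of the documented compatibility rules.
  def check(opts_a, opts_b, default \\ :phash2) do
    cond do
      # Closures cannot be compared, so any custom :hash_fn is rejected.
      Keyword.has_key?(opts_a, :hash_fn) or Keyword.has_key?(opts_b, :hash_fn) ->
        {:error, :custom_hash_fn}

      # Both sketches must use the same strategy.
      Keyword.get(opts_a, :hash_strategy, default) !=
          Keyword.get(opts_b, :hash_strategy, default) ->
        {:error, :strategy_mismatch}

      # Both sketches must use the same seed (default 0).
      Keyword.get(opts_a, :seed, 0) != Keyword.get(opts_b, :seed, 0) ->
        {:error, :seed_mismatch}

      true ->
        :ok
    end
  end
end
```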

xxhash3_64(data)

@spec xxhash3_64(binary()) :: hash64()

Hashes a binary using XXHash3 (64-bit) via Rust NIF.

Returns a deterministic 64-bit hash that is stable across platforms and versions when the Rust NIF is available. Falls back to the phash2-based hash if the NIF is not loaded; the fallback is NOT stable across OTP major versions (see module docs).

This function operates on raw binary data. For Elixir terms, convert to binary first (e.g., using :erlang.term_to_binary/1 or to_string/1).
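The choice of conversion matters because the two approaches yield different bytes, and therefore different hashes, for the same term. A small sketch (the EncodingSketch module is hypothetical):

```elixir
defmodule EncodingSketch do
  # Textual form: the integer 42 becomes the two bytes "42".
  def as_text(term), do: to_string(term)

  # External term format: a version byte (131), a type tag, then the value.
  def as_etf(term), do: :erlang.term_to_binary(term)
end

# EncodingSketch.as_text(42) returns "42"
# EncodingSketch.as_etf(42)  returns <<131, 97, 42>>
```

Whichever encoding is chosen, it needs to be applied consistently to every item inserted into, or queried against, the same sketch.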

Examples

iex> h = ExDataSketch.Hash.xxhash3_64("hello")
iex> is_integer(h) and h >= 0
true

iex> ExDataSketch.Hash.xxhash3_64("hello") == ExDataSketch.Hash.xxhash3_64("hello")
true

xxhash3_64(data, seed)

@spec xxhash3_64(binary(), non_neg_integer()) :: hash64()

Hashes a binary using XXHash3 (64-bit) with a seed via Rust NIF.

Falls back to the phash2-based hash if the NIF is not available; as with xxhash3_64/1, the fallback is not stable across OTP major versions.

Examples

iex> h = ExDataSketch.Hash.xxhash3_64("hello", 42)
iex> is_integer(h) and h >= 0
true

iex> ExDataSketch.Hash.xxhash3_64("hello", 0) != ExDataSketch.Hash.xxhash3_64("hello", 42)
true