ExDataSketch.Hash.Metadata (ExDataSketch v0.9.0)


Shared hash + sketch metadata block.

Every sketch persisted by ExDataSketch v0.8.0 and later embeds this metadata block. The block records exactly which hash algorithm and seed produced the sketch, which sketch family it belongs to, and which serialization versions are in play. This is the foundation of:

  • merge safety (rejecting merges between sketches built with different hashes or seeds);
  • corruption detection (Phase 2 wraps this block in a CRC-checked frame);
  • cross-platform reproducibility (sketches built on one host can be merged on another only when their metadata blocks are compatible);
  • future interoperability work (Apache DataSketches, etc.).

This module is the Phase 1 building block consumed by ExDataSketch.Binary.Header in Phase 2. Phase 1 does not yet rewrite the EXSK codec; Phase 2 will do that and stamp this block into the header.
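The merge-safety goal listed above can be pictured with a small standalone check. This is only a sketch: `MergeGuardSketch` and `check/2` are hypothetical names, not part of ExDataSketch. It also assumes that the sketch family and family version must match in addition to the algorithm and seed, and that the backend may differ. The source only states that differing hashes or seeds are rejected.

```elixir
defmodule MergeGuardSketch do
  # Hypothetical helper, not the library's API. The key list is an
  # assumption: the source only names hash algorithm and seed explicitly.
  @merge_keys [:algorithm, :seed, :sketch_family, :sketch_family_version]

  # Returns :ok when every merge-relevant field matches, otherwise the
  # first field (in @merge_keys order) that differs.
  def check(a, b) do
    case Enum.find(@merge_keys, fn key -> Map.fetch!(a, key) != Map.fetch!(b, key) end) do
      nil -> :ok
      key -> {:error, {:mismatch, key}}
    end
  end
end

# Plain maps stand in for metadata structs here.
left = %{algorithm: :xxhash3, seed: 0, sketch_family: 1, sketch_family_version: 1, backend: :pure}
right = %{left | seed: 42, backend: :rust}

IO.inspect(MergeGuardSketch.check(left, %{left | backend: :rust}))
# prints :ok (only the backend differs, which this sketch ignores)
IO.inspect(MergeGuardSketch.check(left, right))
# prints {:error, {:mismatch, :seed}}
```

Because `Map.fetch!/2` also works on structs, the same check would accept metadata structs that expose these fields.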

Binary Layout (metadata-block v1)

All multi-byte integers are little-endian.

Offset  Size    Field
------  ------  -----
0       1       block_version (u8 = 1)
1       1       hash_algorithm (u8: 0=phash2, 1=xxhash3, 2=murmur3, 255=custom)
2       8       hash_seed (u64)
10      1       sketch_family (u8, matches EXSK sketch_id)
11      1       sketch_family_version (u8)
12      1       backend_type (u8: 0=unspecified, 1=pure, 2=rust)
13      1       flags (u8; reserved, must be 0 in v1)
14      2       extension_size (u16; number of trailing extension bytes)
16      N       extension bytes (forward-compat; not interpreted in v1, preserved on re-encode)

Total: 16 + extension_size bytes. v1 writers MUST emit extension_size == 0. v1 readers MUST round-trip unknown extension bytes verbatim on re-encode (forward compatibility).

Block version is independent of the EXSK frame version: bumping one does not require bumping the other.
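The v1 layout maps directly onto Elixir's bitstring syntax. The following is a minimal sketch under stated assumptions, not the library's implementation: `MetadataLayoutSketch`, `pack/6`, `unpack/1`, and the error tuples are hypothetical. It works on raw wire bytes rather than atoms, and it returns the flags byte without validating it.

```elixir
defmodule MetadataLayoutSketch do
  # Illustrative only: mirrors the v1 table with raw wire bytes.

  # Packs the fixed 16-byte prefix followed by any extension bytes.
  # A v1 writer would pass no extension; the parameter exists to show how
  # extension bytes read from a newer block could be re-emitted verbatim.
  def pack(algo_byte, seed, family, family_version, backend_byte, extension \\ <<>>) do
    <<1::8, algo_byte::8, seed::little-unsigned-64, family::8, family_version::8,
      backend_byte::8, 0::8, byte_size(extension)::little-unsigned-16,
      extension::binary>>
  end

  # Splits a binary into the block's fields and the bytes that follow it.
  def unpack(
        <<1::8, algo_byte::8, seed::little-unsigned-64, family::8, family_version::8,
          backend_byte::8, flags::8, ext_size::little-unsigned-16,
          extension::binary-size(ext_size), rest::binary>>
      ) do
    fields = %{
      algorithm_byte: algo_byte,
      seed: seed,
      sketch_family: family,
      sketch_family_version: family_version,
      backend_byte: backend_byte,
      flags: flags,
      extension: extension
    }

    {:ok, fields, rest}
  end

  # Hypothetical error shapes; the real module returns exception structs.
  def unpack(<<version::8, _::binary>>) when version != 1,
    do: {:error, {:unsupported_block_version, version}}

  def unpack(_), do: {:error, :truncated}
end

bin = MetadataLayoutSketch.pack(1, 9001, 3, 2, 1)
IO.inspect(byte_size(bin))
# prints 16

{:ok, fields, rest} = MetadataLayoutSketch.unpack(bin <> <<0xAA, 0xBB>>)
IO.inspect(fields.seed)
# prints 9001
IO.inspect(rest)
# prints <<170, 187>>
```

The seed 9001 is 0x2329, so bytes 2 and 3 of `bin` are `<<0x29, 0x23>>`, which shows the little-endian ordering.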

Summary

Functions

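algorithm_from_byte(other)

Returns the atom for a hash algorithm wire-byte.

algorithm_to_byte(other)

Returns the wire-byte for a hash algorithm atom.

backend_from_byte(other)

Returns the atom for a backend wire-byte.

backend_to_byte(other)

Returns the wire-byte for a backend atom.

block_version()

Returns the current metadata block version.

decode(arg1)

Decodes a metadata binary, returning {:ok, t(), rest} on success.

encode(meta)

Encodes a metadata struct to its versioned binary representation.

new(algorithm, seed, sketch_family, sketch_family_version, backend)

Builds a metadata struct from explicit fields.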

Types

algorithm()

@type algorithm() :: :phash2 | :xxhash3 | :murmur3 | :custom

backend()

@type backend() :: :unspecified | :pure | :rust

t()

@type t() :: %ExDataSketch.Hash.Metadata{
  algorithm: algorithm(),
  backend: backend(),
  block_version: pos_integer(),
  extension: binary(),
  flags: non_neg_integer(),
  seed: non_neg_integer(),
  sketch_family: non_neg_integer(),
  sketch_family_version: non_neg_integer()
}

Functions

algorithm_from_byte(other)

@spec algorithm_from_byte(byte()) :: {:ok, algorithm()} | {:error, Exception.t()}

Returns the atom for a hash algorithm wire-byte.

Examples

iex> ExDataSketch.Hash.Metadata.algorithm_from_byte(1)
{:ok, :xxhash3}

iex> {:error, _} = ExDataSketch.Hash.Metadata.algorithm_from_byte(7)

algorithm_to_byte(other)

@spec algorithm_to_byte(algorithm()) :: 0 | 1 | 2 | 255

Returns the wire-byte for a hash algorithm atom.

Examples

iex> ExDataSketch.Hash.Metadata.algorithm_to_byte(:xxhash3)
1

backend_from_byte(other)

@spec backend_from_byte(byte()) :: {:ok, backend()} | {:error, Exception.t()}

Returns the atom for a backend wire-byte.

backend_to_byte(other)

@spec backend_to_byte(backend()) :: 0 | 1 | 2

Returns the wire-byte for a backend atom.
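The two backend functions have no examples of their own, so here is a standalone sketch of the mapping they describe, taken from the layout table (0=unspecified, 1=pure, 2=rust). `BackendByteSketch` is a hypothetical module, and `ArgumentError` is only a stand-in for whichever exception the library actually returns.

```elixir
defmodule BackendByteSketch do
  # Stand-in for the documented mapping: 0=unspecified, 1=pure, 2=rust.
  @pairs [unspecified: 0, pure: 1, rust: 2]

  # Atom to wire byte; raises KeyError on an unknown atom.
  def to_byte(backend), do: Keyword.fetch!(@pairs, backend)

  # Wire byte to atom, using the tagged-tuple shape from the @spec.
  def from_byte(byte) do
    case List.keyfind(@pairs, byte, 1) do
      {backend, ^byte} -> {:ok, backend}
      # ArgumentError is an assumption, chosen only to fit Exception.t().
      nil -> {:error, %ArgumentError{message: "unknown backend byte #{byte}"}}
    end
  end
end

IO.inspect(BackendByteSketch.to_byte(:rust))
# prints 2
IO.inspect(BackendByteSketch.from_byte(1))
# prints {:ok, :pure}
```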

block_version()

@spec block_version() :: pos_integer()

Returns the current metadata block version.

Examples

iex> ExDataSketch.Hash.Metadata.block_version()
1

decode(arg1)

@spec decode(binary()) :: {:ok, t(), binary()} | {:error, Exception.t()}

Decodes a metadata binary, returning {:ok, t(), rest} on success.

Returns {:ok, metadata, rest_binary} so the caller (e.g. the binary header parser in Phase 2) can continue consuming bytes after the metadata block.

Returns {:error, %DeserializationError{}} if the binary is malformed, references an unknown algorithm or backend, or carries an unsupported block version.

Examples

iex> meta = ExDataSketch.Hash.Metadata.new(:murmur3, 9001, 3, 2, :pure)
iex> bin = ExDataSketch.Hash.Metadata.encode(meta)
iex> {:ok, decoded, <<>>} = ExDataSketch.Hash.Metadata.decode(bin)
iex> decoded.algorithm
:murmur3
iex> decoded.seed
9001
iex> decoded.sketch_family
3

encode(meta)

@spec encode(t()) :: binary()

Encodes a metadata struct to its versioned binary representation.

Examples

iex> meta = ExDataSketch.Hash.Metadata.new(:xxhash3, 0, 1, 1, :rust)
iex> bin = ExDataSketch.Hash.Metadata.encode(meta)
iex> byte_size(bin)
16

new(algorithm, seed, sketch_family, sketch_family_version, backend)

Builds a metadata struct from explicit fields.

Examples

iex> meta = ExDataSketch.Hash.Metadata.new(:xxhash3, 0, 1, 1, :pure)
iex> meta.algorithm
:xxhash3
iex> meta.sketch_family
1