viva_glyph/codebook

Codebook - Learned vocabulary for vector quantization

A codebook is a set of prototype vectors (centroids). Each vector in latent space is mapped to its nearest centroid.

Theory

Based on Vector Quantization (VQ) from signal processing.
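In the usual VQ formulation, a codebook with K centroids c_0 ... c_{K-1} maps an input x to the index of its nearest centroid, and the distance to that centroid is the quantization error. The notation below is a sketch of that mapping. The Euclidean norm is an assumption, since this page does not say which distance the module uses.

```latex
\[
q(x) = \arg\min_{0 \le i < K} \lVert x - c_i \rVert_2,
\qquad
e(x) = \lVert x - c_{q(x)} \rVert_2,
\qquad
\hat{x} = c_{q(x)}
\]
```

Here q(x) corresponds to the index in QuantizeResult, e(x) to its error, and \hat{x} to the vector that dequantize returns for that index.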

Types

Codebook: collection of prototype vectors

pub type Codebook {
  Codebook(
    centroids: List(List(Float)),
    dimension: Int,
    size: Int,
  )
}

Constructors

  • Codebook(centroids: List(List(Float)), dimension: Int, size: Int)

    Arguments

    centroids

    Prototype vectors (centroids)

    dimension

    Dimension of each vector

    size

    Number of centroids (vocabulary size)
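The field descriptions imply an invariant: size should equal the number of centroids, and every centroid should have dimension components. The sketch below checks that invariant on a local copy of the type. The helper is_consistent is hypothetical and not part of this module; whether the library enforces the invariant itself is not stated here.

```gleam
import gleam/list

// Local copy of the documented type so the sketch compiles on its own.
pub type Codebook {
  Codebook(centroids: List(List(Float)), dimension: Int, size: Int)
}

// Hypothetical helper (not part of viva_glyph/codebook): checks that `size`
// matches the number of centroids and that every centroid has exactly
// `dimension` components.
pub fn is_consistent(codebook: Codebook) -> Bool {
  list.length(codebook.centroids) == codebook.size
  && list.all(codebook.centroids, fn(centroid) {
    list.length(centroid) == codebook.dimension
  })
}
```

For example, a codebook holding [[0.0, 0.0], [1.0, 1.0]] with dimension 2 and size 2 passes the check, while one holding [[0.0, 0.0], [1.0]] with the same fields does not.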

Result of quantization: index and reconstruction error

pub type QuantizeResult {
  QuantizeResult(index: Int, error: Float)
}

Constructors

  • QuantizeResult(index: Int, error: Float)

    Arguments

    index

    Index of nearest centroid

    error

    Distance to nearest centroid (quantization error)

Values

pub fn dequantize(codebook: Codebook, index: Int) -> List(Float)

Dequantize: get centroid vector from index

pub fn empty(dimension: Int, size: Int) -> Codebook

Create empty codebook

pub fn from_vectors(
  vectors: List(List(Float)),
) -> option.Option(Codebook)

Create codebook from list of vectors
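The Option return type suggests that some inputs cannot form a codebook. A minimal sketch of one plausible behavior follows. The function from_vectors_sketch is hypothetical, and both rejection rules (empty input, mixed vector lengths) are assumptions rather than documented behavior.

```gleam
import gleam/list
import gleam/option.{type Option, None, Some}

// Local copy of the documented type so the sketch compiles on its own.
pub type Codebook {
  Codebook(centroids: List(List(Float)), dimension: Int, size: Int)
}

// Hypothetical reimplementation. When the real from_vectors returns None
// is not documented; the two rules below are assumptions.
pub fn from_vectors_sketch(vectors: List(List(Float))) -> Option(Codebook) {
  case vectors {
    // Assumed: no vectors means no codebook.
    [] -> None
    [first, ..] -> {
      let dimension = list.length(first)
      // Assumed: vectors of mixed length are rejected.
      case list.all(vectors, fn(v) { list.length(v) == dimension }) {
        True ->
          Some(Codebook(
            centroids: vectors,
            dimension: dimension,
            size: list.length(vectors),
          ))
        False -> None
      }
    }
  }
}
```

Under these assumptions, dimension and size are derived from the input, so the caller never has to keep them in sync by hand.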

pub fn get(
  codebook: Codebook,
  index: Int,
) -> option.Option(List(Float))

Get centroid by index
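The signatures of get and dequantize differ only in the return type: get can signal a missing index with None, while dequantize always returns a vector. The sketch below shows one way the two could relate. Both functions are hypothetical, and the empty-list fallback for an out-of-range index in dequantize_sketch is an assumption; this page does not say what the real function returns in that case.

```gleam
import gleam/list
import gleam/option.{type Option, None, Some}

// Local copy of the documented type so the sketch compiles on its own.
pub type Codebook {
  Codebook(centroids: List(List(Float)), dimension: Int, size: Int)
}

// Hypothetical lookup: None for a negative or out-of-range index.
pub fn get_sketch(codebook: Codebook, index: Int) -> Option(List(Float)) {
  case index < 0 {
    True -> None
    False ->
      // Skip `index` centroids; the next one, if present, is the result.
      case list.drop(codebook.centroids, index) {
        [centroid, ..] -> Some(centroid)
        [] -> None
      }
  }
}

// Hypothetical dequantize. Assumed fallback: an empty list when the index
// does not exist.
pub fn dequantize_sketch(codebook: Codebook, index: Int) -> List(Float) {
  option.unwrap(get_sketch(codebook, index), [])
}
```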

pub fn init_deterministic(
  dimension: Int,
  size: Int,
  seed: Int,
) -> Codebook

Initialize a codebook with pseudo-random values derived from seed. Generation is deterministic, so the same arguments reproduce the same codebook.
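To illustrate seeded, reproducible initialization, the sketch below fills a codebook from a small Lehmer (Park-Miller) generator. The generator choice, the value range (-1, 1), and the name init_sketch are all assumptions; the generator that init_deterministic actually uses is not documented here. The sketch also assumes dimension and size are positive.

```gleam
import gleam/int
import gleam/list

// Local copy of the documented type so the sketch compiles on its own.
pub type Codebook {
  Codebook(centroids: List(List(Float)), dimension: Int, size: Int)
}

// Park-Miller step: next = 16807 * state mod (2^31 - 1).
// An assumed generator, chosen because the product stays small enough
// to be exact on both the Erlang and JavaScript targets.
fn next_state(state: Int) -> Int {
  { state * 16_807 } % 2_147_483_647
}

// Map a generator state in [1, 2^31 - 2] to a float in (-1, 1).
fn to_unit(state: Int) -> Float {
  int.to_float(state) /. 2_147_483_647.0 *. 2.0 -. 1.0
}

// Produce `count` values, threading the generator state through the loop.
fn generate(state: Int, count: Int, acc: List(Float)) -> List(Float) {
  case count <= 0 {
    True -> list.reverse(acc)
    False -> {
      let next = next_state(state)
      generate(next, count - 1, [to_unit(next), ..acc])
    }
  }
}

// Hypothetical initializer: same arguments always give the same codebook.
pub fn init_sketch(dimension: Int, size: Int, seed: Int) -> Codebook {
  // Keep the starting state in [1, 2^31 - 2]; a state of 0 would stay 0.
  let start = int.absolute_value(seed) % 2_147_483_646 + 1
  let values = generate(start, dimension * size, [])
  Codebook(
    centroids: list.sized_chunk(values, dimension),
    dimension: dimension,
    size: size,
  )
}
```

Because no external randomness is involved, two calls with the same dimension, size, and seed compare equal, which is the property the description asks for.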

pub fn quantize(
  codebook: Codebook,
  input: List(Float),
) -> QuantizeResult

Find the nearest centroid to the input vector. Returns its index and the quantization error.
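A nearest-centroid search can be sketched as a single fold over the centroids that keeps the best index and distance seen so far. The function quantize_sketch is hypothetical. Euclidean distance is an assumption, as is the result for a codebook with no centroids (index -1, error 0.0); neither is documented here.

```gleam
import gleam/float
import gleam/list

// Local copies of the documented types so the sketch compiles on its own.
pub type Codebook {
  Codebook(centroids: List(List(Float)), dimension: Int, size: Int)
}

pub type QuantizeResult {
  QuantizeResult(index: Int, error: Float)
}

// Sum of squared component differences. Extra components of the longer
// list are ignored because list.zip stops at the shorter one.
fn squared_distance(a: List(Float), b: List(Float)) -> Float {
  list.zip(a, b)
  |> list.fold(0.0, fn(acc, pair) {
    let #(x, y) = pair
    let diff = x -. y
    acc +. diff *. diff
  })
}

// Hypothetical quantize: assumed Euclidean distance, and an assumed
// result of index -1 with error 0.0 when there are no centroids.
pub fn quantize_sketch(
  codebook: Codebook,
  input: List(Float),
) -> QuantizeResult {
  let #(index, best_squared) =
    list.index_fold(codebook.centroids, #(-1, 0.0), fn(best, centroid, i) {
      let #(best_index, best_dist) = best
      let dist = squared_distance(centroid, input)
      // Take the first centroid unconditionally, then any closer one.
      case best_index < 0 || dist <. best_dist {
        True -> #(i, dist)
        False -> best
      }
    })
  // The squared distance is never negative, so the Error branch is only
  // a safeguard.
  let error = case float.square_root(best_squared) {
    Ok(root) -> root
    Error(_) -> 0.0
  }
  QuantizeResult(index: index, error: error)
}
```

With centroids [[0.0, 0.0], [3.0, 4.0]] and input [3.0, 3.0], the sketch returns index 1 with error 1.0. Comparing squared distances and taking the square root once at the end avoids a square root per centroid.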

pub fn update_centroid(
  codebook: Codebook,
  index: Int,
  new_centroid: List(Float),
) -> Codebook

Update a single centroid (for training)
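Since Gleam data is immutable, updating a centroid means building a new codebook with one entry replaced. The sketch below shows that, along with one way a training loop might compute the replacement. Both update_centroid_sketch and move_toward are hypothetical; leaving the codebook unchanged for an out-of-range index is an assumption, and the training rule is only an illustration, not the method this library uses.

```gleam
import gleam/list

// Local copy of the documented type so the sketch compiles on its own.
pub type Codebook {
  Codebook(centroids: List(List(Float)), dimension: Int, size: Int)
}

// Hypothetical update: returns a new codebook with the centroid at `index`
// replaced. Assumed: an index that does not exist leaves it unchanged.
pub fn update_centroid_sketch(
  codebook: Codebook,
  index: Int,
  new_centroid: List(Float),
) -> Codebook {
  let centroids =
    list.index_fold(codebook.centroids, [], fn(acc, centroid, i) {
      case i == index {
        True -> [new_centroid, ..acc]
        False -> [centroid, ..acc]
      }
    })
    |> list.reverse
  Codebook(..codebook, centroids: centroids)
}

// Hypothetical training step: move a centroid a fraction `rate` of the way
// toward a sample. Shown only to illustrate where an update could come from.
pub fn move_toward(
  centroid: List(Float),
  sample: List(Float),
  rate: Float,
) -> List(Float) {
  list.map2(centroid, sample, fn(c, s) { c +. rate *. { s -. c } })
}
```

For example, move_toward([0.0, 0.0], [2.0, 4.0], 0.5) gives [1.0, 2.0], which could then be passed as new_centroid for the index that quantize returned for the sample.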
