Vettore

Vettore is a high-performance Elixir library for fast, in-memory operations on vector (embedding) data. It leverages a Rust backend via Rustler to store and manipulate vectors efficiently in a thread-safe HashMap.

Features

  • Collections: Create named sets of embeddings with a fixed dimension and a choice of similarity metric.
  • CRUD operations: Insert, batch-insert, retrieve, and delete embeddings by ID or by vector.
  • Similarity search: Nearest-neighbor search with customizable :limit and optional metadata filtering.
  • Reranking: Maximal Marginal Relevance (MMR) reranker for diversity-aware result reordering.
  • Distance helpers: Standalone Euclidean, Cosine, Dot, and Hamming metrics, plus binary compression for ultra-fast comparisons.

Installation

Add vettore to your list of dependencies in mix.exs:

def deps do
  [
    {:vettore, "~> 0.2.0"}
  ]
end

Then fetch and compile:

mix deps.get
mix compile

Note: The first compile will build the Rust crate; ensure you have a recent Rust toolchain installed.

Quickstart

# 1. Start a new in-memory database reference
db = Vettore.new()

# 2. Create a collection named "my_collection" with 3-dimensional vectors
{:ok, _name} = Vettore.create_collection(db, "my_collection", 3, :euclidean)

# 3. Insert a single embedding
embedding = %Vettore.Embedding{
  value: "item_1",
  vector: [1.0, 2.0, 3.0],
  metadata: %{"note" => "first vector"}
}
{:ok, _value} = Vettore.insert(db, "my_collection", embedding)

# 4. Retrieve by value
{:ok, emb} = Vettore.get_by_value(db, "my_collection", "item_1")
IO.inspect(emb.vector, label: "Vector")

# 5. Similarity search (top 2 nearest neighbors)
{:ok, results} = Vettore.similarity_search(db, "my_collection", [1.5, 1.5, 1.5], limit: 2)
IO.inspect(results, label: "Top-2 Results")

# 6. Rerank with MMR for diversity (alpha = 0.7)
{:ok, reranked} = Vettore.rerank(db, "my_collection", results, limit: 2, alpha: 0.7)
IO.inspect(reranked, label: "MMR Reranked")

API Reference

Vettore.new/0

@spec new() :: reference()

Allocates and returns an in-memory database handle backed by Rust.


Vettore.create_collection/5

@spec create_collection(
        db :: reference(),
        name :: String.t(),
        dim :: pos_integer(),
        metric :: :euclidean | :cosine | :dot | :hnsw | :binary,
        opts :: [keep_embeddings: boolean()]
      ) :: {:ok, String.t()} | {:error, String.t()}
  • name: Collection identifier
  • dim: Dimensionality of vectors
  • metric: Similarity measure
  • opts:
    • :keep_embeddings (default: true) — whether to retain embeddings on deletion
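For example, a cosine collection that does not retain embeddings on deletion could be created as follows (the collection name and dimension here are illustrative):

```elixir
db = Vettore.new()

# 768-dimensional collection using cosine similarity;
# :keep_embeddings is the only documented option.
{:ok, _name} =
  Vettore.create_collection(db, "docs", 768, :cosine, keep_embeddings: false)
```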

Vettore.insert/3

@spec insert(
        db :: reference(),
        collection :: String.t(),
        embedding :: Vettore.Embedding.t()
      ) :: {:ok, String.t()} | {:error, String.t()}

Insert a single %Vettore.Embedding{} struct into the named collection.


Vettore.batch/3

@spec batch(
        db :: reference(),
        collection :: String.t(),
        embeddings :: [Vettore.Embedding.t()]
      ) :: {:ok, [String.t()]} | {:error, String.t()}

Batch-insert multiple embeddings at once; non-embedding elements are ignored.
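A minimal batch-insert sketch following the spec above (the collection name and vectors are illustrative; per the return type, the `{:ok, [String.t()]}` result is assumed to carry the inserted values):

```elixir
db = Vettore.new()
{:ok, _} = Vettore.create_collection(db, "fruits", 3, :euclidean)

embeddings = [
  %Vettore.Embedding{value: "apple",  vector: [1.0, 0.0, 0.0], metadata: %{"color" => "red"}},
  %Vettore.Embedding{value: "banana", vector: [0.0, 1.0, 0.0], metadata: %{"color" => "yellow"}}
]

# Non-embedding elements in the list are ignored.
{:ok, inserted} = Vettore.batch(db, "fruits", embeddings)
```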


Retrieval and Deletion


Vettore.similarity_search/4

@spec similarity_search(
        db :: reference(),
        collection :: String.t(),
        query :: [number()],
        opts :: [limit: pos_integer(), filter: map()]
      ) :: {:ok, [{String.t(), float()}]} | {:error, String.t()}
  • limit (default: 10)
  • filter: metadata map to pre-filter embeddings
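Combining both options — a sketch in which the filter is assumed to match embeddings whose metadata contains the given key/value pairs (exact filter semantics are not specified above):

```elixir
db = Vettore.new()
{:ok, _} = Vettore.create_collection(db, "fruits", 3, :euclidean)

{:ok, _} =
  Vettore.insert(db, "fruits", %Vettore.Embedding{
    value: "apple",
    vector: [1.0, 0.0, 0.0],
    metadata: %{"color" => "red"}
  })

# Only embeddings whose metadata matches the filter are scored.
{:ok, results} =
  Vettore.similarity_search(db, "fruits", [0.9, 0.1, 0.0],
    limit: 5,
    filter: %{"color" => "red"}
  )
```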

Vettore.rerank/4 (MMR)

@spec rerank(
        db :: reference(),
        collection :: String.t(),
        initial :: [{String.t(), number()}],
        opts :: [limit: pos_integer(), alpha: float()]
      ) :: {:ok, [{String.t(), number()}]} | {:error, String.t()}
  • alpha: 0.0..1.0 balance between relevance and diversity

Distance Helpers (Vettore.Distance)

You can call these functions without creating a DB or collection:

Vettore.Distance.euclidean([1.0,2.0], [2.0,3.0])      # => 1 / (1 + L2)
Vettore.Distance.cosine([1,0],[0,1])                  # => (dot + 1) / 2
Vettore.Distance.dot_product([1,2],[3,4])             # => raw dot
Vettore.Distance.hamming(bits1, bits2)                # => Hamming distance
bits = Vettore.Distance.compress_f32_vector([0.1,0.4])

# MMR re-ranker standalone (collection-agnostic)
initial = [{"id1", 0.9}, {"id2", 0.85}, ...]
embeds  = [{"id1", [v1...]}, {"id2", [v2...]}, ...]
Vettore.Distance.mmr_rerank(initial, embeds, "cosine", 0.5, 5)

The similarity_search function works as follows:

  1. It retrieves the target collection and verifies that the query vector’s dimension matches.
  2. Depending on the chosen distance metric, it selects an appropriate function:
    • Euclidean: Computes the standard Euclidean distance.
    • Cosine / DotProduct: Computes the dot product (with normalization applied for Cosine).
    • HNSW: Uses a graph-based approach for approximate nearest neighbor search.
    • Binary: Compresses the query vector into a binary signature and computes the Hamming distance between this signature and those of all stored embeddings.
  3. For every embedding in the collection, it calculates a “score” between the stored vector (or its compressed representation) and the query.
  4. The results are sorted:
    • For Euclidean distance, lower scores (closer to zero) are better.
    • For Cosine/DotProduct, higher scores are considered more similar.
    • For Binary, a lower Hamming distance means the vectors are more similar.
  5. Finally, the top‑k results are returned as a list of tuples (embedding_id, score).
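The scoring and sorting in steps 3-5 can be sketched in plain Elixir. This is purely illustrative — the real work happens in the Rust NIF — and the `score/3` helper below implements only two of the metrics to show the sort-direction logic:

```elixir
defmodule SearchSketch do
  # Score every stored vector against the query, then keep the
  # top-k according to the metric's sort direction.
  def top_k(stored, query, metric, k) do
    sorter =
      case metric do
        # lower is better for Euclidean and Hamming distances
        m when m in [:euclidean, :binary] -> :asc
        # higher is better for cosine / dot-product similarity
        _ -> :desc
      end

    stored
    |> Enum.map(fn {id, vec} -> {id, score(metric, vec, query)} end)
    |> Enum.sort_by(fn {_id, s} -> s end, sorter)
    |> Enum.take(k)
  end

  defp score(:dot, a, b) do
    a |> Enum.zip(b) |> Enum.map(fn {x, y} -> x * y end) |> Enum.sum()
  end

  defp score(:euclidean, a, b) do
    a
    |> Enum.zip(b)
    |> Enum.map(fn {x, y} -> (x - y) * (x - y) end)
    |> Enum.sum()
    |> :math.sqrt()
  end
end

SearchSketch.top_k([{"a", [1.0, 0.0]}, {"b", [0.0, 1.0]}], [1.0, 0.0], :euclidean, 1)
# => [{"a", 0.0}]
```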
| Technique/Algorithm | Measures | Magnitude Sensitive?¹ | Scale Invariant?¹ | Best Use Cases | Pros | Cons |
| --- | --- | --- | --- | --- | --- | --- |
| Euclidean Distance | Straight-line distance | Yes | No | Dense data where both magnitude & direction are important | Intuitive, widely used, captures magnitude differences | Sensitive to scale differences, high dimensionality issues |
| Cosine Similarity | Directional similarity (angle) | No | Yes | Text or high-dimensional data where scale invariance is desired | Insensitive to magnitude, works well with normalized vectors | Ignores magnitude differences |
| Dot Product | Combination of direction & magnitude | Yes | No | Applications where both direction & magnitude matter | Computationally efficient, captures both aspects | Sensitive to vector magnitudes |
| HNSW Indexing | Graph-based approximate nearest neighbor search | Dependent on metric | Dependent on metric | Large datasets, real-time search when approximate results are acceptable | Sublinear search time, good speed-accuracy trade-off, scalable | Approximate results, index build time and memory overhead |
| Binary (Hamming) | Fast binary signature comparison using Hamming distance | No | Yes | Applications requiring ultra-fast approximate searches on large-scale data | Extremely fast comparison via bit-level operations, low memory footprint | Loses precision due to compression, less suited when exact distances are needed |

Performance Notes

  • HNSW can speed up searches significantly for large datasets but comes with higher memory usage for the index.
  • Binary distance uses bit-level compression and Hamming distance for extremely fast approximate similarity checks (especially beneficial for large or high-dimensional vectors, though it trades off some precision).
  • Cosine normalizes vectors once on insertion, so queries and stored embeddings use a straightforward dot product.
  • Dot Product directly multiplies corresponding elements.
  • Euclidean uses a SIMD approach (wide::f32x4) for partial vectorization.
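As an illustration of the binary approach, here is a toy sign-bit compressor and Hamming distance in plain Elixir. The sign-bit scheme is an assumption for illustration — the library's actual `compress_f32_vector/1` may use a different encoding, and it packs bits rather than keeping a list:

```elixir
defmodule BinarySketch do
  # One bit per float: 1 if the component is positive, else 0.
  def compress(vector), do: Enum.map(vector, fn x -> if x > 0.0, do: 1, else: 0 end)

  # Hamming distance: number of differing bit positions.
  def hamming(bits1, bits2) do
    bits1 |> Enum.zip(bits2) |> Enum.count(fn {a, b} -> a != b end)
  end
end

a = BinarySketch.compress([0.1, -0.4, 2.0])   # [1, 0, 1]
b = BinarySketch.compress([0.3, 0.4, -1.0])   # [1, 1, 0]
BinarySketch.hamming(a, b)
# => 2
```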

Contributing

Contributions are welcome! Please open an issue or submit a PR.

  1. Fork the repo
  2. Create a feature branch
  3. Add tests in test/
  4. Submit a PR

License

Apache 2.0. See the LICENSE file for details.