Vettore (Vettore v0.1.11)
The Vettore library is designed for fast, in-memory operations on vector (embedding) data.
All vectors (embeddings) are stored in a Rust data structure (a HashMap), accessed via a shared resource (using Rustler's ResourceArc with a Mutex). Core operations include:
- Creating a collection: A named set of embeddings with a fixed dimension and a chosen similarity metric ("hnsw", "binary", "euclidean", "cosine", or "dot").
- Inserting an embedding: Add a new embedding (with ID, vector, and optional metadata) to a specific collection.
- Retrieving embeddings: Fetch all embeddings from a collection or look up a single embedding by its unique ID.
- Similarity search: Given a query vector, calculate a "score" for every embedding in the collection and return the top-k results (e.g. the smallest distances or largest similarities).
Usage Example
db = Vettore.new_db()
# create_collection/5 returns {:ok, name} on success:
{:ok, "my_collection"} = Vettore.create_collection(db, "my_collection", 3, "euclidean")
# Insert an embedding via struct (returns {:ok, id}):
embedding = %Vettore.Embedding{id: "my_id", vector: [1.0, 2.0, 3.0], metadata: %{"note" => "hello"}}
{:ok, "my_id"} = Vettore.insert_embedding(db, "my_collection", embedding)
# Retrieve it back:
{:ok, returned_emb} = Vettore.get_embedding_by_id(db, "my_collection", "my_id")
IO.inspect(returned_emb.vector, label: "Retrieved vector")
# Perform a similarity search (options are passed as a keyword list):
{:ok, top_results} = Vettore.similarity_search(db, "my_collection", [1.5, 1.5, 1.5], limit: 2)
IO.inspect(top_results, label: "Top K search results")
Summary
Functions
- create_collection/5: Creates a new collection in the database with the given name, dimension, and distance metric.
- delete_collection/2: Deletes a collection by its name.
- delete_embedding_by_id/3: Deletes a single embedding by its id.
- get_embedding_by_id/3: Looks up a single embedding by its id, within the given collection.
- get_embeddings/2: Returns all embeddings from a given collection.
- insert_embedding/3: Inserts a single embedding (as a %Vettore.Embedding{} struct) into a specified collection.
- insert_embeddings/3: Batch-inserts a list of embeddings (each a %Vettore.Embedding{}) into the specified collection.
- mmr_rerank/4: Re-rank a list of {id, score} results using Maximal Marginal Relevance (MMR).
- new_db/0: Returns a new database resource (wrapped in a Rustler ResourceArc).
- similarity_search/4: Similarity search with optional limit (defaults to 10) and optional filter map.
Functions
create_collection(db, collection_name, dimension, distance, opts \\ [])
@spec create_collection(db :: any(), collection_name :: String.t(), dimension :: integer(), distance :: String.t(), opts :: keyword()) :: {:ok, String.t()} | {:error, String.t()}
Creates a new collection in the database with the given name, dimension, and distance metric.
- db is the database resource (created with new_db/0).
- collection_name is the name of the collection.
- dimension is the number of dimensions in the vector.
- distance can be one of: "euclidean", "cosine", "dot", "hnsw", or "binary".
Returns {:ok, name} on success, or {:error, reason} if the collection already exists or if the distance is invalid.
Examples
{:ok, "my_collection"} = Vettore.create_collection(db, "my_collection", 3, "euclidean")
@spec delete_collection(db :: any(), collection_name :: String.t()) :: {:ok, String.t()} | {:error, String.t()}
Deletes a collection by its name.
- db is the database resource (created with new_db/0).
- collection_name is the name of the collection.
Returns {:ok, name} if the collection was found and deleted, or {:error, reason} otherwise.
Examples
{:ok, "my_collection"} = Vettore.delete_collection(db, "my_collection")
@spec delete_embedding_by_id(db :: any(), collection_name :: String.t(), id :: String.t()) :: {:ok, String.t()} | {:error, String.t()}
Deletes a single embedding by its id.
- db is the database resource (created with new_db/0).
- collection_name is the name of the collection.
- id is the ID of the embedding.
Returns {:ok, id} if the embedding was found and deleted, or {:error, reason} otherwise.
Examples
Vettore.delete_embedding_by_id(db, "my_collection", "my_id")
# => {:ok, "my_id"}
@spec get_embedding_by_id(db :: any(), collection_name :: String.t(), id :: String.t()) :: {:ok, Vettore.Embedding.t()} | {:error, String.t()}
Looks up a single embedding by its id, within the given collection.
If found, returns {:ok, %Vettore.Embedding{}}.
If not found, returns {:error, reason}.
Examples
Vettore.get_embedding_by_id(db, "my_collection", "my_id")
# => {:ok, %Vettore.Embedding{id: "my_id", vector: [1.0, 2.0, 3.0], metadata: %{foo: "bar"}}}
@spec get_embeddings(db :: any(), collection_name :: String.t()) :: {:ok, [{String.t(), [float()], map() | nil}]} | {:error, String.t()}
Returns all embeddings from a given collection.
Each embedding is returned as an {id, vector, metadata} tuple in a list. To convert each item into a %Vettore.Embedding{}, map over the list manually or write a small helper function.
Examples
Vettore.get_embeddings(db, "my_collection")
# => {:ok, [
# {"emb1", [1.0, 2.0, 3.0], %{"info" => "test"}},
# {"emb2", [3.14, 2.71, 1.62], nil},
# ]}
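The manual conversion mentioned above could look roughly like the following sketch. To keep it runnable without the library, it uses a hypothetical stand-in struct, EmbeddingSketch, with the same three fields that %Vettore.Embedding{} shows in the examples (id, vector, metadata); in a real project you would presumably build %Vettore.Embedding{} directly. The module and function names are assumptions made for illustration, not part of Vettore.

```elixir
defmodule EmbeddingSketch do
  # Hypothetical stand-in with the same fields as %Vettore.Embedding{}.
  defstruct [:id, :vector, :metadata]

  # Convert one {id, vector, metadata} tuple into a struct.
  def from_tuple({id, vector, metadata}) do
    %__MODULE__{id: id, vector: vector, metadata: metadata}
  end

  # Convert the whole list returned inside {:ok, list}.
  def from_tuples(tuples), do: Enum.map(tuples, &from_tuple/1)
end

# Same shape as the list shown in the get_embeddings example.
rows = [
  {"emb1", [1.0, 2.0, 3.0], %{"info" => "test"}},
  {"emb2", [3.14, 2.71, 1.62], nil}
]

embeddings = EmbeddingSketch.from_tuples(rows)
IO.inspect(embeddings, label: "As structs")
```

A nil metadata value is passed through unchanged, matching the tuple form.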
@spec insert_embedding(db :: any(), collection_name :: String.t(), embedding :: Vettore.Embedding.t()) :: {:ok, String.t()} | {:error, String.t()}
Inserts a single embedding (as a %Vettore.Embedding{} struct) into a specified collection.
If the collection doesn't exist, you'll get {:error, "Collection '...' not found"}.
If another embedding with the same :id is already in the collection, you'll get an error.
Also, if the :vector length does not match the collection's configured dimension, you'll get a dimension mismatch error.
Examples
embedding = %Vettore.Embedding{id: "my_id", vector: [1.0, 2.0, 3.0], metadata: %{foo: "bar"}}
{:ok, "my_id"} = Vettore.insert_embedding(db, "my_collection", embedding)
@spec insert_embeddings(db :: any(), collection_name :: String.t(), embeddings :: [Vettore.Embedding.t()]) :: {:ok, [String.t()]} | {:error, String.t()}
Batch-inserts a list of embeddings (each a %Vettore.Embedding{}) into the specified collection.
It returns {:ok, ["id1", "id2", ...]} if all embeddings were inserted successfully.
If any embedding fails (e.g., dimension mismatch, ID conflict, or missing collection), the function returns {:error, reason} and stops immediately (does not insert the rest).
Examples
embs = [
  %Vettore.Embedding{id: "e1", vector: [1.0, 2.0, 3.0], metadata: nil},
  %Vettore.Embedding{id: "e2", vector: [4.5, 6.7, 8.9], metadata: %{"info" => "test"}}
]
{:ok, ["e1", "e2"]} = Vettore.insert_embeddings(db, "my_collection", embs)
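The stop-at-first-failure behavior described above can be illustrated with a pure-Elixir sketch. This is not Vettore's Rust implementation: BatchInsertSketch, its plain-map "collection", and the error strings are all hypothetical. The documentation does not say what happens to embeddings inserted before the failing one, so the sketch simply discards the partial result; treat that as an assumption.

```elixir
defmodule BatchInsertSketch do
  # Hypothetical in-memory stand-in: a fixed dimension plus an id => vector map.
  def new_collection(dimension), do: %{dimension: dimension, items: %{}}

  # Reject duplicate IDs and vectors whose length differs from the dimension.
  def insert(collection, %{id: id, vector: vector}) do
    cond do
      Map.has_key?(collection.items, id) ->
        {:error, "duplicate id '#{id}'"}

      length(vector) != collection.dimension ->
        {:error, "dimension mismatch for '#{id}'"}

      true ->
        {:ok, %{collection | items: Map.put(collection.items, id, vector)}}
    end
  end

  # Walk the list in order and halt on the first failure.
  def insert_all(collection, embeddings) do
    result =
      Enum.reduce_while(embeddings, {:ok, collection, []}, fn emb, {:ok, coll, ids} ->
        case insert(coll, emb) do
          {:ok, coll} -> {:cont, {:ok, coll, [emb.id | ids]}}
          {:error, reason} -> {:halt, {:error, reason}}
        end
      end)

    case result do
      {:ok, coll, ids} -> {:ok, coll, Enum.reverse(ids)}
      {:error, reason} -> {:error, reason}
    end
  end
end

coll = BatchInsertSketch.new_collection(3)

good = [%{id: "e1", vector: [1.0, 2.0, 3.0]}, %{id: "e2", vector: [4.5, 6.7, 8.9]}]

# "e2" has the wrong length, so "e3" is never examined.
bad = [
  %{id: "e1", vector: [1.0, 2.0, 3.0]},
  %{id: "e2", vector: [1.0]},
  %{id: "e3", vector: [0.0, 0.0, 0.0]}
]

ok_result = BatchInsertSketch.insert_all(coll, good)
error_result = BatchInsertSketch.insert_all(coll, bad)
IO.inspect(error_result, label: "Stopped at first failure")
```

Enum.reduce_while/3 is used here because it makes the early exit explicit: the reducer returns {:halt, ...} as soon as one insert fails.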
@spec mmr_rerank(db :: any(), collection_name :: String.t(), initial_results :: [{String.t(), number()}], opts :: keyword()) :: {:ok, [{String.t(), number()}]} | {:error, String.t()}
Re-rank a list of {id, score} results using Maximal Marginal Relevance (MMR).
Given a database resource db, a collection name, and an initial_results list of {id, score} tuples (usually obtained from similarity_search/4), this function applies an MMR formula to select up to :limit items that maximize both relevance (the score) and diversity among the selected items.
The alpha parameter (0.0 to 1.0) balances relevance vs. redundancy:
- alpha close to 1.0 → prioritizes the raw score (similarity to query).
- alpha close to 0.0 → heavily penalizes items similar to already-selected ones, thus promoting diversity.
We automatically convert Euclidean or Binary distance into a “higher is better” similarity by negating the distance (i.e. similarity = -distance). For Cosine, Dot Product, or HNSW approaches, the score is already in a higher-is-better format.
Returns {:ok, [{id, mmr_score}, ...]} on success or {:error, reason} if the collection is not found.
Examples
After calling similarity_search/4:
{:ok, initial_results} = Vettore.similarity_search(db, "my_collection", query_vec, limit: 50)
You can re-rank:
{:ok, mmr_list} =
  Vettore.mmr_rerank(db, "my_collection", initial_results,
    limit: 10,
    alpha: 0.7
  )
mmr_list then gives a smaller set (up to 10 items) in MMR order, each with a new score (mmr_score) that reflects its final MMR weighting.
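To make the role of alpha concrete, here is a minimal pure-Elixir sketch of the textbook greedy MMR selection. It is not Vettore's actual Rust code, and the exact formula the library uses is not stated here. The sketch assumes the common form alpha * relevance - (1 - alpha) * max_similarity_to_selected, uses cosine similarity between candidates, and assumes non-zero vectors and scores that are already higher-is-better (a distance would first be negated, as described above). MmrSketch and its functions are hypothetical names.

```elixir
defmodule MmrSketch do
  # Cosine similarity between two equal-length, non-zero vectors.
  def cosine(a, b) do
    dot = Enum.zip(a, b) |> Enum.reduce(0.0, fn {x, y}, acc -> acc + x * y end)
    norm = fn v -> :math.sqrt(Enum.reduce(v, 0.0, fn x, acc -> acc + x * x end)) end
    dot / (norm.(a) * norm.(b))
  end

  # results: [{id, relevance}] with higher-is-better scores.
  # vectors: %{id => vector}, used to measure similarity between candidates.
  def rerank(results, vectors, limit, alpha) do
    select(results, vectors, limit, alpha, [])
  end

  defp select([], _vectors, _limit, _alpha, picked), do: Enum.reverse(picked)

  defp select(_cands, _vectors, limit, _alpha, picked) when length(picked) >= limit,
    do: Enum.reverse(picked)

  defp select(cands, vectors, limit, alpha, picked) do
    {best_id, _rel, best_score} =
      cands
      |> Enum.map(fn {id, rel} ->
        # Redundancy: highest similarity to anything already selected.
        redundancy =
          case picked do
            [] -> 0.0
            _ -> picked |> Enum.map(fn {pid, _} -> cosine(vectors[id], vectors[pid]) end) |> Enum.max()
          end

        {id, rel, alpha * rel - (1.0 - alpha) * redundancy}
      end)
      |> Enum.max_by(fn {_id, _rel, score} -> score end)

    rest = Enum.reject(cands, fn {id, _} -> id == best_id end)
    select(rest, vectors, limit, alpha, [{best_id, best_score} | picked])
  end
end

# "a" and "b" point in nearly the same direction; "c" is orthogonal to "a".
vectors = %{"a" => [1.0, 0.0], "b" => [0.9, 0.1], "c" => [0.0, 1.0]}
results = [{"a", 0.95}, {"b", 0.90}, {"c", 0.60}]

balanced = MmrSketch.rerank(results, vectors, 2, 0.5)
relevance_only = MmrSketch.rerank(results, vectors, 2, 1.0)
IO.inspect(balanced, label: "alpha = 0.5")
IO.inspect(relevance_only, label: "alpha = 1.0")
```

With alpha = 1.0 the redundancy term vanishes, so the two highest raw scores ("a", then "b") are kept. With alpha = 0.5 the near-duplicate "b" is penalized and the less relevant but more diverse "c" takes second place.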
@spec new_db() :: any()
Returns a new database resource (wrapped in a Rustler ResourceArc).
The database resource is a handle for the underlying Rust data structure.
Examples
db = Vettore.new_db()
is_reference(db)
# => true
@spec similarity_search(db :: any(), collection_name :: String.t(), query :: [float()], opts :: keyword()) :: {:ok, [{String.t(), float()}]} | {:error, String.t()}
Similarity search with optional limit (defaults to 10) and optional filter map.
Performs a similarity (or distance) search in the given collection using the provided query vector, returning the top-k results as a list of {embedding_id, score} tuples.
- For "euclidean", lower scores are better (distance).
- For "cosine", higher scores are better (cosine similarity, i.e. the dot product of length-normalized vectors).
- For "dot", also higher is better.
- For "binary", the score is the Hamming distance; lower is more similar.
- For "hnsw", an approximate nearest-neighbor search is used.
Examples:
Vettore.similarity_search(db, "my_collection", [1.0, 2.0, 3.0], limit: 2, filter: %{"category" => "test"})
Vettore.similarity_search(db, "my_collection", [1.0, 2.0, 3.0], limit: 2)
Vettore.similarity_search(db, "my_collection", [1.0, 2.0, 3.0])
# => {:ok, [{"emb1", 0.0}, {"emb2", 1.23}]}
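The score conventions listed above (lower is better for distances, higher is better for similarities) can be sketched as a brute-force top-k in pure Elixir. This only illustrates the ordering rules and is not how Vettore computes scores internally. ScoreSketch and its functions are hypothetical, the "binary" and "hnsw" modes are left out, and Enum.sort_by/3 with :asc/:desc assumes Elixir 1.10 or later.

```elixir
defmodule ScoreSketch do
  def dot(a, b), do: Enum.zip(a, b) |> Enum.reduce(0.0, fn {x, y}, acc -> acc + x * y end)

  def euclidean(a, b) do
    Enum.zip(a, b)
    |> Enum.reduce(0.0, fn {x, y}, acc -> acc + (x - y) * (x - y) end)
    |> :math.sqrt()
  end

  # Dot product of the two vectors after normalizing each to unit length.
  def cosine(a, b), do: dot(a, b) / (:math.sqrt(dot(a, a)) * :math.sqrt(dot(b, b)))

  # Score every {id, vector} pair against the query and keep the top k.
  # :euclidean is a distance (ascending); the others are similarities (descending).
  def top_k(items, query, metric, k) do
    scored = Enum.map(items, fn {id, vec} -> {id, score(metric, vec, query)} end)
    order = if metric == :euclidean, do: :asc, else: :desc

    scored
    |> Enum.sort_by(fn {_id, s} -> s end, order)
    |> Enum.take(k)
  end

  defp score(:euclidean, a, b), do: euclidean(a, b)
  defp score(:cosine, a, b), do: cosine(a, b)
  defp score(:dot, a, b), do: dot(a, b)
end

items = [
  {"emb1", [1.0, 2.0, 3.0]},
  {"emb2", [2.0, 2.0, 2.0]},
  {"emb3", [9.0, 9.0, 9.0]}
]

query = [1.0, 2.0, 3.0]

by_distance = ScoreSketch.top_k(items, query, :euclidean, 2)
by_dot = ScoreSketch.top_k(items, query, :dot, 2)
IO.inspect(by_distance, label: "euclidean, lowest first")
IO.inspect(by_dot, label: "dot, highest first")
```

The same data ranks differently under the two metrics: Euclidean distance favors the exact match "emb1", while the unnormalized dot product favors the long vector "emb3". This is why the direction of "better" has to be known before results are compared or re-ranked.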