cozodb (cozodb v0.2.1)
A transactional, relational-graph-vector database with time-travelling capability, well suited as long-term memory for LLMs and AI.
This module implements the Erlang (BEAM) bindings for CozoDB via a NIF built using Rustler.
Key Features
- Transactional Operations: Supports atomic transactions for batch operations
- Hybrid Data Model: Combines relational, graph, and vector paradigms
- Time Travelling: Query historical versions of your data
- Flexible Indexing: Supports
  - Covering Indices
  - Proximity Indices:
    - HNSW (Hierarchical Navigable Small World): Fast approximate nearest neighbor searches
    - MinHash-LSH: Locality sensitive hashing for similarity searches
    - Full-Text Search (FTS): Efficient text matching
Usage Overview
The API is divided into several sections:
Database Lifecycle
- Opening/Closing: Use open/0, open/1, open/2, and open/3 to create or open a database. Use close/1 to gracefully shut down a database, freeing allocated NIF resources.
- Backup & Restore: Functions such as backup/2, restore/2, and import_from_backup/3 enable exporting and restoring your database.
Data Operations
- Import/Export: The import/2 and export/2,3 functions allow for batch data ingestion and extraction, ensuring consistency via transactions.
- Running Scripts: Use run/2 and run/3 to execute CozoScript commands for querying or modifying the database.
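As a minimal sketch of the lifecycle and script functions listed above, the following module opens an in-memory database, runs a one-line CozoScript query and closes the handle again. It assumes the cozodb application is compiled and on the code path. The module name cozodb_lifecycle_sketch, the placeholder path and the query text are illustrative and not part of the library.

```erlang
-module(cozodb_lifecycle_sketch).
-export([demo/0]).

%% Opens an in-memory database, runs one query and always closes the handle.
%% Assumes the cozodb module documented on this page is on the code path.
demo() ->
    %% The path is a placeholder: the open/3 docs state that Path is
    %% ignored when Engine is mem.
    case cozodb:open(mem, "/tmp/cozodb_sketch") of
        {ok, Db} ->
            try
                %% run/2 returns a query_return() according to its spec.
                cozodb:run(Db, <<"?[] <- [[1, 2, 3]]">>)
            after
                %% close/1 is documented as asynchronous; its return value
                %% is not relied upon here.
                _ = cozodb:close(Db)
            end;
        {error, _Reason} = Error ->
            Error
    end.
```

The try ... after form is used so that the handle is closed even when the query raises, which matters because the NIF resources are only released on close/1.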
System Catalogue
- Relations and Columns: Functions like relations/1, columns/2, and describe/3 let you inspect and manage stored relations.
- Indices and Triggers: Create indices using create_index/4 and manage triggers via triggers/2, set_triggers/3, and delete_triggers/2.
Advanced Features
- Query Explanation: The explain/2 function provides insights into query execution.
- Monitoring & Maintenance: Functions such as running/1, kill/2, and compact/1 help monitor and maintain database health.
Indexing Examples
Datalog Programs
See the CozoScript Tutorial.
HNSW
{ok, _} = Module:create_index(Db, "table_hnsw_fun", "my_hnsw_index", #{
    type => hnsw,
    dim => 128,
    m => 50,
    ef_construction => 20,
    dtype => f32,
    distance => l2,
    fields => [v],
    filter => <<"k != `foo`">>,
    extend_candidates => false,
    keep_pruned_connections => false
}).
LSH Indices
{ok, _} = Module:create_index(Db, "table_lsh_fun", "my_lsh_index", #{
    type => lsh,
    extractor => v,
    extract_filter => "!is_null(v)",
    tokenizer => simple,
    filters => [alphanumonly],
    n_perm => 200,
    target_threshold => 0.7,
    n_gram => 3,
    false_positive_weight => 1.0,
    false_negative_weight => 1.0
}).
FTS Indices
You can create an FTS index using create_index/4.
The following example creates an index called my_fts_index on the relation rel_a.
{ok, _} = Module:create_index(Db, "rel_a", "my_fts_index", #{
    type => fts,
    extractor => v,
    extract_filter => "!is_null(v)",
    tokenizer => simple,
    filters => [alphanumonly]
}).
You can always use CozoScript directly via run/2. For example, the following script is equivalent to the previous example.
::fts create rel_a:my_fts_index {
    extractor: v,
    extract_filter: !is_null(v),
    tokenizer: Simple,
    filters: [AlphaNumOnly],
}
For more details, see the FTS documentation: https://docs.cozodb.org/en/latest/vector.html#full-text-search-fts
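To show how such a script might be submitted from Erlang, here is a small sketch. It mirrors the Erlang map in the first FTS example and assumes that Db is a handle returned by open/2 or open/3 and that rel_a already exists with a column v. The module name cozodb_fts_script_sketch and its two functions are hypothetical helpers, not part of the library.

```erlang
-module(cozodb_fts_script_sketch).
-export([script/0, create/1]).

%% Returns the CozoScript text as a binary. Adjacent string literals are
%% concatenated at compile time, which keeps the script readable.
script() ->
    <<"::fts create rel_a:my_fts_index {\n"
      "    extractor: v,\n"
      "    extract_filter: !is_null(v),\n"
      "    tokenizer: Simple,\n"
      "    filters: [AlphaNumOnly],\n"
      "}">>.

%% Submits the script through run/2 and returns its query_return().
%% Assumes Db is an open handle and rel_a exists with a column v.
create(Db) ->
    cozodb:run(Db, script()).
```

Keeping the script in its own function makes it easy to inspect or log the exact text before it is sent to the database.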
System Operations
- Query Management: Monitor running queries with running/1 and terminate problematic queries using kill/2.
Types
-type column_atomic_type() :: any | bool | bytes | json | int | float | string | uuid | validity.
-type column_composite_type() :: {list, column_atomic_type()} | {list, column_atomic_type(), Size :: pos_integer()} | {tuple, [column_atomic_type()]} | {vector, 32 | 64, Size :: pos_integer()}.
-type column_name() :: binary().
-type column_spec() :: undefined | #{type => column_type(), nullable => boolean(), default => binary()}.
-type column_type() :: column_atomic_type() | column_composite_type().
-type covering_index_spec() :: #{type := covering, fields := [column_name()]}.
-opaque db_handle()
-type engine() :: mem | sqlite | rocksdb.
-type engine_opts() :: map().
-type export_opts() :: #{encoding => json}.
-type extract_filter() :: string().
-type fts_index_spec() :: #{type := fts, extractor => column_name(), extract_filter => extract_filter(), tokenizer => tokenizer(), filters => [token_filter()]}.
-type hnsw_filter() :: string().
-type hnsw_index_spec() :: #{type := hnsw, dim := pos_integer(), m := pos_integer(), ef_construction := pos_integer(), fields := [column_name()], dtype => f32 | f64, distance => l2 | cosine | ip, filter => hnsw_filter(), extend_candidates => boolean(), keep_pruned_connections => boolean()}.
-type index_spec() :: covering_index_spec() | hnsw_index_spec() | lsh_index_spec() | fts_index_spec().
-type json() :: {json, binary()}.
-type lsh_index_spec() :: #{type := lsh, extractor := column_name(), tokenizer := tokenizer(), n_perm := pos_integer(), n_gram := pos_integer(), target_threshold := float(), extract_filter => extract_filter(), filters => [token_filter()], false_positive_weight => float(), false_negative_weight => float()}.
-type path() :: file:filename() | binary().
-type query_result() :: #{headers := [column_name()], rows := [row()], count := integer(), next => [row()] | null, took => float()}.
-type query_return() :: {ok, query_result()} | {ok, Json :: binary()} | {error, Reason :: any()}.
-type relation_name() :: binary().
-type relation_spec() :: binary() | #{keys => [column_name() | {column_name(), column_spec()}], columns => [column_name() | {column_name(), column_spec()}]}.
-type relations() :: #{relation_name() => #{headers => [binary()], rows => [row()]}}.
-type row() :: [value()].
-type tokenizer() :: raw | simple | whitespace | ngram | {ngram, MinGram :: pos_integer(), MaxGram :: pos_integer(), PrefixOnly :: boolean()} | {cangjie, default | all | search | unicode}.
-type trigger_event() :: on_put | on_remove | on_replace.
-type trigger_spec() :: #{trigger_event() => script()}.
Functions
backup/2 exports the database to a SQLite file at Path.
To restore the database from this file, see restore/2.
close/1 closes the database.
Note that the call is asynchronous: the database might take a while to close, so a subsequent call to open/3 with the same path might fail.
-spec columns(DbHandle :: db_handle(), RelName :: binary() | list()) -> query_return().
List columns for relation
-spec create_index(DbHandle :: db_handle(), RelName :: binary() | list(), Name :: binary() | list(), Spec :: index_spec()) -> ok | {error, Reason :: any()} | no_return().
Create index for relation
Hierarchical Navigable Small World (HNSW) Index
The parameters are:
- The dimension dim and the data type dtype (defaults to F32) have to match the dimension and data type of any vector you index
- The fields parameter is a list of fields in the table that should be indexed
- The indexed fields must only contain vectors of the same dimension and data type, or null, or a list of vectors of the same dimension and data type
- The distance parameter is the distance metric to use: the options are L2 (default), Cosine and IP
- The m parameter controls the maximal number of outgoing connections from each node in the graph
- The ef_construction parameter is the number of nearest neighbors to use when building the index: see the HNSW paper for details
- The filter parameter, when given, is bound to the fields of the original relation and only those rows for which the expression evaluates to true are indexed
- The extend_candidates parameter is a boolean (default false) that controls whether the index should extend the candidate list with the nearest neighbors of the nearest neighbors
- The keep_pruned_connections parameter is a boolean (default false) that controls whether the index should keep pruned connections.
Example
1> Spec = #{
       type => hnsw,
       dim => 128,
       m => 50,
       ef_construction => 20,
       dtype => f32,
       distance => l2,
       fields => [v],
       filter => <<"k != `foo`">>,
       extend_candidates => false,
       keep_pruned_connections => false
   }.
2> cozodb:create_index(Db, "my_relation", "my_hnsw_index", Spec).
ok
-spec delete_triggers(DbHandle :: db_handle(), RelName :: binary() | list()) -> query_return().
Calls set_triggers/3 with an empty specs list.
-spec describe(DbHandle :: db_handle(), RelName :: binary() | list(), Desc :: binary() | list()) -> query_return().
Describes the relation RelName, storing Desc as its description.
-spec drop_index(DbHandle :: db_handle(), FQN :: binary() | list()) -> ok | {error, Reason :: any()} | no_return().
Drop index with fully qualified name.
-spec drop_index(DbHandle :: db_handle(), RelName :: binary() | list(), Name :: binary() | list()) -> query_return().
Drops the index Name of the relation RelName.
-spec explain(DbHandle :: db_handle(), Query :: binary() | list()) -> query_return().
-spec export(DbHandle :: db_handle(), RelNames :: [relation_name()] | binary()) -> ok | {error, Reason :: any()}.
Exports the relations named in RelNames.
The exported data is guaranteed to form a consistent snapshot of what was stored in the database.
Returns a map with binary keys for the names of relations, and values as maps containing the headers and rows of the relation.
-spec export(DbHandle :: db_handle(), RelNames :: [relation_name()] | binary(), Opts :: export_opts()) -> {ok, relations() | binary()} | {error, Reason :: any()}.
Exports the relations named in RelNames.
The exported data is guaranteed to form a consistent snapshot of what was stored in the database.
Returns a map with binary keys for the names of relations, and values as maps containing the headers and rows of the relation.
-spec import(DbHandle :: db_handle(), Relations :: iodata() | relations()) -> ok | {error, Reason :: any()}.
Imports data into a database. The data are imported inside a transaction, so that either all imports are successful, or none are. If conflicts arise because of concurrent modification to the database, via either CozoScript queries or other imports, the transaction will fail.
The relations to import into must exist beforehand, and the data given must match the schema defined.
This API can be used to batch-put or remove data from several stored relations atomically. The data parameter (Relations) can contain relation names such as "rel_a", or relation names prefixed by a minus sign such as "-rel_a". In the former case, every row given for the relation is put into the database, i.e. upsert semantics. In the latter case, the corresponding rows are removed from the database, and you should only specify the key part of the rows. As with rm in CozoScript, it is not an error to remove non-existent rows.
Erlang Example
#{
    rel_a => #{
        headers => ["x", "y"],
        rows => [[1, 2], [3, 4]]
    },
    rel_b => #{
        headers => ["z"],
        rows => []
    }
}
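To illustrate the put and remove conventions described for import/2, the sketch below builds a payload that upserts two rows into rel_a and removes one row from rel_b by key. The module and function names are hypothetical. The binary keys and headers follow the relations() type on this page, and treating z as the key column of rel_b is an assumption made for the example. The call to import/2 appears only as a comment, since it needs an open database in which both relations already exist.

```erlang
-module(cozodb_import_sketch).
-export([payload/0]).

%% Builds a relations() map for import/2.
%% <<"rel_a">>  : rows are put (upsert semantics).
%% <<"-rel_b">> : the minus prefix marks rows for removal, so only the
%%                key part of each row is given (z is assumed to be the key).
payload() ->
    #{
        <<"rel_a">> => #{
            headers => [<<"x">>, <<"y">>],
            rows => [[1, 2], [3, 4]]
        },
        <<"-rel_b">> => #{
            headers => [<<"z">>],
            rows => [[10]]
        }
    }.

%% Hypothetical usage, given an open handle Db:
%%   ok = cozodb:import(Db, cozodb_import_sketch:payload()).
```

Because both entries travel in one map, the put and the removal are applied in the same transaction, as described for import/2.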
-spec indices(DbHandle :: db_handle(), RelName :: binary() | list()) -> query_return().
List indices for relation
-spec kill(DbHandle :: db_handle(), Id :: binary()) -> query_result().
Kills the running query associated with identifier Id.
See running/1 to get the list of running queries and their identifiers.
open/0 opens a database with the default engine (aka backend).
open/1 opens a database with the provided Engine (aka backend) in the /tmp path.
-spec open(Engine :: engine(), Path :: path()) -> {ok, db_handle()} | {error, Reason :: any()} | no_return().
Opens a database with the provided Engine (aka backend) and path.
-spec open(Engine :: engine(), Path :: path(), Opts :: engine_opts()) -> {ok, db_handle()} | {error, Reason :: any()} | no_return().
Creates a new database or opens an existing one.
The database has to be explicitly closed using close/1 for Erlang to release the allocated ErlNIF resources.
- Path is ignored when Engine is mem
- Opts apply only to the tikv engine
RocksDB
To define options for RocksDB, make sure a RocksDB configuration file named config is present at Path before you call this function.
-spec relations(DbHandle :: db_handle()) -> query_return().
List all existing relations.
-spec remove_relation(DbHandle :: db_handle(), RelName :: binary() | string()) -> query_return().
Removes a relation
-spec remove_relations(DbHandle :: db_handle(), RelNames :: [binary()]) -> ok | {error, Reason :: any()}.
Removes the relations listed in RelNames.
Returns the CozoDB DBInstance as a NIF resource. This function exists for testing and planned extensions; you SHALL NOT use it.
-spec rows_to_maps(query_result()) -> map().
Utility function that takes a query_result() as argument and returns its rows as a list of maps.
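The conversion can be pictured with a small standalone function. This is not the library's implementation, only a sketch of the documented behaviour under the assumption that each row is paired positionally with the headers list. The module name rows_to_maps_sketch is hypothetical.

```erlang
-module(rows_to_maps_sketch).
-export([rows_to_maps/1]).

%% Pairs every row with the headers, producing one map per row.
%% Assumes each row has exactly as many values as there are headers;
%% lists:zip/2 fails otherwise.
rows_to_maps(#{headers := Headers, rows := Rows}) ->
    [maps:from_list(lists:zip(Headers, Row)) || Row <- Rows].
```

For a result with headers [<<"x">>, <<"y">>] and rows [[1, 2], [3, 4]], this sketch returns [#{<<"x">> => 1, <<"y">> => 2}, #{<<"x">> => 3, <<"y">> => 4}].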
-spec run(DbHandle :: db_handle(), Script :: script()) -> query_return() | no_return().
-spec run(DbHandle :: db_handle(), Script :: list() | binary(), Opts :: query_opts()) -> query_return().
-spec running(DbHandle :: db_handle()) -> query_result().
-spec set_triggers(DbHandle :: db_handle(), RelName :: binary() | list(), Specs :: [trigger_spec()]) -> query_return().
-spec triggers(DbHandle :: db_handle(), RelName :: binary() | list()) -> query_return().
Returns the list of triggers.