cozodb (cozodb v0.2.1)

CozoDB

CozoDB is a transactional, relational-graph-vector database with time-travel capability, well suited as long-term memory for LLMs and AI applications.

This module implements the Erlang (BEAM) bindings for CozoDB via a NIF built using Rustler.

Key Features

  • Transactional Operations: Supports atomic transactions for batch operations
  • Hybrid Data Model: Combines relational, graph, and vector paradigms
  • Time Travelling: Query historical versions of your data
  • Flexible Indexing: with support for
    • Covering Indices
    • Proximity Indices:
      • HNSW (Hierarchical Navigable Small World): Fast approximate nearest neighbor searches
      • MinHash-LSH: Locality sensitive hashing for similarity searches
      • Full-Text Search (FTS): Efficient text matching

Usage Overview

The API is divided into several sections:

Database Lifecycle

Data Operations

  • Import/Export: The import/2 and export/2,3 functions allow for batch data ingestion and extraction, ensuring consistency via transactions.

  • Running Scripts: Use run/2 and run/3 to execute CozoScript commands for querying or modifying the database.
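As a minimal sketch of the run/2 workflow described above: the module and function names are hypothetical, the query is only an illustration, and the result is assumed to come back as a query_result() map (the default, non-JSON encoding).

```erlang
-module(cozodb_run_sketch).
-export([demo/0]).

%% Opens an in-memory database, runs one CozoScript query and closes the
%% database again. Returns the headers and rows of the result.
demo() ->
    {ok, Db} = cozodb:open(mem),
    %% Assumption: without an encoding option, run/2 returns
    %% {ok, query_result()}, a map holding headers and rows.
    {ok, #{headers := Headers, rows := Rows}} =
        cozodb:run(Db, <<"?[a, b] <- [[1, 2], [3, 4]]">>),
    ok = cozodb:close(Db),
    {Headers, Rows}.
```

Pattern matching on {ok, ...} makes the sketch crash on {error, Reason}; production code would more likely handle that case explicitly.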

System Catalogue

Advanced Features

  • Query Explanation: The explain/2 function provides insights into query execution.

  • Monitoring & Maintenance: Functions such as running/1, kill/2, and compact/1 help monitor and maintain database health.

Indexing Examples

Datalog Programs

See the CozoScript Tutorial.

HNSW

ok = cozodb:create_index(Db, "table_hnsw_fun", "my_hnsw_index", #{
    type => hnsw,
    dim => 128,
    m => 50,
    ef_construction => 20,
    dtype => f32,
    distance => l2,
    fields => [v],
    filter => <<"k != `foo`">>,
    extend_candidates => false,
    keep_pruned_connections => false
}).

LSH Indices

ok = cozodb:create_index(Db, "table_lsh_fun", "my_lsh_index", #{
    type => lsh,
    extractor => v,
    extract_filter => "!is_null(v)",
    tokenizer => simple,
    filters => [alphanumonly],
    n_perm => 200,
    target_threshold => 0.7,
    n_gram => 3,
    false_positive_weight => 1.0,
    false_negative_weight => 1.0
}).

FTS Indices

You can create an FTS index using create_index/4.

The following example creates an index called my_fts_index on the relation rel_a.

ok = cozodb:create_index(Db, "rel_a", "my_fts_index", #{
    type => fts,
    extractor => v,
    extract_filter => "!is_null(v)",
    tokenizer => simple,
    filters => [alphanumonly]
}).

You can always use CozoScript directly via run/2. For example, the following script is equivalent to the previous example.

::fts create rel_a:my_fts_index {
    extractor: v,
    extract_filter: !is_null(v),
    tokenizer: Simple,
    filters: [AlphaNumOnly],
}

For more details, see the FTS documentation: https://docs.cozodb.org/en/latest/vector.html#full-text-search-fts

System Operations

  • Query Management: Monitor running queries with running/1 and terminate problematic queries using kill/2.

Types

column_atomic_type()

-type column_atomic_type() :: any | bool | bytes | json | int | float | string | uuid | validity.

column_composite_type()

-type column_composite_type() ::
          {list, column_atomic_type()} |
          {list, column_atomic_type(), Size :: pos_integer()} |
          {tuple, [column_atomic_type()]} |
          {vector, 32 | 64, Size :: pos_integer()}.

column_name()

-type column_name() :: binary().

column_spec()

-type column_spec() :: undefined | #{type => column_type(), nullable => boolean(), default => binary()}.

column_type()

-type column_type() :: column_atomic_type() | column_composite_type().

covering_index_spec()

-type covering_index_spec() :: #{type := covering, fields := [column_name()]}.

db_handle()

-opaque db_handle()

engine()

-type engine() :: mem | sqlite | rocksdb.

engine_opts()

-type engine_opts() :: map().

export_opts()

-type export_opts() :: #{encoding => json}.

extract_filter()

-type extract_filter() :: string().

fts_index_spec()

-type fts_index_spec() ::
          #{type := fts,
            extractor => column_name(),
            extract_filter => extract_filter(),
            tokenizer => tokenizer(),
            filters => [token_filter()]}.

hnsw_filter()

-type hnsw_filter() :: string().

hnsw_index_spec()

-type hnsw_index_spec() ::
          #{type := hnsw,
            dim := pos_integer(),
            m := pos_integer(),
            ef_construction := pos_integer(),
            fields := [column_name()],
            dtype => f32 | f64,
            distance => l2 | cosine | ip,
            filter => hnsw_filter(),
            extend_candidates => boolean(),
            keep_pruned_connections => boolean()}.

index_spec()

-type index_spec() :: covering_index_spec() | hnsw_index_spec() | lsh_index_spec() | fts_index_spec().

info()

-type info() :: #{engine := binary(), path := binary()}.

json()

-type json() :: {json, binary()}.

lsh_index_spec()

-type lsh_index_spec() ::
          #{type := lsh,
            extractor := column_name(),
            tokenizer := tokenizer(),
            n_perm := pos_integer(),
            n_gram := pos_integer(),
            target_threshold := float(),
            extract_filter => extract_filter(),
            filters => [token_filter()],
            false_positive_weight => float(),
            false_negative_weight => float()}.

path()

-type path() :: file:filename() | binary().

query_opts()

-type query_opts() ::
          #{encoding => json | undefined,
            read_only => boolean(),
            parameters =>
                #{Key :: atom() | binary() => Value :: any()} |
                [{Key :: atom() | binary(), Value :: any()}]}.

query_result()

-type query_result() ::
          #{headers := [column_name()],
            rows := [row()],
            count := integer(),
            next => [row()] | null,
            took => float()}.

query_return()

-type query_return() :: {ok, query_result()} | {ok, Json :: binary()} | {error, Reason :: any()}.

relation_name()

-type relation_name() :: binary().

relation_spec()

-type relation_spec() ::
          binary() |
          #{keys => [column_name() | {column_name(), column_spec()}],
            columns => [column_name() | {column_name(), column_spec()}]}.

relations()

-type relations() :: #{relation_name() => #{headers => [binary()], rows => [row()]}}.

row()

-type row() :: [value()].

script()

-type script() :: list() | binary().

token_filter()

-type token_filter() ::
          lowercase | alphanumonly | asciifolding |
          {stemmer, Lang :: string()} |
          {stopwords, Lang :: string()}.

tokenizer()

-type tokenizer() ::
          raw | simple | whitespace | ngram |
          {ngram, MinGram :: pos_integer(), MaxGram :: pos_integer(), PrefixOnly :: boolean()} |
          {cangjie, default | all | search | unicode}.

trigger_event()

-type trigger_event() :: on_put | on_remove | on_replace.

trigger_spec()

-type trigger_spec() :: #{trigger_event() => script()}.

validity()

-type validity() :: {float(), boolean()}.

value()

-type value() :: null | boolean() | integer() | float() | list() | binary() | validity() | json().

Functions

backup(DbHandle, Path)

-spec backup(DbHandle :: db_handle(), Path :: path()) -> ok | {error, Reason :: any()}.

Exports the database to a SQLite file at Path. To restore the database from this file, see restore/2.
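A backup followed by a restore could be sketched as follows. The module name, the function name and the choice of an in-memory target are hypothetical; the sketch relies only on the backup/2, open/1, restore/2 and close/1 specs listed on this page.

```erlang
-module(cozodb_backup_sketch).
-export([copy/2]).

%% Backs up Source into the file at BackupPath, then restores that file
%% into a freshly opened in-memory database and returns its handle.
copy(Source, BackupPath) ->
    ok = cozodb:backup(Source, BackupPath),
    {ok, Target} = cozodb:open(mem),
    case cozodb:restore(Target, BackupPath) of
        ok ->
            {ok, Target};
        {error, _} = Error ->
            %% Do not leak the target handle when the restore fails.
            _ = cozodb:close(Target),
            Error
    end.
```

The caller owns the returned handle and is expected to close it with close/1.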

close(DbHandle)

-spec close(DbHandle :: db_handle()) -> ok | {error, Reason :: any()}.

Closes the database. Note that the call is asynchronous: the database might take a while to close, so a subsequent call to open/3 with the same path might fail.

columns(DbHandle, RelName)

-spec columns(DbHandle :: db_handle(), RelName :: binary() | list()) -> query_return().

List columns for relation

compact(DbHandle)

-spec compact(DbHandle :: db_handle()) -> ok | {error, Reason :: any()}.

create_index(DbHandle, RelName, Name, Spec)

-spec create_index(DbHandle :: db_handle(),
                   RelName :: binary() | list(),
                   Name :: binary() | list(),
                   Spec :: index_spec()) ->
                      ok | {error, Reason :: any()} | no_return().

Create index for relation

Hierarchical Navigable Small World (HNSW) Index

The parameters are:

  • The dimension dim and the data type dtype (defaults to F32) have to match those of every vector you index
  • The fields parameter is a list of fields in the table that should be indexed
  • The indexed fields must only contain vectors of the same dimension and data type, or null, or a list of vectors of the same dimension and data type
  • The distance parameter is the distance metric to use: the options are L2 (default), Cosine and IP
  • The m parameter controls the maximum number of outgoing connections from each node in the graph
  • The ef_construction parameter is the number of nearest neighbors to use when building the index: see the HNSW paper for details
  • The filter parameter, when given, is bound to the fields of the original relation and only those rows for which the expression evaluates to true are indexed
  • The extend_candidates parameter is a boolean (default false) that controls whether the index should extend the candidate list with the nearest neighbors of the nearest neighbors
  • The keep_pruned_connections parameter is a boolean (default false) that controls whether the index should keep pruned connections.

Example

1> Spec = #{
 type => hnsw,
 dim => 128,
 m => 50,
 ef_construction => 20,
 dtype => f32,
 distance => l2,
 fields => [v],
 filter => <<"k != `foo`">>,
 extend_candidates => false,
 keep_pruned_connections => false
}.
2> cozodb:create_index(Db, "my_relation", "my_hnsw_index", Spec).
ok

create_relation(DbHandle, RelName, Spec)

-spec create_relation(DbHandle :: db_handle(),
                      RelName :: atom() | binary() | list(),
                      Spec :: relation_spec()) ->
                         ok | {error, Reason :: any()} | no_return().

delete_triggers(DbHandle, RelName)

-spec delete_triggers(DbHandle :: db_handle(), RelName :: binary() | list()) -> query_return().

Calls set_triggers/3 with an empty specs list.

describe(DbHandle, RelName, Desc)

-spec describe(DbHandle :: db_handle(), RelName :: binary() | list(), Desc :: binary() | list()) ->
                  query_return().

Sets Desc as the description of the relation RelName.

drop_index(DbHandle, FQN)

-spec drop_index(DbHandle :: db_handle(), FQN :: binary() | list()) ->
                    ok | {error, Reason :: any()} | no_return().

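Drops the index identified by its fully qualified name FQN, i.e. the relation name and the index name joined by a colon, as in rel_a:my_fts_index.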

drop_index(DbHandle, RelName, Name)

-spec drop_index(DbHandle :: db_handle(), RelName :: binary() | list(), Name :: binary() | list()) ->
                    query_return().

Drops the index Name of the relation RelName.

explain(DbHandle, Query)

-spec explain(DbHandle :: db_handle(), Query :: binary() | list()) -> query_return().

export(DbHandle, RelNames)

-spec export(DbHandle :: db_handle(), RelNames :: [relation_name()] | binary()) ->
                ok | {error, Reason :: any()}.

Exports the relations named in RelNames. The exported data are guaranteed to form a consistent snapshot of what was stored in the database. Returns a map with the relation names as binary keys and, as values, maps containing the headers and rows of each relation.

export(DbHandle, RelNames, Opts)

-spec export(DbHandle :: db_handle(), RelNames :: [relation_name()] | binary(), Opts :: export_opts()) ->
                {ok, relations() | binary()} | {error, Reason :: any()}.

Exports the relations named in RelNames. The exported data are guaranteed to form a consistent snapshot of what was stored in the database. Returns a map with the relation names as binary keys and, as values, maps containing the headers and rows of each relation.

import(DbHandle, Relations)

-spec import(DbHandle :: db_handle(), Relations :: iodata() | relations()) ->
                ok | {error, Reason :: any()}.

Imports data into a database. The data are imported inside a transaction, so that either all imports succeed or none do. If conflicts arise because of concurrent modification of the database, via either CozoScript queries or other imports, the transaction fails. The relations to import into must exist beforehand, and the data given must match their schemas.

This API can be used to batch-put or remove data from several stored relations atomically. The Relations argument can contain relation names such as "rel_a", or relation names prefixed by a minus sign such as "-rel_a". In the former case, every row given for the relation is put into the database, i.e. upsert semantics. In the latter case, the corresponding rows are removed from the database, and you should specify only the key part of the rows. As with rm in CozoScript, removing non-existent rows is not an error.

Erlang Example

#{
   rel_a => #{
       headers => ["x", "y"],
       rows => [[1, 2], [3, 4]]
   },
   rel_b => #{
       headers => ["z"],
       rows => []
   }
}
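The put and remove semantics described above can be sketched end to end as follows. The module name and the rel_a schema are hypothetical, and the sketch assumes binary relation names and headers as in the relations() type (the map above uses atoms and strings instead).

```erlang
-module(cozodb_import_sketch).
-export([demo/1]).

%% Db is a handle obtained from cozodb:open/0,1,2,3.
demo(Db) ->
    %% The target relation must exist before importing into it.
    {ok, _} = cozodb:run(Db, <<":create rel_a {x => y}">>),
    %% Upsert two rows into rel_a.
    ok = cozodb:import(Db, #{
        <<"rel_a">> => #{headers => [<<"x">>, <<"y">>],
                         rows => [[1, 2], [3, 4]]}
    }),
    %% Remove the row with key x = 1: the relation name is prefixed
    %% with a minus sign and only the key column is given.
    ok = cozodb:import(Db, #{
        <<"-rel_a">> => #{headers => [<<"x">>], rows => [[1]]}
    }),
    %% Read back a consistent snapshot of the relation.
    cozodb:export(Db, [<<"rel_a">>], #{}).
```

Both import/2 calls run in their own transaction; putting the two entries into a single map would apply them atomically instead.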

import_from_backup(DbHandle, Path, Relations)

-spec import_from_backup(DbHandle :: db_handle(), Path :: path(), Relations :: []) ->
                            ok | {error, Reason :: any()}.

indices(DbHandle, RelName)

-spec indices(DbHandle :: db_handle(), RelName :: binary() | list()) -> query_return().

List indices for relation

info(DbHandle)

-spec info(DbHandle :: db_handle()) -> info().

Returns information about the database as an info() map containing its engine and path.

kill(DbHandle, Id)

-spec kill(DbHandle :: db_handle(), Id :: binary()) -> query_result().

Kill the running query associated with identifier Id. See running/1 to get the list of running queries and their identifiers.

open()

-spec open() -> {ok, db_handle()} | {error, Reason :: any()} | no_return().

Opens a database with the default engine (aka backend).

open(Engine)

-spec open(Engine :: engine()) -> {ok, db_handle()} | {error, Reason :: any()} | no_return().

Opens a database with the provided Engine (aka backend) in the /tmp path.

open(Engine, Path)

-spec open(Engine :: engine(), Path :: path()) ->
              {ok, db_handle()} | {error, Reason :: any()} | no_return().

Opens a database with the provided Engine (aka backend) and path.

open(Engine, Path, Opts)

-spec open(Engine :: engine(), Path :: path(), Opts :: engine_opts()) ->
              {ok, db_handle()} | {error, Reason :: any()} | no_return().

Opens an existing database or creates a new one.

The database has to be explicitly closed using close/1 for Erlang to release the allocated ErlNIF resources.

  • Path is ignored when Engine is mem
  • Opts apply only to tikv engine

RocksDB

To define options for RocksDB you should make sure a RocksDB configuration file named config is present at Path before you call this function.
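Because the handle has to be closed explicitly, one option is to wrap its use in a try ... after block. The helper below is a hypothetical sketch, not part of the cozodb API; it uses only open/2 and close/1 as specified on this page.

```erlang
-module(cozodb_with_db).
-export([with_db/3]).

%% Opens a database, applies Fun to the handle and always closes the
%% handle afterwards, even when Fun raises an exception.
with_db(Engine, Path, Fun) when is_function(Fun, 1) ->
    case cozodb:open(Engine, Path) of
        {ok, Db} ->
            try
                Fun(Db)
            after
                %% close/1 is asynchronous; its result is ignored here.
                _ = cozodb:close(Db)
            end;
        {error, _} = Error ->
            Error
    end.
```

A call could then look like cozodb_with_db:with_db(sqlite, "/tmp/example.db", fun(Db) -> cozodb:relations(Db) end), where the path is only an example.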

register_callback(DbHandle, RelName)

-spec register_callback(DbHandle :: db_handle(), RelName :: binary()) -> ok.

relations(DbHandle)

-spec relations(DbHandle :: db_handle()) -> query_return().

List all existing relations.

remove_relation(DbHandle, RelName)

-spec remove_relation(DbHandle :: db_handle(), RelName :: binary() | string()) -> query_return().

Removes a relation

remove_relations(DbHandle, RelNames)

-spec remove_relations(DbHandle :: db_handle(), RelNames :: [binary()]) -> ok | {error, Reason :: any()}.

Removes the relations named in RelNames.

resource(DbHandle)

-spec resource(DbHandle :: db_handle()) -> {ok, reference()} | {error, any()}.

Returns the CozoDB DBInstance as a NIF Resource. This function is intended for testing and planned extensions only; you SHALL NOT use it.

restore(DbHandle, Path)

-spec restore(DbHandle :: db_handle(), Path :: path()) -> ok | {error, Reason :: any()}.

rows_to_maps/1

-spec rows_to_maps(query_result()) -> map().

Util function that takes a query_result() as argument and returns a list of rows as maps.
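To illustrate the idea, such a conversion could be written in plain Erlang by zipping the headers with each row. This is a hypothetical re-implementation for illustration only, not the library's code, and the module and function names are made up.

```erlang
-module(rows_to_maps_sketch).
-export([to_maps/1]).

%% Turns every row into a map keyed by the column headers.
%% Each row must have exactly as many values as there are headers.
to_maps(#{headers := Headers, rows := Rows}) ->
    [maps:from_list(lists:zip(Headers, Row)) || Row <- Rows].
```

Given headers x and y and the rows [1, 2] and [3, 4], the sketch yields one map per row, each with the keys x and y.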

run(DbHandle, Script)

-spec run(DbHandle :: db_handle(), Script :: script()) -> query_return() | no_return().

run(DbHandle, Script, Opts)

-spec run(DbHandle :: db_handle(), Script :: list() | binary(), Opts :: query_opts()) -> query_return().
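A sketch of run/3 with options from query_opts() follows. The module name and the query are hypothetical; the sketch assumes that entries in parameters are referenced in the script as $name, as in CozoScript, and that read_only requests a non-mutating execution.

```erlang
-module(cozodb_params_sketch).
-export([demo/1]).

%% Db is a handle obtained from cozodb:open/0,1,2,3.
demo(Db) ->
    Opts = #{
        %% Assumption: asks the engine to refuse scripts that would
        %% modify stored relations.
        read_only => true,
        %% Values bound here are referenced in the script as $x and $y.
        parameters => #{x => 1, y => 2}
    },
    cozodb:run(Db, <<"?[a, b] <- [[$x, $y]]">>, Opts).
```

Passing values through parameters instead of concatenating them into the script avoids quoting problems and script injection.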

running(DbHandle)

-spec running(DbHandle :: db_handle()) -> query_result().

set_triggers(DbHandle, RelName, Specs)

-spec set_triggers(DbHandle :: db_handle(), RelName :: binary() | list(), Specs :: [trigger_spec()]) ->
                      query_return().

triggers(DbHandle, RelName)

-spec triggers(DbHandle :: db_handle(), RelName :: binary() | list()) -> query_return().

Returns the list of triggers.

unregister_callback(DbHandle, Id)

-spec unregister_callback(DbHandle :: db_handle(), Id :: integer()) -> boolean().