cozodb (cozodb v0.2.0)
CozoDB is a FOSS embeddable, transactional, relational-graph-vector database with time-travelling capability, perfect as the long-term memory for LLMs and AI.
This library implements the CozoDB bindings for Erlang (BEAM) as a NIF using Rustler.
Datalog Programs
Stored Relations
Working with Indices
Covering Indices
Proximity Indices
These kinds of indices allow Cozo to perform fast searches for similar data. Cozo comes with three proximity index types:
- The Hierarchical Navigable Small World (HNSW) index is a graph-based index that allows for fast approximate nearest neighbor searches
- The MinHash-LSH index is a locality sensitive hash index that allows for fast near-duplicate searches
- The Full-text Search (FTS) index allows for fast string matches.
HNSW
{ok, _} = Module:create_index(Db, "table_hnsw_fun", "my_hnsw_index", #{
type => hnsw,
dim => 128,
m => 50,
ef_construction => 20,
dtype => f32,
distance => l2,
fields => [v],
filter => <<"k != 'foo'">>,
extend_candidates => false,
keep_pruned_connections => false
}).
LSH Indices
{ok, _} = Module:create_index(Db, "table_lsh_fun", "my_lsh_index", #{
type => lsh,
extractor => v,
extract_filter => "!is_null(v)",
tokenizer => simple,
filters => [alphanumonly],
n_perm => 200,
target_threshold => 0.7,
n_gram => 3,
false_positive_weight => 1.0,
false_negative_weight => 1.0
}).
FTS Indices
You can create an FTS index using create_index/4.
The following example creates an index called my_fts_index on the relation rel_a.
{ok, _} = Module:create_index(Db, "rel_a", "my_fts_index", #{
type => fts,
extractor => v,
extract_filter => "!is_null(v)",
tokenizer => simple,
filters => [alphanumonly]
}).
You can always use Cozo Script directly via run/2. For example, the following script is equivalent to the previous example.
::fts create rel_a:my_fts_index {
extractor: v,
extract_filter: !is_null(v),
tokenizer: Simple,
filters: [AlphaNumOnly],
}
Check the full FTS documentation.
System Operations
Types
-type column_atomic_type() :: any | bool | bytes | json | int | float | string | uuid | validity.
-type column_composite_type() :: {list, column_atomic_type()} | {list, column_atomic_type(), Size :: pos_integer()} | {tuple, [column_atomic_type()]} | {vector, 32 | 64, Size :: pos_integer()}.
-type column_name() :: binary().
-type column_spec() :: undefined | #{type => column_type(), nullable => boolean(), default => binary()}.
-type column_type() :: column_atomic_type() | column_composite_type().
-type covering_index_spec() :: #{type := covering, fields := [column_name()]}.
-opaque db_handle()
-type engine() :: mem | sqlite | rocksdb.
-type engine_opts() :: map().
-type export_opts() :: #{encoding => json}.
-type extract_filter() :: string().
-type fts_index_spec() :: #{type := fts, extractor => column_name(), extract_filter => extract_filter(), tokenizer => tokenizer(), filters => [token_filter()]}.
-type hnsw_filter() :: string().
-type hnsw_index_spec() :: #{type := hnsw, dim := pos_integer(), m := pos_integer(), ef_construction := pos_integer(), fields := [column_name()], dtype => f32 | f64, distance => l2 | cosine | ip, filter => hnsw_filter(), extend_candidates => boolean(), keep_pruned_connections => boolean()}.
-type index_spec() :: covering_index_spec() | hnsw_index_spec() | lsh_index_spec() | fts_index_spec().
-type json() :: {json, binary()}.
-type lsh_index_spec() :: #{type := lsh, extractor := column_name(), tokenizer := tokenizer(), n_perm := pos_integer(), n_gram := pos_integer(), target_threshold := float(), extract_filter => extract_filter(), filters => [token_filter()], false_positive_weight => float(), false_negative_weight => float()}.
-type path() :: file:filename() | binary().
-type query_result() :: #{headers := [column_name()], rows := [row()], count := integer(), next => [row()] | null, took => float()}.
-type query_return() :: {ok, query_result()} | {ok, Json :: binary()} | {error, Reason :: any()}.
-type relation_name() :: binary().
-type relation_spec() :: binary() | #{keys => [column_name() | {column_name(), column_spec()}], columns => [column_name() | {column_name(), column_spec()}]}.
-type relations() :: #{relation_name() => #{headers => [binary()], rows => [row()]}}.
-type row() :: [value()].
-type tokenizer() :: raw | simple | whitespace | ngram | {ngram, MinGram :: pos_integer(), MaxGram :: pos_integer(), PrefixOnly :: boolean()} | {cangjie, default | all | search | unicode}.
-type trigger_event() :: on_put | on_remove | on_replace.
-type trigger_spec() :: #{trigger_event() => script()}.
Functions
Exports the database to a SQLite file at Path. To restore the database using this file, see restore/2.
Closes the database. Note that the call is asynchronous: the database might take a while to close, and a subsequent invocation of open/3 with the same path might fail.
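Because closing is asynchronous, code that reopens the same path right away may want to retry. The following is a minimal sketch of one way to do that, not part of this library: the module name reopen_sketch and the function retry/3 are hypothetical, and the attempt count and delay are arbitrary values chosen for illustration.

```erlang
-module(reopen_sketch).
-export([retry/3]).

%% Calls Fun() up to Attempts times, sleeping DelayMs milliseconds after
%% each failed try. Fun must return {ok, Value} or {error, Reason}.
-spec retry(fun(() -> {ok, term()} | {error, term()}),
            pos_integer(), non_neg_integer()) ->
    {ok, term()} | {error, term()}.
retry(Fun, Attempts, DelayMs) when is_integer(Attempts), Attempts >= 1 ->
    case Fun() of
        {ok, _} = Ok ->
            Ok;
        {error, _} = Error when Attempts =:= 1 ->
            %% Last attempt: give up and return the error as is.
            Error;
        {error, _} ->
            timer:sleep(DelayMs),
            retry(Fun, Attempts - 1, DelayMs)
    end.
```

It could be used as reopen_sketch:retry(fun() -> cozodb:open(rocksdb, Path, #{}) end, 5, 100). Since open/3 is also specified as no_return(), a caller may additionally need to handle exceptions, which this sketch does not do.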
-spec columns(DbHandle :: db_handle(), RelName :: binary() | list()) -> query_return().
List columns for relation
-spec create_index(DbHandle :: db_handle(), RelName :: binary() | list(), Name :: binary() | list(), Spec :: index_spec()) -> ok | {error, Reason :: any()} | no_return().
Create index for relation
Hierarchical Navigable Small World (HNSW) Index
The parameters are:
- The dimension dim and the data type dtype (defaults to f32) have to match the dimension and data type of any vector you index.
- The fields parameter is a list of fields in the table that should be indexed.
- The indexed fields must only contain vectors of the same dimension and data type, or null, or a list of vectors of the same dimension and data type.
- The distance parameter is the distance metric to use: the options are l2 (default), cosine and ip.
- The m parameter controls the maximal number of outgoing connections from each node in the graph.
- The ef_construction parameter is the number of nearest neighbors to use when building the index: see the HNSW paper for details.
- The filter parameter, when given, is bound to the fields of the original relation and only those rows for which the expression evaluates to true are indexed.
- The extend_candidates parameter is a boolean (default false) that controls whether the index should extend the candidate list with the nearest neighbors of the nearest neighbors.
- The keep_pruned_connections parameter is a boolean (default false) that controls whether the index should keep pruned connections.
Example
1> Spec = #{
type => hnsw,
dim => 128,
m => 50,
ef_construction => 20,
dtype => f32,
distance => l2,
fields => [v],
filter => <<"k != 'foo'">>,
extend_candidates => false,
keep_pruned_connections => false
}.
2> cozodb:create_index(Db, "my_relation", "my_hnsw_index", Spec).
ok
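A spec can be checked for its mandatory keys before it is handed to create_index/4. The sketch below is not part of this library: the module hnsw_spec_sketch and its check/1 function are hypothetical, and the list of required keys is simply read off the keys marked := in the hnsw_index_spec() type.

```erlang
-module(hnsw_spec_sketch).
-export([check/1]).

%% Keys marked mandatory (:=) in the hnsw_index_spec() type.
-define(REQUIRED, [type, dim, m, ef_construction, fields]).

%% Returns ok when Spec has type => hnsw and every mandatory key,
%% otherwise {error, Reason}.
-spec check(map()) -> ok | {error, term()}.
check(#{type := hnsw} = Spec) ->
    case [K || K <- ?REQUIRED, not maps:is_key(K, Spec)] of
        [] -> ok;
        Missing -> {error, {missing_keys, Missing}}
    end;
check(Spec) when is_map(Spec) ->
    {error, not_an_hnsw_spec}.
```

The check only looks at key presence. It does not validate value types or whether dim matches the vectors stored in the relation.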
-spec delete_triggers(DbHandle :: db_handle(), RelName :: binary() | list()) -> query_return().
Calls set_triggers/3 with an empty specs list.
-spec describe(DbHandle :: db_handle(), RelName :: binary() | list(), Desc :: binary() | list()) -> query_return().
Sets the description Desc for relation RelName.
-spec drop_index(DbHandle :: db_handle(), FQN :: binary() | list()) -> ok | {error, Reason :: any()} | no_return().
Drops the index identified by the fully qualified name FQN.
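As an illustration, a fully qualified name can be assembled from a relation name and an index name. This sketch assumes the relation:index form used by the ::fts create rel_a:my_fts_index script earlier on this page. The module index_fqn_sketch and the function fqn/2 are hypothetical and not part of this library.

```erlang
-module(index_fqn_sketch).
-export([fqn/2]).

%% Joins the relation name and the index name with a colon.
%% Both arguments may be binaries or strings.
-spec fqn(iodata(), iodata()) -> binary().
fqn(RelName, IndexName) ->
    iolist_to_binary([RelName, $:, IndexName]).
```

The result could then be passed as the FQN argument, for example cozodb:drop_index(Db, index_fqn_sketch:fqn("rel_a", "my_fts_index")).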
-spec drop_index(DbHandle :: db_handle(), RelName :: binary() | list(), Name :: binary() | list()) -> query_return().
Drops the index Name of relation RelName.
-spec explain(DbHandle :: db_handle(), Query :: binary() | list()) -> query_return().
-spec export(DbHandle :: db_handle(), RelNames :: [relation_name()] | binary()) -> ok | {error, Reason :: any()}.
Exports the relations specified in RelNames. It is guaranteed that the exported data form a consistent snapshot of what was stored in the database. Returns a map with binary keys for the names of relations, and values as maps containing the headers and rows of the relation.
-spec export(DbHandle :: db_handle(), RelNames :: [relation_name()] | binary(), Opts :: export_opts()) -> {ok, relations() | binary()} | {error, Reason :: any()}.
Exports the relations specified in RelNames. It is guaranteed that the exported data form a consistent snapshot of what was stored in the database. Returns a map with binary keys for the names of relations, and values as maps containing the headers and rows of the relation.
-spec import(DbHandle :: db_handle(), Relations :: iodata() | relations()) -> ok | {error, Reason :: any()}.
Imports data into a database. The data are imported inside a transaction, so that either all imports are successful, or none are. If conflicts arise because of concurrent modification to the database, via either CozoScript queries or other imports, the transaction will fail. The relations to import into must exist beforehand, and the data given must match the schema defined.
This API can be used to batch-put or remove data from several stored relations atomically. The Relations parameter can contain relation names such as "rel_a", or relation names prefixed by a minus sign such as "-rel_a". In the former case, every row given for the relation will be put into the database, i.e. upsert semantics. In the latter case, the corresponding rows are removed from the database, and you should only specify the key part of the rows. As for rm in CozoScript, it is not an error to remove non-existent rows.
Erlang Example
#{
rel_a => #{
headers => ["x", "y"],
rows => [[1, 2], [3, 4]]
},
rel_b => #{
headers => ["z"],
rows => []
}
}
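The put and remove forms described above can be combined in a single payload. The following sketch builds such a map using binary relation names and headers, as the relations() type suggests. The module import_payload_sketch and its functions are hypothetical helpers, not part of this library.

```erlang
-module(import_payload_sketch).
-export([put_rows/3, remove_rows/3, payload/1]).

%% Entry that upserts Rows into the relation RelName.
put_rows(RelName, Headers, Rows) when is_binary(RelName) ->
    {RelName, #{headers => Headers, rows => Rows}}.

%% Entry that removes rows from RelName. The name is prefixed with a
%% minus sign and only the key columns are supplied.
remove_rows(RelName, KeyHeaders, KeyRows) when is_binary(RelName) ->
    {<<"-", RelName/binary>>, #{headers => KeyHeaders, rows => KeyRows}}.

%% Combines the entries into one map to pass to import/2.
payload(Entries) ->
    maps:from_list(Entries).
```

A payload built this way would be passed as the second argument of import/2, so that the puts and removals are applied in one transaction.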
-spec indices(DbHandle :: db_handle(), RelName :: binary() | list()) -> query_return().
List indices for relation
Closes the database. Note that the call is asynchronous: the database might take a while to close, and a subsequent invocation of open/3 with the same path might fail.
-spec kill(DbHandle :: db_handle(), Id :: binary()) -> query_result().
Kills the running query associated with identifier Id. See running/1 to get the list of running queries and their identifiers.
Opens
-spec open(Engine :: engine(), Path :: path(), Opts :: engine_opts()) -> {ok, db_handle()} | {error, Reason :: any()} | no_return().
Creates a new database or opens an existing one. Path is ignored when Engine is mem. The database has to be explicitly closed using close/1 for Erlang to release the allocated ErlNIF resources. Opts is ignored for every engine except tikv.
RocksDB
To define options for RocksDB, make sure a file named "config" is present in Path before you call this function.
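Since the database has to be closed explicitly, it can help to wrap its use in try ... after so that close/1 runs even when the work in between raises. This is a minimal sketch and not part of this library: the module with_db_sketch and its functions are hypothetical, and the path given to the mem engine is an arbitrary placeholder because Path is ignored for that engine.

```erlang
-module(with_db_sketch).
-export([with_db/3, with_mem_db/1]).

%% Opens a handle with Open(), runs Fun on it and always calls Close
%% afterwards, even when Fun raises. Returns the result of Fun, or the
%% {error, Reason} tuple returned by Open().
with_db(Open, Close, Fun) ->
    case Open() of
        {ok, Db} ->
            try
                Fun(Db)
            after
                Close(Db)
            end;
        {error, _} = Error ->
            Error
    end.

%% Convenience wrapper around an in-memory database.
with_mem_db(Fun) ->
    with_db(
        fun() -> cozodb:open(mem, <<"/tmp/ignored">>, #{}) end,
        fun cozodb:close/1,
        Fun
    ).
```

For example, with_db_sketch:with_mem_db(fun(Db) -> cozodb:relations(Db) end) would list the relations and then close the handle.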
-spec relations(DbHandle :: db_handle()) -> query_return().
List all existing relations.
-spec remove_relation(DbHandle :: db_handle(), RelName :: binary() | string()) -> query_return().
Removes a relation
-spec remove_relations(DbHandle :: db_handle(), RelNames :: [binary()]) -> ok | {error, Reason :: any()}.
Removes the relations listed in RelNames.
Returns the CozoDB DBInstance as a NIF Resource. This function exists for testing and planned extensions; you SHALL NOT use it.
-spec rows_to_maps(query_result()) -> map().
Utility function that takes a query_result() as argument and returns a list of rows as maps.
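To make the conversion concrete, the sketch below shows one way such a transformation could be written, pairing each row with the headers. It illustrates the idea only and is not necessarily how this library implements it. The module rows_to_maps_sketch and the function convert/1 are hypothetical.

```erlang
-module(rows_to_maps_sketch).
-export([convert/1]).

%% Builds one map per row, keyed by the column headers.
%% Every row must have as many values as there are headers.
-spec convert(map()) -> [map()].
convert(#{headers := Headers, rows := Rows}) ->
    [maps:from_list(lists:zip(Headers, Row)) || Row <- Rows].
```

With headers [<<"x">>, <<"y">>] and rows [[1, 2], [3, 4]], convert/1 yields two maps, the first mapping <<"x">> to 1 and <<"y">> to 2.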
-spec run(DbHandle :: db_handle(), Script :: script()) -> query_return() | no_return().
-spec run(DbHandle :: db_handle(), Script :: list() | binary(), Opts :: query_opts()) -> query_return().
-spec running(DbHandle :: db_handle()) -> query_result().
-spec set_triggers(DbHandle :: db_handle(), RelName :: binary() | list(), Specs :: [trigger_spec()]) -> query_return().
-spec triggers(DbHandle :: db_handle(), RelName :: binary() | list()) -> query_return().