cubdb v0.3.0 CubDB
CubDB is an embedded key-value database written in the Elixir language. It
runs locally, and is backed by a single file.
Both keys and values can be any Elixir (or Erlang) term.
The CubDB database file uses an immutable data structure that provides several
guarantees:
Robustness to data corruption, as entries are never changed in-place
Atomic writes: write operations either entirely succeed or entirely fail
Read operations run concurrently, and neither block nor are blocked by writes
Ranges of entries are selected on immutable snapshots, always giving a consistent view, even while write operations run concurrently
Usage
Start CubDB by specifying a directory for its database file (if it does not
exist, it will be created):
{:ok, db} = CubDB.start_link("my/data/directory")
The get/2, put/3, and delete/2 functions work as you probably expect:
CubDB.put(db, :foo, "some value")
#=> :ok
CubDB.get(db, :foo)
#=> "some value"
CubDB.delete(db, :foo)
#=> :ok
CubDB.get(db, :foo)
#=> nil
Ranges of keys are retrieved using select/3:
for {key, value} <- [a: 1, b: 2, c: 3, d: 4, e: 5, f: 6, g: 7, h: 8] do
  CubDB.put(db, key, value)
end
CubDB.select(db, min_key: :b, max_key: :e)
#=> {:ok, [b: 2, c: 3, d: 4, e: 5]}
But select/3 can do much more than that. It can apply a pipeline of operations
(map, filter, take, drop and more) to the selected entries, it can
select the entries in normal or reverse order, and it can reduce the result
using an arbitrary function:
# Take the sum of the last 3 even values:
require Integer # Integer.is_even/1 is a macro, so Integer must be required

CubDB.select(db,
  reverse: true,
  pipe: [
    map: fn {_key, value} ->
      value
    end,
    filter: fn value ->
      is_integer(value) && Integer.is_even(value)
    end,
    take: 3
  ],
  reduce: fn n, sum -> sum + n end
)
#=> {:ok, 18}
As CubDB uses an immutable data structure, write operations cause the data
file to grow. Occasionally, it is advisable to run a compaction to optimize
the file size and reclaim disk space. Compaction can be started manually by
calling compact/1, and runs in the background, without blocking other
operations:
CubDB.compact(db)
#=> :ok
Alternatively, automatic compaction can be enabled, either by passing an option
to start_link/3, or by calling set_auto_compact/2.
Summary
Functions
child_spec(init_arg)
Returns a specification to start this module under a supervisor.
compact(db)
Runs a database compaction.
delete(db, key)
Deletes the entry associated to key from the database.
dirt_factor(db)
Returns the dirt factor.
fetch(db, key)
Fetches the value for the given key in the database, or returns :error if key is not present.
get(db, key, default \\ nil)
Gets the value associated to key from the database.
has_key?(db, key)
Returns whether an entry with the given key exists in the database.
put(db, key, value)
Writes an entry in the database, associating key to value.
select(db, options \\ [], timeout \\ 5000)
Selects a range of entries from the database, and optionally performs a pipeline of operations on them.
set_auto_compact(db, setting)
Set whether to perform automatic compaction, and how.
size(db)
Returns the number of entries present in the database.
start(data_dir, options \\ [], gen_server_options \\ [])
Starts the CubDB database without a link.
start_link(data_dir, options \\ [], gen_server_options \\ [])
Starts the CubDB database process linked to the current process.
Types
key()
key() :: any()
value()
value() :: any()
Functions
child_spec(init_arg)
Returns a specification to start this module under a supervisor.
The default options listed in Supervisor are used.
compact(db)
compact(GenServer.server()) :: :ok | {:error, binary()}
Runs a database compaction.
As write operations are performed on a database, its file grows. Occasionally, a compaction operation can be run to shrink the file to its optimal size. Compaction runs in the background and does not block other operations.
Only one compaction operation can run at any time. Therefore, if this function
is called when a compaction is already running, it returns {:error, :pending_compaction}.
When compacting, CubDB creates a new data file. Once the compaction succeeds, it
switches to the new file and removes the old one. For this reason, during
a compaction, there should be enough disk space for a second copy of the
database file.
delete(db, key)
delete(GenServer.server(), key()) :: :ok
Deletes the entry associated to key from the database.
If key was not present in the database, nothing is done.
dirt_factor(db)
dirt_factor(GenServer.server()) :: float()
Returns the dirt factor.
The dirt factor is a number, ranging from 0 to 1, giving an indication about the amount of overhead storage (or "dirt") that can be cleaned up with a compaction operation. A value of 0 means that there is no overhead, so a compaction would have no benefit. The closer to 1 the dirt factor is, the more can be cleaned up in a compaction operation.
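As a rough sketch of how the dirt factor could drive manual compaction, the snippet below only starts a compaction when the reported dirt factor reaches a threshold. The module name CompactionPolicy, its function names, and the 0.3 threshold are illustrative assumptions, not part of CubDB; only dirt_factor/1 and compact/1 come from the documentation above.

```elixir
defmodule CompactionPolicy do
  # Hypothetical helper module, not part of CubDB.
  # The threshold is an arbitrary illustrative value.
  @default_threshold 0.3

  # Pure decision: is there enough "dirt" to make a compaction worthwhile?
  def worthwhile?(dirt_factor, threshold \\ @default_threshold) do
    dirt_factor >= threshold
  end

  # Starts a compaction only when the dirt factor reaches the threshold.
  # Returns whatever CubDB.compact/1 returns, or :skipped otherwise.
  def maybe_compact(db, threshold \\ @default_threshold) do
    if worthwhile?(CubDB.dirt_factor(db), threshold) do
      CubDB.compact(db)
    else
      :skipped
    end
  end
end
```

Keeping the decision in a pure function makes the threshold easy to test and tune without a running database.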
fetch(db, key)
fetch(GenServer.server(), key()) :: {:ok, value()} | :error
Fetches the value for the given key in the database, or returns :error if key
is not present.
If the database contains an entry with the given key and value value, it
returns {:ok, value}. If key is not found, it returns :error.
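Because values can be any term, including nil, the tagged result of fetch/2 is the way to tell a stored nil apart from a missing key. The following sketch pattern-matches on both outcomes; the Lookup module and its function names are hypothetical and only illustrate the return shapes documented above.

```elixir
defmodule Lookup do
  # Hypothetical helper module, not part of CubDB.

  # Translates the documented return values of CubDB.fetch/2.
  def describe({:ok, value}), do: {:found, value}
  def describe(:error), do: :missing

  # Looks up a key and reports whether it was present.
  def lookup(db, key) do
    db |> CubDB.fetch(key) |> describe()
  end
end
```

With this, a key stored with the value nil yields {:found, nil}, while an absent key yields :missing.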
get(db, key, default \\ nil)
get(GenServer.server(), key(), value()) :: value()
Gets the value associated to key from the database.
If no value is associated with key, default is returned (which is nil,
unless specified otherwise).
has_key?(db, key)
has_key?(GenServer.server(), key()) :: boolean()
Returns whether an entry with the given key exists in the database.
put(db, key, value)
put(GenServer.server(), key(), value()) :: :ok
Writes an entry in the database, associating key to value.
If key was already present, it is overwritten.
select(db, options \\ [], timeout \\ 5000)
select(GenServer.server(), Keyword.t(), timeout()) ::
  {:ok, any()} | {:error, Exception.t()}
Selects a range of entries from the database, and optionally performs a pipeline of operations on them.
It returns {:ok, result} if successful, or {:error, exception} if an
exception is raised.
Options
The min_key and max_key options specify the range of entries that are selected. By
default, the range is inclusive, so all entries that have a key greater than or
equal to min_key and less than or equal to max_key are selected:
# Select all entries where `"b" <= key <= "d"`
CubDB.select(db, min_key: "b", max_key: "d")
The range boundaries can be excluded by setting min_key or max_key to
{key, :excluded}:
# Select all entries where `"b" <= key < "d"`
CubDB.select(db, min_key: "b", max_key: {"d", :excluded})
Either of :min_key and :max_key can be omitted or set to nil, to leave the
range open-ended.
# Select entries where `key <= "a"`
CubDB.select(db, max_key: "a")
# Or, equivalently:
CubDB.select(db, min_key: nil, max_key: "a")
In case the key boundary is the literal value nil, the longer form must be used:
# Select entries where `nil <= key <= "a"`
CubDB.select(db, min_key: {nil, :included}, max_key: "a")
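One way to avoid the ambiguity between "no boundary" and "the literal key nil" is to always build boundaries in the explicit tuple form. The helper below is a hypothetical sketch (RangeOpts and its tagged inputs are not part of CubDB); it only produces the option values described above: nil for an open end, {key, :included}, or {key, :excluded}.

```elixir
defmodule RangeOpts do
  # Hypothetical helper module, not part of CubDB.
  # Builds the :min_key / :max_key options for CubDB.select/3 from
  # explicitly tagged boundaries, so a literal nil key is never
  # confused with an open-ended range.
  def build(min, max) do
    [min_key: bound(min), max_key: bound(max)]
  end

  # :open leaves that side of the range open-ended
  defp bound(:open), do: nil
  # The long form works for every key, including the literal nil
  defp bound({:included, key}), do: {key, :included}
  defp bound({:excluded, key}), do: {key, :excluded}
end
```

For example, `CubDB.select(db, RangeOpts.build({:included, nil}, {:included, "a"}))` would select entries where `nil <= key <= "a"`.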
The reverse option, when set to true, causes the entries to be selected and
traversed in reverse order.
The pipe option specifies an optional list of operations performed
sequentially on the selected entries. The given order of operations is
respected. The available operations, specified as tuples, are:
{:filter, fun} filters entries for which fun returns a truthy value
{:map, fun} maps each entry to the value returned by the function fun
{:take, n} takes the first n entries
{:drop, n} skips the first n entries
{:take_while, fun} takes entries while fun returns a truthy value
{:drop_while, fun} skips entries while fun returns a truthy value
Note that, when selecting a key range, specifying min_key and/or max_key
is more performant than using {:filter, fun} or {:take_while | :drop_while, fun},
because min_key and max_key avoid loading unnecessary entries from
disk entirely.
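Since the order of pipe operations is respected, swapping two operations can change the result. The sketch below illustrates this with plain Enum functions on an in-memory keyword list, under the assumption that the pipe operations behave like their Enum namesakes. It does not call CubDB, and the entries are made-up sample data.

```elixir
# Sample entries, in key order, standing in for a selected range
entries = [a: 1, b: 2, c: 3, d: 4, e: 5, f: 6]

# Keeps entries whose value is even
even? = fn {_key, value} -> rem(value, 2) == 0 end

# Analogous to pipe: [filter: even?, take: 2]
# Filters first, then takes the first two matches.
filter_then_take = entries |> Enum.filter(even?) |> Enum.take(2)

# Analogous to pipe: [take: 2, filter: even?]
# Takes the first two entries, then filters them.
take_then_filter = entries |> Enum.take(2) |> Enum.filter(even?)
```

The first pipeline keeps the first two even entries in the range, while the second only inspects the first two entries overall.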
The reduce option specifies how the selected entries are aggregated. If
reduce is omitted, the entries are returned as a list. If reduce is a
function, it is used to reduce the collection of entries. If reduce is a
tuple, the first element is the starting value of the reduction, and the
second is the reducing function.
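The tuple form is useful when the accumulator has a different shape than the entries. As a hedged sketch, the hypothetical Stats module below (not part of CubDB) computes both a count and a sum in one pass, starting from {0, 0}. It assumes the reducing function receives the entry first and the accumulator second, as in the sum example in the module overview.

```elixir
defmodule Stats do
  # Hypothetical helper module, not part of CubDB.

  # Reducing function: takes a {key, value} entry and the accumulator,
  # and returns the updated {count, sum} accumulator.
  def count_and_sum({_key, value}, {count, sum}) do
    {count + 1, sum + value}
  end

  # Selects the given key range and reduces it to {count, sum},
  # using the {initial_value, fun} form of the :reduce option.
  def summarize(db, min_key, max_key) do
    CubDB.select(db,
      min_key: min_key,
      max_key: max_key,
      reduce: {{0, 0}, &count_and_sum/2}
    )
  end
end
```

The reducing function can be tried on its own with `Enum.reduce/3`, which uses the same argument order.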
Examples
To select all entries with keys between :a and :c as a list of {key, value}
entries we can do:
{:ok, entries} = CubDB.select(db, min_key: :a, max_key: :c)
If we want to get all entries with keys between :a and :c, with :c
excluded, we can do:
{:ok, entries} = CubDB.select(db, min_key: :a, max_key: {:c, :excluded})
To select the last 3 entries, we can do:
{:ok, entries} = CubDB.select(db, reverse: true, pipe: [take: 3])
If we want to obtain the sum of the first 10 positive numeric values
associated to keys from :a to :f, we can do:
{:ok, sum} = CubDB.select(db,
  min_key: :a,
  max_key: :f,
  pipe: [
    map: fn {_key, value} -> value end, # map values
    filter: fn n -> is_number(n) and n > 0 end, # only positive numbers
    take: 10 # take only the first 10 entries in the range
  ],
  reduce: fn n, sum -> sum + n end # reduce to the sum of selected values
)
set_auto_compact(db, setting)
Set whether to perform automatic compaction, and how.
If set to false, no automatic compaction is performed. If set to true,
auto-compaction is performed, following a write operation, if at least 100
write operations occurred since the last compaction, and the dirt factor is at
least 0.2. These values can be customized by setting the auto_compact option
to {min_writes, min_dirt_factor}.
It returns :ok, or {:error, reason} if setting is invalid.
Compaction is done in the background and does not block other operations, but can create disk contention, so it should not be performed too often. When writing a lot into the database, such as when importing data from an external source, it is advisable to turn off auto compaction, and manually run compaction at the end of the import.
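The import pattern described above could be sketched as follows. The BulkImport module is hypothetical and not part of CubDB; it assumes entries is an enumerable of {key, value} tuples, and it re-enables auto compaction with the default thresholds at the end, which may not match the setting that was active before the import.

```elixir
defmodule BulkImport do
  # Hypothetical helper module, not part of CubDB.
  # Writes many entries with auto compaction turned off, then starts
  # a single manual compaction once the import is done.
  def run(db, entries) do
    :ok = CubDB.set_auto_compact(db, false)

    # Each put is an independent atomic write
    Enum.each(entries, fn {key, value} ->
      :ok = CubDB.put(db, key, value)
    end)

    # Compaction runs in the background; the call returns immediately
    compaction = CubDB.compact(db)

    # Assumption: the caller wants the default auto compaction afterwards
    :ok = CubDB.set_auto_compact(db, true)

    compaction
  end
end
```

The function returns the result of compact/1, so the caller can react if the compaction could not be started.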
size(db)
size(GenServer.server()) :: pos_integer()
Returns the number of entries present in the database.
start(data_dir, options \\ [], gen_server_options \\ [])
start(binary(), Keyword.t(), GenServer.options()) :: GenServer.on_start()
Starts the CubDB database without a link.
See start_link/3 for more information.
start_link(data_dir, options \\ [], gen_server_options \\ [])
start_link(binary(), Keyword.t(), GenServer.options()) :: GenServer.on_start()
Starts the CubDB database process linked to the current process.
The data_dir argument is the directory path where the database files will be
stored. If it does not exist, it will be created. Only one CubDB instance
can run per directory, so if you run several databases, they should each use
their own separate data directory.
The optional options argument is a keyword list that specifies configuration
options. The valid options are:
auto_compact: whether to perform auto-compaction. It defaults to false. See set_auto_compact/2 for the possible values
The gen_server_options are passed to GenServer.start_link/3.
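As a minimal sketch of how the three arguments fit together, the supervisor below starts CubDB through an explicit child specification that calls start_link/3 with a data directory, the auto_compact option, and a registered name passed via gen_server_options. The module name MyApp.Storage and the name :my_db are illustrative assumptions, not part of CubDB.

```elixir
defmodule MyApp.Storage do
  # Hypothetical supervisor module, not part of CubDB.
  use Supervisor

  def start_link(data_dir) do
    Supervisor.start_link(__MODULE__, data_dir, name: __MODULE__)
  end

  @impl true
  def init(data_dir) do
    Supervisor.init([child(data_dir)], strategy: :one_for_one)
  end

  # Explicit child specification calling CubDB.start_link/3 with:
  #   - the data directory
  #   - the CubDB options (auto compaction with default thresholds)
  #   - the GenServer options (a registered name, assumed here as :my_db)
  def child(data_dir) do
    %{
      id: CubDB,
      start: {CubDB, :start_link, [data_dir, [auto_compact: true], [name: :my_db]]}
    }
  end
end
```

Because the functions accept any GenServer.server(), the registered name could then be used in place of the pid, as in `CubDB.get(:my_db, :foo)`.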