cubdb v0.4.0 CubDB

CubDB is an embedded key-value database written in the Elixir language. It runs locally, and is backed by a single file.

Both keys and values can be any arbitrary Elixir (or Erlang) term.

The most relevant features offered by CubDB are:

  • Simple get/3, put/3, and delete/2 operations

  • Arbitrary selection of entries and transformation of the result with select/3

  • Atomic multiple updates with get_and_update_multi/4

  • Concurrent read operations that neither block nor are blocked by writes

The CubDB database file uses an immutable data structure that guarantees robustness to data corruption, as entries are never changed in-place. It also makes read operations consistent, even while write operations are being performed concurrently, as ranges of entries are selected on immutable snapshots.

Usage

Start CubDB by specifying a directory for its database file (if not existing, it will be created):

{:ok, db} = CubDB.start_link("my/data/directory")

The get/2, put/3, and delete/2 functions work as you probably expect:

CubDB.put(db, :foo, "some value")
#=> :ok

CubDB.get(db, :foo)
#=> "some value"

CubDB.delete(db, :foo)
#=> :ok

CubDB.get(db, :foo)
#=> nil

Ranges of keys are retrieved using select/3:

for {key, value} <- [a: 1, b: 2, c: 3, d: 4, e: 5, f: 6, g: 7, h: 8] do
  CubDB.put(db, key, value)
end

CubDB.select(db, min_key: :b, max_key: :e)
#=> {:ok, [b: 2, c: 3, d: 4, e: 5]}

But select/3 can do much more than that. It can apply a pipeline of operations (map, filter, take, drop and more) to the selected entries, it can select the entries in normal or reverse order, and it can reduce the result using an arbitrary function:

# Integer.is_even/1 is a macro, so the Integer module must be required first
require Integer

# Take the sum of the last 3 even values:
CubDB.select(db,
  reverse: true,
  pipe: [
    map: fn {_key, value} ->
      value
    end,
    filter: fn value ->
      is_integer(value) && Integer.is_even(value)
    end,
    take: 3
  ],
  reduce: fn n, sum -> sum + n end
)
#=> {:ok, 18}

As CubDB uses an immutable data structure, write operations cause the data file to grow. Occasionally, it is advisable to run a compaction to optimize the file size and reclaim disk space. Compaction can be started manually by calling compact/1, and runs in the background, without blocking other operations:

CubDB.compact(db)
#=> :ok

Alternatively, automatic compaction can be enabled, either by passing an option to start_link/3 or by calling set_auto_compact/2.
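As a minimal sketch of the two ways to enable automatic compaction: the option name and the {min_writes, min_dirt_factor} tuple come from the start_link/3 and set_auto_compact/2 documentation on this page, while the threshold values 1000 and 0.3 are arbitrary and chosen only for illustration.

```elixir
# Enable auto-compaction with the default thresholds when starting the database
{:ok, db} = CubDB.start_link("my/data/directory", auto_compact: true)

# Or change the setting on a running database. The tuple is
# {min_writes, min_dirt_factor}; the values used here are arbitrary.
:ok = CubDB.set_auto_compact(db, {1000, 0.3})
```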

Summary

Functions

Returns a specification to start this module under a supervisor.

Runs a database compaction.

Deletes the entry associated to key from the database.

Returns the dirt factor.

Fetches the value for the given key in the database, or returns :error if key is not present.

Gets the value associated to key from the database.

Gets the value corresponding to key and updates it, in one atomic transaction.

Gets and updates or deletes multiple entries in an atomic transaction.

Returns whether an entry with the given key exists in the database.

Writes an entry in the database, associating key to value.

Selects a range of entries from the database, and optionally performs a pipeline of operations on them.

Set whether to perform automatic compaction, and how.

Returns the number of entries present in the database.

Starts the CubDB database process linked to the current process.

Updates the entry corresponding to key using the given function.

Types

Functions

Returns a specification to start this module under a supervisor.

The default options listed in Supervisor are used.

compact(db)
compact(GenServer.server()) :: :ok | {:error, binary()}

Runs a database compaction.

As write operations are performed on a database, its file grows. Occasionally, a compaction operation can be run to shrink the file to its optimal size. Compaction runs in the background and does not block operations.

Only one compaction operation can run at any time, therefore if this function is called when a compaction is already running, it returns {:error, :pending_compaction}.

When compacting, CubDB will create a new data file, and eventually switch to it and remove the old one as the compaction succeeds. For this reason, during a compaction, there should be enough disk space for a second copy of the database file.
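Since compact/1 can refuse to start when another compaction is in progress, callers may want to match on its return value. The following is a sketch based only on the return values described above; the catch-all clause is a defensive assumption, because the typespec allows other error reasons.

```elixir
{:ok, db} = CubDB.start_link("my/data/directory")

case CubDB.compact(db) do
  :ok ->
    IO.puts("compaction started in the background")

  {:error, :pending_compaction} ->
    IO.puts("a compaction is already running")

  {:error, reason} ->
    # Defensive catch-all for any other error reason
    IO.puts("compaction could not start: #{inspect(reason)}")
end
```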

delete(db, key)
delete(GenServer.server(), key()) :: :ok

Deletes the entry associated to key from the database.

If key was not present in the database, nothing is done.

Returns the dirt factor.

The dirt factor is a number, ranging from 0 to 1, giving an indication about the amount of overhead storage (or "dirt") that can be cleaned up with a compaction operation. A value of 0 means that there is no overhead, so a compaction would have no benefit. The closer to 1 the dirt factor is, the more can be cleaned up in a compaction operation.

fetch(db, key)
fetch(GenServer.server(), key()) :: {:ok, value()} | :error

Fetches the value for the given key in the database, or returns :error if key is not present.

If the database contains an entry with the given key, it returns {:ok, value}, where value is the value stored under that key. If key is not found, it returns :error.
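A short sketch of how fetch/2 might be used to tell a missing key apart from a present one. This can matter when nil is itself a legitimate stored value, since get/3 returns nil by default for missing keys. The key :foo is only an example.

```elixir
{:ok, db} = CubDB.start_link("my/data/directory")

case CubDB.fetch(db, :foo) do
  {:ok, value} ->
    IO.puts("found: #{inspect(value)}")

  :error ->
    IO.puts("no entry for :foo")
end
```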

get(db, key, default \\ nil)
get(GenServer.server(), key(), value()) :: value()

Gets the value associated to key from the database.

If no value is associated with key, default is returned (which is nil, unless specified otherwise).

get_and_update(db, key, fun)
get_and_update(
  GenServer.server(),
  key(),
  (value() -> {any(), value()} | :pop)
) :: {:ok, any()}

Gets the value corresponding to key and updates it, in one atomic transaction.

fun is called with the current value associated to key (or nil if not present), and must return a two element tuple: the result value to be returned, and the new value to be associated to key. fun may also return :pop, in which case the current value is deleted and returned.

The return value is {:ok, result}, or {:error, reason} in case an error occurs.
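The two return shapes accepted from fun can be sketched as follows. The :counter key and the choice to treat a missing entry as 0 are illustrative assumptions, not part of the library.

```elixir
{:ok, db} = CubDB.start_link("my/data/directory")

# Start from a known state for the sake of the example
:ok = CubDB.delete(db, :counter)

# Increment a counter, returning the previous value (0 if the key is absent)
{:ok, previous} =
  CubDB.get_and_update(db, :counter, fn current ->
    current = current || 0
    {current, current + 1}
  end)

# Delete the entry and get back the value it held
{:ok, last} = CubDB.get_and_update(db, :counter, fn _current -> :pop end)
```

Under the semantics described above, previous would be 0 and last would be 1, after which :counter is no longer present.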

get_and_update_multi(db, keys_to_get, fun, timeout \\ 5000)
get_and_update_multi(
  GenServer.server(),
  [key()],
  (%{optional(key()) => value()} ->
     {any(), %{optional(key()) => value()} | nil, [key()] | nil}),
  timeout()
) :: {:ok, any()} | {:error, any()}

Gets and updates or deletes multiple entries in an atomic transaction.

Gets all values associated with keys in keys_to_get, and passes them as a map of %{key => value} entries to fun. If a key is not found, it won't be added to the map passed to fun. Updates the database and returns a result according to the return value of fun. Returns {:ok, return_value} in case of success, {:error, reason} otherwise.

The function fun should return a tuple of three elements: {return_value, entries_to_put, keys_to_delete}, where return_value is an arbitrary value to be returned, entries_to_put is a map of %{key => value} entries to be written to the database, and keys_to_delete is a list of keys to be deleted.

The optional timeout argument specifies a timeout in milliseconds, which is 5000 (5 seconds) by default.

The read and write operations are executed as an atomic transaction, so they will either all succeed, or all fail. Note that get_and_update_multi/4 blocks other write operations until it completes.

Example

Assume a database with names as keys and integer monetary balances as values. To transfer 10 units from "Anna" to "Joy", returning their updated balances:

{:ok, {anna, joy}} = CubDB.get_and_update_multi(db, ["Anna", "Joy"], fn entries ->
  anna = Map.get(entries, "Anna", 0)
  joy = Map.get(entries, "Joy", 0)

  if anna < 10, do: raise(RuntimeError, message: "Anna's balance is too low")

  anna = anna - 10
  joy = joy + 10

  {{anna, joy}, %{"Anna" => anna, "Joy" => joy}, []}
end)

Or, if we want to transfer all of the balance from "Anna" to "Joy", deleting "Anna"'s entry, and returning "Joy"'s resulting balance:

{:ok, joy} = CubDB.get_and_update_multi(db, ["Anna", "Joy"], fn entries ->
  anna = Map.get(entries, "Anna", 0)
  joy = Map.get(entries, "Joy", 0)

  joy = joy + anna

  {joy, %{"Joy" => joy}, ["Anna"]}
end)

Returns whether an entry with the given key exists in the database.

put(db, key, value)
put(GenServer.server(), key(), value()) :: :ok

Writes an entry in the database, associating key to value.

If key was already present, it is overwritten.

select(db, options \\ [], timeout \\ 5000)
select(GenServer.server(), Keyword.t(), timeout()) ::
  {:ok, any()} | {:error, Exception.t()}

Selects a range of entries from the database, and optionally performs a pipeline of operations on them.

It returns {:ok, result} if successful, or {:error, exception} if an exception is raised.

Options

The min_key and max_key options specify the range of entries that are selected. By default, the range is inclusive, so all entries that have a key greater than or equal to min_key and less than or equal to max_key are selected:

# Select all entries where `"b" <= key <= "d"`
CubDB.select(db, min_key: "b", max_key: "d")

The range boundaries can be excluded by setting min_key or max_key to {key, :excluded}:

# Select all entries where `"b" <= key < "d"`
CubDB.select(db, min_key: "b", max_key: {"d", :excluded})

Either of :min_key and :max_key can be omitted or set to nil, to leave the range open-ended.

# Select entries where `key <= "a"`
CubDB.select(db, max_key: "a")

# Or, equivalently:
CubDB.select(db, min_key: nil, max_key: "a")

In case the key boundary is the literal value nil, the longer form must be used:

# Select entries where `nil <= key <= "a"`
CubDB.select(db, min_key: {nil, :included}, max_key: "a")

The reverse option, when set to true, causes the entries to be selected and traversed in reverse order.

The pipe option specifies an optional list of operations performed sequentially on the selected entries. The given order of operations is respected. The available operations, specified as tuples, are:

  • {:filter, fun} filters entries for which fun returns a truthy value

  • {:map, fun} maps each entry to the value returned by the function fun

  • {:take, n} takes the first n entries

  • {:drop, n} skips the first n entries

  • {:take_while, fun} takes entries while fun returns a truthy value

  • {:drop_while, fun} skips entries while fun returns a truthy value

Note that, when selecting a key range, specifying min_key and/or max_key is more performant than using {:filter, fun} or {:take_while | :drop_while, fun}, because min_key and max_key avoid loading unnecessary entries from disk entirely.
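To make the performance note concrete, here is a sketch of two queries that should yield the same entries. It assumes atom keys such as those written in the Usage section, and the variable names by_range and by_filter are illustrative.

```elixir
{:ok, db} = CubDB.start_link("my/data/directory")

# Preferred: the range is resolved by the database, so entries
# outside of it are not loaded from disk
{:ok, by_range} = CubDB.select(db, min_key: :b, max_key: :e)

# Works, but entries outside the range are loaded and then dropped by the filter
{:ok, by_filter} =
  CubDB.select(db,
    pipe: [filter: fn {key, _value} -> key >= :b and key <= :e end]
  )
```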

The reduce option specifies how the selected entries are aggregated. If reduce is omitted, the entries are returned as a list. If reduce is a function, it is used to reduce the collection of entries. If reduce is a tuple, the first element is the starting value of the reduction, and the second is the reducing function.
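The tuple form of reduce can be sketched as below: the reduction starts from the given value instead of from the first selected entry, which is useful when the accumulator has a different shape than the entries. The key range and the counting function are illustrative.

```elixir
{:ok, db} = CubDB.start_link("my/data/directory")

# Count the selected entries, starting the reduction from 0.
# Each entry is a {key, value} tuple, so it cannot serve as the initial count.
{:ok, count} =
  CubDB.select(db,
    min_key: :a,
    max_key: :f,
    reduce: {0, fn _entry, count -> count + 1 end}
  )
```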

Examples

To select all entries with keys between :a and :c as a list of {key, value} entries we can do:

{:ok, entries} = CubDB.select(db, min_key: :a, max_key: :c)

If we want to get all entries with keys between :a and :c, with :c excluded, we can do:

{:ok, entries} = CubDB.select(db, min_key: :a, max_key: {:c, :excluded})

To select the last 3 entries, we can do:

{:ok, entries} = CubDB.select(db, reverse: true, pipe: [take: 3])

If we want to obtain the sum of the first 10 positive numeric values associated to keys from :a to :f, we can do:

{:ok, sum} = CubDB.select(db,
  min_key: :a,
  max_key: :f,
  pipe: [
    map: fn {_key, value} -> value end, # map values
    filter: fn n -> is_number(n) and n > 0 end, # only positive numbers
    take: 10 # take only the first 10 entries in the range
  ],
  reduce: fn n, sum -> sum + n end # reduce to the sum of selected values
)

set_auto_compact(db, setting)
set_auto_compact(
  GenServer.server(),
  boolean() | {integer(), integer() | float()}
) :: :ok | {:error, binary()}

Set whether to perform automatic compaction, and how.

If set to false, no automatic compaction is performed. If set to true, auto-compaction is performed, following a write operation, if at least 100 write operations occurred since the last compaction, and the dirt factor is at least 0.2. These values can be customized by setting the auto_compact option to {min_writes, min_dirt_factor}.

It returns :ok, or {:error, reason} if setting is invalid.

Compaction is done in the background and does not block other operations, but can create disk contention, so it should not be performed too often. When writing a lot into the database, such as when importing data from an external source, it is advisable to turn off auto compaction, and manually run compaction at the end of the import.
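The bulk import pattern described above might look like the following sketch. The generated entries list stands in for a real external source, and re-enabling auto-compaction at the end is an assumption about what the caller wants rather than something the library requires.

```elixir
{:ok, db} = CubDB.start_link("my/data/directory", auto_compact: true)

# Hypothetical data standing in for an external source
entries = for n <- 1..10_000, do: {n, "value #{n}"}

# Turn off auto-compaction for the duration of the import
:ok = CubDB.set_auto_compact(db, false)

Enum.each(entries, fn {key, value} ->
  :ok = CubDB.put(db, key, value)
end)

# Compact once at the end (runs in the background), then restore the setting
CubDB.compact(db)
:ok = CubDB.set_auto_compact(db, true)
```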

Returns the number of entries present in the database.

start(data_dir, options \\ [], gen_server_options \\ [])

Starts the CubDB database without a link.

See start_link/3 for more information.

start_link(data_dir, options \\ [], gen_server_options \\ [])

Starts the CubDB database process linked to the current process.

The data_dir argument is the directory path where the database files will be stored. If it does not exist, it will be created. Only one CubDB instance can run per directory, so if you run several databases, they should each use their own separate data directory.

The optional options argument is a keyword list that specifies configuration options. The valid options are:

  • auto_compact: whether to perform auto-compaction. It defaults to false. See set_auto_compact/2 for the possible values

The gen_server_options are passed to GenServer.start_link/3.
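Because gen_server_options are forwarded to GenServer.start_link/3, the standard :name option can be used to register the database process, so that it can be addressed by name instead of by pid. A minimal sketch, where the name :my_db is arbitrary:

```elixir
{:ok, _pid} =
  CubDB.start_link("my/data/directory", [auto_compact: true], name: :my_db)

# Every function accepting a GenServer.server() also accepts the registered name
:ok = CubDB.put(:my_db, :foo, "some value")

CubDB.get(:my_db, :foo)
#=> "some value"
```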

update(db, key, initial, fun)
update(GenServer.server(), key(), value(), (value() -> value())) :: :ok

Updates the entry corresponding to key using the given function.

If key is present in the database, fun is invoked with the corresponding value, and the result is set as the new value of key. If key is not found, initial is inserted as the value of key.

The return value is :ok, or {:error, reason} in case an error occurs.
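As a sketch of the two branches described above, the following increments a counter that may not exist yet. The :counter key is illustrative, and the entry is deleted first only so that the example starts from a known state.

```elixir
{:ok, db} = CubDB.start_link("my/data/directory")

# Start from a known state for the sake of the example
:ok = CubDB.delete(db, :counter)

# Key not found: the initial value 1 is inserted and fun is not applied
:ok = CubDB.update(db, :counter, 1, fn n -> n + 1 end)

# Key present: fun is applied to the stored value
:ok = CubDB.update(db, :counter, 1, fn n -> n + 1 end)

CubDB.get(db, :counter)
#=> 2
```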