Architecture - terminusdb_ex


This document records the review of TerminusDB, the existing client ecosystem, and the architecture decisions for terminusdb_ex, the Elixir client. It is the canonical reference for contributors. Detailed, single-concern records live in docs/adr/.


1. Review summary

1.1 TerminusDB

TerminusDB is an open-source document graph database with built-in version control. It stores JSON documents as a graph of RDF triples, tracks every change as an immutable commit, and supports git-for-data workflows: branches, commits, diffs, merges, push, pull, clone, fetch, squash, and reset (time-travel).

Key concepts:

| Concept | Description |
| --- | --- |
| Document | A JSON object conforming to a schema class, stored as linked triples. |
| Schema | A graph of Class documents with typed properties; optional per database. |
| Graph | Two named graphs per branch: instance (data) and schema (types). |
| Branch / Repo / Ref | organization/database/repo/branch/commit_ref resource addressing. |
| Commit | Immutable snapshot; chain of commits gives full history. |
| WOQL | Web Object Query Language - a datalog query language serialized as JSON-LD. |
| GraphQL | Auto-generated GraphQL endpoint over the schema. |
| JSON-LD | The wire format for documents and WOQL queries. |

REST API surface (OpenAPI v10.0.x)

Base URL: http://<host>:6363/api/. Auth: HTTP Basic (admin:root by default).

| Endpoint | Method | Purpose |
| --- | --- | --- |
| / | GET | List databases for the authenticated user |
| /info | GET | Server version / capabilities |
| /ok | GET | Liveness check |
| /db/ | GET | List all databases (branches, verbose) |
| /db/{org}/{db} | GET | Database details |
| /db/{org}/{db} | HEAD | Check a database exists |
| /db/{org}/{db} | POST | Create a database (body: label, comment, public, schema) |
| /db/{org}/{db} | PUT | Update database metadata |
| /db/{org}/{db} | DELETE | Delete a database (force) |
| /document/{path} | GET | Get documents (graph_type, id, type, skip, count, as_list, unfold, minimized, compress_ids) |
| /document/{path} | POST | Insert documents (author, message, graph_type, full_replace, raw_json) |
| /document/{path} | PUT | Replace documents (create, raw_json) |
| /document/{path} | DELETE | Delete documents (id, nuke) |
| /schema | GET | Class frame (compress_ids, expand_abstract) |
| /woql, /woql/{path} | POST | Execute a WOQL query (body: query, commit_info, all_witnesses) |
| /branch/{path} | POST / DELETE | Create / delete a branch |
| /squash/{path} | GET | Squash commits |
| /reset/{path} | POST | Reset branch HEAD to a commit |
| /optimize/{path} | POST | Optimize a resource |
| /prefixes/{path} | GET | Fetch graph prefixes |
| /clone/{org}/{db} | POST | Clone a remote database |
| /fetch/{path} | POST | Fetch from a remote |
| /push/{path} | POST | Push to a remote |
| /pull/{path} | POST | Pull from a remote |
| /diff | POST | Diff two documents (before, after, keep) |
| /patch | POST | Apply a patch (before, patch) |

Error model: failures return HTTP 4xx/5xx with a JSON body of the shape {"@type": "api:*ErrorResponse", "api:error": {...}, "api:message": "...", "api:status": "api:failure"}.
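The error shape above maps naturally onto tagged tuples. The following is a minimal sketch, assuming the response body has already been decoded into a map with string keys; ErrorSketch and its field names are hypothetical and are not the library's TerminusDB.Error.

```elixir
defmodule ErrorSketch do
  # Hypothetical struct for illustration only; not the library's TerminusDB.Error.
  defstruct [:status, :type, :message, :details]

  # 2xx responses pass the decoded body through unchanged.
  def from_response(status, body) when status in 200..299, do: {:ok, body}

  # JSON failures: copy the documented "@type", "api:message" and "api:error" keys.
  def from_response(status, body) when is_map(body) do
    {:error,
     %__MODULE__{
       status: status,
       type: Map.get(body, "@type"),
       message: Map.get(body, "api:message"),
       details: Map.get(body, "api:error")
     }}
  end

  # Non-JSON failure bodies (for example a proxy error page) are kept as text.
  def from_response(status, body) do
    {:error, %__MODULE__{status: status, message: inspect(body)}}
  end
end
```

Callers can then pattern-match on {:ok, body} or {:error, error} and read error.status or error.message without rescuing exceptions.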

WOQL

WOQL is a composable, declarative query language backed by a datalog engine. Queries are built as an AST and serialized to JSON-LD. Variables use the v:Name convention and unify across the query (shared variables create implicit joins). The language supports functional style (and(triple(a,b,c), triple(d,e,f))) and fluent style; functional is recommended. Because WOQL is itself datalog, it is the natural compilation target for an ExDatalog integration (see ADR-0004).
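To make the functional style concrete, here is a minimal sketch of a builder that turns triple and conjunction calls into a JSON-LD-shaped map. WoqlSketch and its function names are hypothetical, and the exact key names ("NodeValue", "Value", "subject", "variable", and so on) are an assumption about the WOQL JSON-LD schema that should be checked against the TerminusDB reference before use.

```elixir
defmodule WoqlSketch do
  # Hypothetical functional builder: each function returns a plain map, so a
  # query stays ordinary immutable data until it is JSON-encoded.
  # The JSON-LD key names used here are assumptions, not a verified schema.

  # "v:Name" strings become variables; any other string is treated as a node IRI.
  def node_value("v:" <> name), do: %{"@type" => "NodeValue", "variable" => name}
  def node_value(iri) when is_binary(iri), do: %{"@type" => "NodeValue", "node" => iri}

  # The object position is wrapped in a separate (assumed) "Value" type.
  def value("v:" <> name), do: %{"@type" => "Value", "variable" => name}
  def value(iri) when is_binary(iri), do: %{"@type" => "Value", "node" => iri}

  def triple(subject, predicate, object) do
    %{
      "@type" => "Triple",
      "subject" => node_value(subject),
      "predicate" => node_value(predicate),
      "object" => value(object)
    }
  end

  # Conjunction of clauses; a variable shared between clauses acts as a join.
  def all(clauses) when is_list(clauses), do: %{"@type" => "And", "and" => clauses}
end

# Both triples mention v:Person, so they unify on the same subject.
query =
  WoqlSketch.all([
    WoqlSketch.triple("v:Person", "rdf:type", "@schema:Person"),
    WoqlSketch.triple("v:Person", "@schema:name", "v:Name")
  ])
```

Because the result is a nested map, it can be inspected, composed, and tested without a running server.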

1.2 Python client (terminusdb)

The official Python client is the reference for API ergonomics.

Strengths:

  • Single Client(server_url) entry point with connect(...) that establishes credentials and the current team/db/branch/repo/ref context.
  • Document API is high-level and Pythonic: insert_document, get_document, query_document, replace_document, update_document, delete_document.
  • WOQLQuery builder object compiles to JSON-LD.
  • Token + JWT + basic-auth support.

Weaknesses / opportunities for Elixir:

  • The client is mutable and stateful: connection context is held on the instance and mutated by setters. This maps poorly to Elixir and to concurrency. An Elixir client should treat context as immutable data carried in a struct, with explicit scope overrides per call.
  • Error handling is exception-based with a large APIError hierarchy; Elixir can do better with a single typed TerminusDB.Error struct and {:ok, _} | {:error, _} tuples.
  • No streaming of large result sets; the document GET returns concatenated JSON. Elixir can stream via Req's into: option and Stream/Enumerable.
  • No telemetry. Elixir can emit :telemetry events uniformly.
  • No schema-to-struct mapping. Elixir can leverage Ecto.Schema for this (ADR-0003).

1.3 Elixir ecosystem review

| Library | Role | Decision |
| --- | --- | --- |
| Req | High-level HTTP client on Finch; built-in JSON, params, auth, retry, streaming, fake adapter for tests | Selected HTTP client (ADR-0001) |
| Jason | JSON codec (Req default decoder) | Selected |
| NimbleOptions | Lightweight schema validation for config/options | Selected for Config + API options |
| Telemetry | Standard instrumentation | Selected (ADR-0005) |
| Tesla / Finch | Alternatives to Req | Req preferred: batteries-included, testable, streaming |
| Ecto | Schema/changeset for the TerminusDB.Schema macro (ADR-0003) | Optional dep, not a full adapter in v0.1 |
| Explorer | DataFrame interop | Future work, not in v0.1 |
| StreamData | Property-based testing | Selected dev dep (ADR-0006) |

2. Architecture options

Option A - Pure HTTP client

A thin, faithful wrapper over the REST API. Lowest complexity, fastest to ship, but leaves all schema/struct ergonomics to the user.

Option B - Client + Ecto integration

Option A plus use TerminusDB.Schema (built on Ecto.Schema + Ecto.Changeset), so users model documents as Elixir structs and generate TerminusDB schema definitions from them. Major ergonomics win; Ecto is an optional dependency.

Option C - Client + Ecto + ExDatalog

Option B plus a Datalog DSL that compiles rules to WOQL JSON-LD and can load query results back into an in-process Datalog engine. Highest value for knowledge-graph and reasoning workloads.

Option D - Client + local graph engine

Option C plus a local in-process graph store for offline/cached querying. Largest scope; risks reimplementing the database. Not justified for a client library - TerminusDB itself is the graph engine.

Decision

Adopt Option C as the target architecture, delivered incrementally.

  • v0.1 (this milestone): Option A core - Client, Config, Error, Database, telemetry, streaming, and the WOQL/raw-query execution primitives. This is the verifiable foundation everything else builds on.
  • v0.2: Document + Schema + Branch + Commit + Diff + Merge APIs and the WOQL DSL.
  • v0.3: Ecto integration (TerminusDB.Schema) - Option B.
  • v0.4: ExDatalog integration - Option C.
  • Option D is rejected for v0.x; a local engine is out of scope for a client.

This sequencing gives a usable, tested client immediately and de-risks the harder integrations by building them on a solid HTTP core.


3. High-level design

TerminusDB
 Application           OTP supervision tree
 Config                immutable connection/context (NimbleOptions-validated)
 Client                Req-based HTTP wrapper; the only module that touches the wire
 Error                 typed error struct + exception
 Database              database management API
 Document              document CRUD + query            (v0.2)
 Schema                schema frame API + Ecto macro    (v0.2/v0.3)
 Branch                branch API                       (v0.2)
 Commit                history / log                    (v0.2)
 Diff                  diff + patch                     (v0.2)
 Merge                 push / pull / rebase             (v0.2)
 WOQL                  functional DSL -> JSON-LD        (v0.2)
 GraphQL               GraphQL execution                (v0.2)
 Datalog               ExDatalog integration            (v0.4)
 Telemetry             event definitions + helpers
 Streaming             document stream helpers

3.1 Principles

  1. Immutable context. A TerminusDB.Config struct holds endpoint, auth, organization, database, branch, repo, ref. Every API call takes a config as an argument; changing scope means deriving a new config (TerminusDB.Config.with_database/2, TerminusDB.Config.with_branch/2) rather than mutating the existing one. This is concurrency-safe and matches Elixir idioms - and corrects the Python client's mutable-state design.
  2. One wire module. TerminusDB.Client is the only module that issues HTTP requests. All API modules (Database, Document, …) compose a request and hand it to Client.request/4 (see 3.2). This centralizes auth, headers, JSON, telemetry, retry, and errors.
  3. Typed errors, tuple results. Public functions return {:ok, result} or {:error, %TerminusDB.Error{}}. A companion !/1 variant raises TerminusDB.Error.
  4. Telemetry everywhere. Every public operation emits [:terminusdb, <area>, :start] and [:terminusdb, <area>, :stop] events with measurements and metadata (ADR-0005).
  5. Streaming first where it matters. Document listing and query results offer Stream/Enumerable variants backed by Req's into: option (ADR-0007).
  6. Minimal dependencies. Only req, jason, nimble_options, telemetry for v0.1. Ecto becomes an optional dependency only when TerminusDB.Schema lands.
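Principle 1 can be illustrated with a small sketch. ConfigSketch is a hypothetical stand-in for TerminusDB.Config; the reduced field set and the default values ("admin", "main", the localhost endpoint) are assumptions chosen for the example, not the library's actual defaults.

```elixir
defmodule ConfigSketch do
  # Hypothetical stand-in for TerminusDB.Config; defaults are illustrative only.
  defstruct endpoint: "http://localhost:6363/api",
            organization: "admin",
            database: nil,
            branch: "main"

  # Each helper returns a new struct; the struct passed in is never modified.
  def with_database(%__MODULE__{} = config, database), do: %{config | database: database}
  def with_branch(%__MODULE__{} = config, branch), do: %{config | branch: branch}
end

base = struct(ConfigSketch)

# Deriving a scoped config leaves `base` untouched, so it can be shared
# freely between processes.
dev =
  base
  |> ConfigSketch.with_database("inventory")
  |> ConfigSketch.with_branch("dev")
```

After these calls, base still has no database and points at its original branch, while dev carries the narrower scope. No locking or copying discipline is needed because nothing is mutated.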

3.2 Request flow

API module (e.g. Database.create/3)
  -> builds path + body + query params
  -> calls Client.request(config, method, path, opts)
       -> Telemetry.start
       -> Req.request!(base_url, auth, json, params, ...)
       -> on 2xx: decode body -> Telemetry.stop -> {:ok, body}
       -> on 4xx/5xx: build TerminusDB.Error -> Telemetry.stop(exception:) -> {:error, error}
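The flow above can be sketched without any dependencies by injecting the two side effects as functions. In this hypothetical ClientSketch, transport stands in for the Req call and emit stands in for :telemetry. Both names, the argument order, and the event payloads are assumptions made for illustration and do not describe the real Client.request.

```elixir
defmodule ClientSketch do
  # Hypothetical sketch of the request flow. `transport` replaces the HTTP
  # call and `emit` replaces :telemetry so the example runs on its own.
  def request(transport, emit, method, path) do
    emit.(:start, %{method: method, path: path})

    case transport.(method, path) do
      # 2xx: hand the decoded body back as a success tuple.
      {:ok, %{status: status, body: body}} when status in 200..299 ->
        emit.(:stop, %{status: status})
        {:ok, body}

      # Any other status: report the failure as data, not as an exception.
      {:ok, %{status: status, body: body}} ->
        emit.(:stop, %{status: status, error: true})
        {:error, %{status: status, body: body}}

      # Transport-level failure (connection refused, timeout, ...).
      {:error, reason} ->
        emit.(:stop, %{error: true})
        {:error, reason}
    end
  end
end

# A fake transport and an event sink that forwards events to the caller.
fake_transport = fn :get, "/info" -> {:ok, %{status: 200, body: %{"ok" => true}}} end
sink = fn event, metadata -> send(self(), {event, metadata}) end

result = ClientSketch.request(fake_transport, sink, :get, "/info")
```

Keeping the transport injectable is one way to test the flow without a server; Req's own test adapter, mentioned in the ecosystem review, serves the same purpose in the real client.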

3.3 Resource addressing

TerminusDB addresses resources as organization/database/repo/branch/ref. The config struct carries these; path builders in TerminusDB.Client.Path assemble the correct URL segment for each endpoint (e.g. /db/:org/:db, /document/:org/:db).
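As an illustration of such path builders, the sketch below assembles URL segments from already-known scope values. PathSketch and its functions are hypothetical, and the real TerminusDB.Client.Path may be organized differently. Percent-encoding each segment is a design assumption, as is the example of appending repo and branch segments to a document path.

```elixir
defmodule PathSketch do
  # Hypothetical path builders; not the library's TerminusDB.Client.Path.

  # /db/:org/:db
  def db(org, db), do: "/db/" <> encode(org) <> "/" <> encode(db)

  # /document/:org/:db plus any optional trailing segments (repo, branch, ...).
  def document(org, db, extra \\ []) do
    Enum.join(["/document", encode(org), encode(db) | Enum.map(extra, &encode/1)], "/")
  end

  # Percent-encode everything except unreserved characters, so a name that
  # contains "/" or a space cannot change the shape of the path.
  defp encode(segment), do: URI.encode(segment, &URI.char_unreserved?/1)
end

db_path = PathSketch.db("admin", "inventory")
doc_path = PathSketch.document("admin", "inventory", ["local", "branch", "dev"])
```

Centralizing this logic means API modules never concatenate URL strings themselves, which keeps encoding rules in one place.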