Unreleased
Nothing yet.
1.3.3 - 2026-05-29
Calibration release for the v1.3.2 Elixir cutover.
New:
- Added a multi-audience README path map covering the operator-local Familiar, ACP editor mounting, Phoenix embeds, eval/research work, persistent characters, hosted service shapes, and multi-agent coordination. Evidence: PR #125.
- Added docs/acp-editor.md, a worked guide for mounting the Familiar as an ACP agent in editors, including Zed configuration, standalone JSON-RPC smoke testing, diagnostics, and honest read-only scope. Evidence: PR #125.
- Added evals/familiar/v1.3.3.exs, a curated starter suite for Familiar eval work covering gate use, composition, synthesis quality, forbidden-pattern checks, and loom recall. Evidence: PR #125.
- Added a real-LLM Mnesia rehydration smoke test for the production Familiar path: summon against a workspace root, record a turn, stop the process, summon fresh against the same root-derived Mnesia table, and assert the entity sees prior turns through loom.turns. Evidence: PR #124, issue #120.
Changed:
- The Familiar now defaults to the host-BEAM unrestricted evaluator for its operator-local audience, while sandbox: :port remains available for child-BEAM isolation. Explicit sandbox: nil with a port_runner still selects the port path. Evidence: PRs #121 and #123, issue #115.
- Bash medium capability text now distinguishes shell state from filesystem side effects instead of overstating persistence. Evidence: PR #123, issue #117.
- Code-medium inhabitant guidance now describes the exact top-level binding contract for defmodule: gate functions, loom, folded_summary, and prior-turn variables are top-level bindings that module bodies cannot see. Evidence: PR #125, issue #116.
- Cantrip.cast_batch guidance now says children start concurrently, bounded by max_concurrent_children, and results are returned in request order instead of making an unconditional "parallel" claim. Evidence: PR #125, issue #118.
- The Spellbook loom ritual now verifies JSONL persistence, production Familiar Mnesia rehydration, and folding as prompt projection over an append-only loom. Evidence: PRs #124 and #125, issues #119 and #120.
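The batch contract described above (concurrent start, a concurrency bound, results in request order) can be sketched with the standard library alone. This is an illustration under stated assumptions, not Cantrip's implementation: BatchSketch and run_batch/3 are hypothetical names, and Task.async_stream/3 stands in for whatever Cantrip.cast_batch does internally.

```elixir
defmodule BatchSketch do
  # Hypothetical helper, not part of Cantrip. Runs `fun` once per request
  # with at most `max_concurrent` tasks in flight, and returns results in
  # the same order as the requests.
  def run_batch(requests, max_concurrent, fun) do
    requests
    |> Task.async_stream(fun,
      max_concurrency: max_concurrent,
      ordered: true,
      timeout: 5_000,
      on_timeout: :kill_task
    )
    |> Enum.map(fn
      {:ok, result} -> {:ok, result}
      {:exit, reason} -> {:error, reason}
    end)
  end
end

# The last request finishes first, yet results keep request order.
results =
  BatchSketch.run_batch([30, 20, 10], 2, fn delay_ms ->
    Process.sleep(delay_ms)
    delay_ms
  end)

IO.inspect(results)
```

With a bound of 2 the third request only starts once a slot frees up, which is why "concurrent, bounded" is a more accurate description than an unconditional "parallel".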
Verification:
- The v1.3.2 inhabitant-affordance audit spawned fix issues #115-#120; all are closed with code, docs, tests, or narrowed public contracts. The issues, PRs, and changelog now carry the durable record.
- mix verify, mix docs, and PR CI passed on the final v1.3.3 batch.
- Open GitHub issues after the calibration queue are only explicitly deferred future-work issues #108-#112.
1.3.2 - 2026-05-28
Package-coherence release for the Elixir cutover.
New:
- Added docs/spellbook.md, a vocabulary guide for cantrips, identities, mediums, gates, wards, circles, looms, entities, and the Familiar. The Spellbook is linked from the README, included in ExDoc, and shipped in the Hex package. Evidence: PR #105, issue #103.
- Added inhabitant-voice opening paragraphs to the documented public modules so the README, Spellbook, generated docs, and Familiar prompt describe the same runtime concepts. Evidence: PR #105, issue #102.
- Conversation mediums now expose capability text that teaches the same medium/gate/ward grammar used by code and Familiar flows, including the conditional done ending. Evidence: PR #104, issue #96.
- The Familiar prompt now names the BEAM/codebase environment more directly: Code.fetch_docs/1, loom.turns, workspace boundaries, and the Cantrip bibliography are all part of the orientation. Evidence: PR #104, issue #97.
Changed:
- Removed stale migration/audit docs and dead compatibility code from the pre-cutover era. The old material remains available through git history, while the source tree now presents the Elixir package as canonical. Evidence: PR #101, issues #98 and #99.
- Split long historical Zed trace replay behind RUN_REAL_LLM_TESTS=1 RUN_REAL_TRACE_REPLAY=1. The ordinary real-LLM release gate now covers stable live integration contracts; trace replay remains available as an explicit stress/provenance check.
Verification:
- Fresh-install dogfood from the built Hex tar succeeded outside the repo: package contents included .env.example, README.md, and docs/spellbook.md; mix deps.get, mix cantrip.cast "explain what a cantrip is", and mix cantrip.familiar "summarize the loom storage modules" all ran from the extracted package using local live LLM configuration.
- RUN_REAL_LLM_TESTS=1 over the explicit stable live/real integration suite passed: 20 tests, 0 failures, including a focused real-LLM JSONL loom rehydration smoke. The trace replay suite is no longer part of that default live gate.
- mix verify, mix docs, and mix hex.build pass with the package docs and file list current.
1.3.1 - 2026-05-28
Patch release for runtime/safety findings surfaced immediately after the
1.3.0 tag.
Fixes:
- Unknown code-medium sandbox ward values now fail closed with a structured code error observation instead of falling through to host-BEAM unrestricted eval. Regression coverage proves the submitted code does not execute under an unsupported sandbox value. Evidence: issue #93.
- Observation arguments are now recursively redacted before they can be stored on loom observations. Conversation tool-call args, malformed args_raw, and port code-medium gate args are covered so secret-shaped values do not persist through observation metadata while non-secret argument shape remains useful. Evidence: issue #92.
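The recursive redaction described above can be sketched as a walk over nested maps and lists. This is an illustration only: RedactSketch, the key fragments, and the "[REDACTED]" placeholder are assumptions, and matching on key names is just one way to detect secret-shaped values. Cantrip's actual rules may differ.

```elixir
defmodule RedactSketch do
  # Hypothetical key fragments treated as secret-shaped. An assumption for
  # illustration, not Cantrip's actual rule set.
  @secret_fragments ["key", "token", "secret", "password"]

  # Plain maps: redact values under secret-shaped keys, recurse into the rest.
  def redact(map) when is_map(map) and not is_struct(map) do
    Map.new(map, fn {k, v} ->
      if secret_key?(k), do: {k, "[REDACTED]"}, else: {k, redact(v)}
    end)
  end

  # Lists: recurse into each element so nested maps are covered.
  def redact(list) when is_list(list), do: Enum.map(list, &redact/1)

  # Scalars, tuples, and structs pass through unchanged in this sketch.
  def redact(other), do: other

  defp secret_key?(key) when is_atom(key) or is_binary(key) do
    name = key |> to_string() |> String.downcase()
    Enum.any?(@secret_fragments, &String.contains?(name, &1))
  end

  defp secret_key?(_), do: false
end

args = %{
  "path" => "lib/a.ex",
  "api_key" => "sk-123",
  "nested" => [%{token: "t", depth: 1}]
}

redacted = RedactSketch.redact(args)
IO.inspect(redacted)
```

The argument shape (keys, nesting, non-secret values) survives, which matches the stated goal of keeping observations useful after redaction.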
1.3.0 - 2026-05-28
Post-v1.2 stabilization release. This drains the hardening work that landed
after 1.2.0 into a real source/package version, including the Bash sandbox
boundary change, runtime and persistence fixes, API surface cleanup, package
metadata fixes, and Familiar composition guidance.
Breaking:
- Bash-medium cantrips now require an OS sandbox and fail closed when neither bubblewrap nor sandbox-exec is available. Declared gates are projected into the shell as PATH commands and dispatch back through the parent BEAM; raw shell remains the medium, but gate authority now comes from the circle rather than ambient process access. The done gate is exposed as cantrip_done because done is a shell keyword. Tests may opt into medium_opts: %{sandbox: :passthrough}; production cannot.
- Bash sandbox verification now includes representative shell workloads (git, make, jq, /dev/null redirects, and common find/sed/grep pipelines). The workload suite is the support contract: when a real shell workload should be supported, add it there so adapter gaps fail in CI instead of surfacing in user sessions. Workload tests opt into %{bash_network: :on} so GitHub-hosted Linux runners can exercise bubblewrap shell behavior even when they cannot create bubblewrap's default network-deny namespace; separate tests pin the default network-deny command shape.
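A fail-closed sandbox lookup of the kind described above might look like the following sketch. SandboxSketch, select/1, and the :no_os_sandbox reason are hypothetical names; the only real calls are System.find_executable/1 and Enum.find_value/3. It assumes bubblewrap is installed under its usual executable name, bwrap.

```elixir
defmodule SandboxSketch do
  # Executable names to probe, in preference order. bubblewrap installs as
  # `bwrap`; sandbox-exec is the macOS tool.
  @candidates ["bwrap", "sandbox-exec"]

  # `finder` is injectable so the choice can be exercised without either
  # tool installed. It defaults to System.find_executable/1.
  def select(finder \\ &System.find_executable/1) do
    Enum.find_value(@candidates, {:error, :no_os_sandbox}, fn name ->
      case finder.(name) do
        nil -> nil
        path -> {:ok, name, path}
      end
    end)
  end
end

# With no sandbox on PATH there is no fallback to an unsandboxed shell.
IO.inspect(SandboxSketch.select(fn _name -> nil end))
```

The point of the shape is that the error tuple is the default: a caller has to handle the missing-sandbox case explicitly rather than inherit ambient shell access.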
New:
- Familiar prompt/runtime evaluation now has a composition metric: child_medium_used scores whether a child turn used the expected medium. Turn metadata records medium_type, JSONL rehydration preserves it, and the eval suite scores whether a Familiar child turn used the expected medium for synthesis-shaped tasks. This is rubric coverage; behavioral validation still requires real-LLM runs. Evidence: PR #90, issue #83.
- Default Familiar guidance now explicitly teaches answer-shape selection: gather and compose in code, then delegate speech-shaped synthesis, explanation, review, naming, judgment, decision, or voice to a conversation child. Explicit user requests for a child, medium, or batch shape are treated as directives unless impossible. Evidence: PR #90, issue #83.
Fixes:
- Bash sandbox support now has representative shell workload coverage for git, make, jq, /dev/null, and common find/sed/grep pipelines, including the GitHub Actions runner network-namespace constraint. Evidence: PR #84, issue #82.
- The Hex package now includes .env.example, matching the README quick start. Package metadata tests assert README cp sources exist and ship in the Hex file list. Evidence: PR #88, issue #85.
- The documented public API surface now matches generated docs: internal modules are hidden, docs/public-api.md names the supported surface, nested modules are checked from application metadata, and ExDoc warnings are errors. Evidence: PR #89, issue #87.
- Provider and gate boundaries are typed more explicitly: LLM provider responses flow through %Cantrip.LLM.Response{}, gate arguments are normalized through per-gate DTOs, ACP _meta overrides are constrained, and provider option/usage forwarding has regression coverage. Evidence: PRs #57, #66, #76, and #77.
- Durable loom and JSONL behavior is stricter: append semantics align between in-memory and durable paths, JSONL writes are serialized, persisted code-state bindings are compacted, event upcasting is versioned, and truncation/medium metadata rehydrate as atom keys. Evidence: PRs #66, #70, #71, #74, and #90.
- Streaming and observability paths preserve context while staying bounded: streaming emits real text deltas, ACP trace context is propagated, intent telemetry is redacted, streaming delivery has backpressure, bridge delivery uses bounded barriers, and early stream halt shuts down runner tasks. Evidence: PRs #50, #58, and #75.
- Child composition is more disciplined: pre-built child casts compose parent wards, declaration-time child-spawn wards are enforced, and the default Familiar can read files through its normal observation gates. Evidence: PRs #72, #73, and #78.
CI / packaging:
- GitHub Actions checkout was updated for the Node 24 runner environment. Evidence: PR #81.
- The cleanup status ledger records the post-v1.2 hardening pass and the CI gates that made it durable. Evidence: PR #80.
1.2.0
Post-v1 feature completion pass. The two feature-roadmap items left after
the 1.1.0 hardening release are now shipped and closed with proof.
New:
- Added a Familiar eval harness for prompt/runtime regression work: multi-scenario and multi-seed runs, fixture workspaces, persisted JSONL transcripts, JSON reports, rubric criteria, optional judge scoring, and mix cantrip.eval CI thresholds. Evidence: test/familiar_eval_test.exs, test/mix_cantrip_eval_test.exs, docs/eval-harness.md, PR #38.
- Added distributed Familiar support: root and child cantrips can target named BEAM nodes through :node, remote casts preserve their node handle, remote child observations are grafted into the parent loom, and Cantrip.Cluster provides Mnesia extra-node/table-copy helpers for replicated loom storage. Evidence: test/distributed_cantrip_test.exs, test/cluster_test.exs, docs/distributed-familiar.md, PR #39.
Fixes before tag:
- Remote distributed calls now use bounded :rpc.call/5 timeouts instead of the distributed Erlang default of :infinity; unknown string node names fail closed instead of silently falling back to local execution.
- Cantrip.Cluster.connect_mnesia/2 now preserves Mnesia schema timeout details so operators can see which table failed to synchronize.
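The two remote-call rules above (a bounded timeout, and unknown string node names failing closed) can be sketched as follows. RemoteSketch, resolve/1, call/5, and the 5-second default are hypothetical; only :rpc.call/5, Node.list/0, and node/0 are real. The sketch resolves names against nodes the VM already knows, so it never creates atoms from external strings.

```elixir
defmodule RemoteSketch do
  # Hypothetical default bound. :rpc.call/4 would wait forever instead.
  @timeout_ms 5_000

  # Resolve a string name only against nodes this VM already knows.
  # An unknown name is an error, never a silent fallback to local execution.
  def resolve(name) when is_binary(name) do
    case Enum.find([node() | Node.list()], &(Atom.to_string(&1) == name)) do
      nil -> {:error, {:unknown_node, name}}
      known -> {:ok, known}
    end
  end

  def call(name, mod, fun, args, timeout_ms \\ @timeout_ms) do
    with {:ok, target} <- resolve(name) do
      # :rpc.call/5 takes the timeout in milliseconds as its last argument.
      case :rpc.call(target, mod, fun, args, timeout_ms) do
        {:badrpc, reason} -> {:error, {:badrpc, reason}}
        result -> {:ok, result}
      end
    end
  end
end

IO.inspect(RemoteSketch.call("ghost@nowhere", Kernel, :+, [1, 2]))
```

Wrapping the result in {:ok, _} keeps a legitimate remote return value distinguishable from a {:badrpc, _} transport failure.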
1.1.0
Post-v1 hardening and cleanup pass. All cleanup issues from the v1 backlog are closed with proof, including issues filed during the cleanup pass (#32, #34, #35, #36, #37). See the cleanup-status tracker for the full ledger.
Behavior change worth flagging for downstream callers:
- compile_and_load now requires an explicit allow_compile_modules allowlist; previously an empty allowlist was permissive. Deprecated allow_compile_namespaces wards fail loudly instead of being silently ignored.
- Elixir.Cantrip.* module names are rejected from hot-load allowlists (except the explicit Elixir.Cantrip.Hot.* namespace).
Fixes:
- EntityServer no longer runs entity episodes inside the GenServer mailbox. Episodes execute in a supervised per-entity runner task and reply via GenServer.reply/2. Concurrent send/2 while an episode is running returns busy immediately. Code-medium port ownership survives across persistent sends. Crash-restore preserves stream context.
- Malformed JSON in provider tool-call arguments now produces a structured is_error: true observation rather than silently substituting args: %{} and proceeding to (potentially) the wrong gate execution. Decode failure carries args_raw + args_decode_error from adapter through the executor.
- Mnesia ensure_schema/0 now propagates non-already_exists errors as root-cause init/1 failures; previously the catch-all :ok clause hid filesystem and permission errors.
- Unknown medium types now fail validation with an explicit error and a list of valid options rather than silently normalizing to :conversation.
- All String.to_atom/1 paths from external strings are now bounded: parent-context normalization uses a bounded allowlist; code-medium gate bindings use String.to_existing_atom/1; loom JSONL restoration uses existing atoms; Familiar table/node atoms use SHA-256 fingerprints.
- All three filesystem gates (read_file, list_dir, search) now route through shared path validation consistently: missing root fails closed, path traversal fails closed.
- Code-medium bare gate-call rewriting now parses with Code.string_to_quoted/1 and rewrites local gate-call AST nodes rather than doing text-level rewrites. Strings, remote calls, already-dotted calls, and definition heads are no longer subject to surprising rewrites.
- Safe boundary formatting wraps provider errors, JSONL persistence fallbacks, port code-medium error surfaces, gate observations, ACP wire stringification, and CLI output. Credential-shaped substrings are redacted before crossing entity, disk, or protocol boundaries.
- req_llm 1.12 preserves multiple system messages through both Anthropic and Gemini encoders; previously the v1.9 path could drop secondary system messages.
- Familiar workspace cookie now fails loudly on invalid existing cookies rather than silently regenerating; existing distributed connections are no longer at risk of being broken on a malformed-cookie restart.
- The live real-LLM echo/done integration prompt now gives a stricter two-step tool contract and descriptions so current Anthropic models terminate with done instead of looping on echo.
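The EntityServer change above follows a common OTP pattern: hand the long-running work to a supervised task, return {:noreply, ...} from handle_call, and answer later with GenServer.reply/2. A minimal sketch of that pattern, with hypothetical names (EpisodeServerSketch, send_intent/2) and no claim to mirror Cantrip's actual module:

```elixir
defmodule EpisodeServerSketch do
  use GenServer

  def start_link(work_fun), do: GenServer.start_link(__MODULE__, work_fun)
  def send_intent(pid, intent), do: GenServer.call(pid, {:send, intent})

  @impl true
  def init(work_fun) do
    # A Task.Supervisor owned by this server runs the episodes.
    {:ok, sup} = Task.Supervisor.start_link()
    {:ok, %{work_fun: work_fun, sup: sup, running: nil}}
  end

  @impl true
  def handle_call({:send, _intent}, _from, %{running: {_ref, _caller}} = state) do
    # An episode is already in flight: answer immediately instead of queueing.
    {:reply, {:error, :busy}, state}
  end

  def handle_call({:send, intent}, from, %{work_fun: work_fun, sup: sup} = state) do
    task = Task.Supervisor.async_nolink(sup, fn -> work_fun.(intent) end)
    # No reply yet; the mailbox stays free while the task runs.
    {:noreply, %{state | running: {task.ref, from}}}
  end

  @impl true
  def handle_info({ref, result}, %{running: {ref, from}} = state) do
    Process.demonitor(ref, [:flush])
    GenServer.reply(from, {:ok, result})
    {:noreply, %{state | running: nil}}
  end

  def handle_info({:DOWN, ref, :process, _pid, reason}, %{running: {ref, from}} = state) do
    # The runner crashed: report it to the waiting caller, keep the server up.
    GenServer.reply(from, {:error, reason})
    {:noreply, %{state | running: nil}}
  end

  def handle_info(_msg, state), do: {:noreply, state}
end

{:ok, pid} =
  EpisodeServerSketch.start_link(fn intent ->
    Process.sleep(200)
    String.upcase(intent)
  end)

first = Task.async(fn -> EpisodeServerSketch.send_intent(pid, "hello") end)
Process.sleep(50)
busy = EpisodeServerSketch.send_intent(pid, "again")
done = Task.await(first)
IO.inspect({busy, done})
```

Because the episode runs outside the server process, the second call is handled while the first is still in progress, which is what makes the immediate busy reply possible.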
New:
- Added a first-class mix gate for Familiars attached to Elixir workspaces. It runs allowlisted Mix tasks under the configured root with argv as data, bounded output, timeout handling, and structured observations. The Familiar default allows compile and format; test is opt-in with run_tests: true or an explicit allow_mix_tasks override.
- Cantrip.Familiar.new/1 documented Dune-variant divergence in docs/port-isolated-runtime.md. sandbox: :dune is now explicitly a smaller-surface in-process variant of the code medium with different bindings; entity prompts need to match the variant in use.
- test/readme_examples_test.exs pins the README/public-api quickstart shapes; future drift between documented examples and the runtime constructor signature fails CI.
- docs/observability.md is the canonical telemetry event registry (subscription patterns, alert recommendations, trace correlation model); implementation of the 9-item event checklist tracked on #11.
- docs/cleanup-status.md is the living tracker for the cleanup pass.
1.0.0
The first stable release. The Elixir implementation is the canonical package surface; the runtime is documented and live-verified across the Anthropic model tier (haiku, sonnet, opus).
Bug fixes surfaced during pre-tag live verification against real
Anthropic. All four shipped past mix verify green; all four needed
live driving to surface. Adds a v1 audit document and a live-integration
test module.
- Fixed: streaming responses dropped every tool call. The adapter consumed the chunk stream via tokens/1 + Enum.reduce for the realtime text delta, then called tool_calls/1 on the now-depleted stream and got nothing. Switched to ReqLLM.StreamResponse.process_stream/2, the documented public API for streaming tool-using agents.
- Fixed: persistent entities (Cantrip.summon + Cantrip.send) lost every assistant turn across sends. The terminating branch of entity turn execution never folded the final assistant message into state.messages. The next send appended a user message to a history that still ended at the prior user message; the model saw a stack of users with no record of its own answers and anchored on the first prompt.
- Fixed: folding only preserved one leading :system message even though initial message construction can emit two (identity + capability text). On fold, the capability text dropped into the foldable body; over long sessions the entity would silently lose its medium physics instructions.
- Upgraded req_llm from ~> 1.9 to ~> 1.12. v1.12's agentjido/req_llm@9d790fd removes the offending intersperse between Anthropic system content blocks. With the upstream encoder fixed, the local workaround introduced in c994878 was deleted.
- Added test/live_anthropic_test.exs covering code-medium sync, code-medium streaming, and conversation-medium tool-calling. Gated on RUN_REAL_LLM_TESTS=1 via existing Cantrip.Test.RealLLMEnv.
- Added docs/v1-audit.md recording verified paths, uncertain paths, and bugs found and fixed during the pre-tag audit.
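The folding fix above comes down to keeping every leading :system message rather than only the first. A small sketch of that idea, under assumptions: the message shape %{role: ..., content: ...}, the FoldSketch module, and the "[folded]" summary message are all hypothetical and chosen for illustration.

```elixir
defmodule FoldSketch do
  # Hypothetical message shape: %{role: atom, content: binary}.
  # Keeps every leading :system message, folds the oldest body messages into
  # one summary message, and leaves the most recent `keep_recent` untouched.
  def fold(messages, keep_recent, summary) when keep_recent >= 0 do
    {system, body} = Enum.split_while(messages, &(&1.role == :system))
    {foldable, recent} = Enum.split(body, max(length(body) - keep_recent, 0))

    case foldable do
      # Nothing old enough to fold: return the history unchanged.
      [] -> messages
      _ -> system ++ [%{role: :user, content: "[folded] " <> summary}] ++ recent
    end
  end
end

messages = [
  %{role: :system, content: "identity"},
  %{role: :system, content: "capability text"},
  %{role: :user, content: "q1"},
  %{role: :assistant, content: "a1"},
  %{role: :user, content: "q2"},
  %{role: :assistant, content: "a2"}
]

folded = FoldSketch.fold(messages, 2, "q1 was answered with a1")
IO.inspect(Enum.map(folded, & &1.role))
```

Splitting on the whole leading run of :system messages is what keeps the capability text out of the foldable body, however many system messages construction emits.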
1.0.0-rc.1
- Made the Elixir implementation the only canonical package surface.
- Removed the old spec/conformance scaffold and replaced unique coverage with native ExUnit tests.
- Removed the compiled examples module and example Mix task; the notebook and tests are the teaching surface.
- Removed hand-written OpenAI-compatible, Anthropic, and Gemini adapters. Provider configuration now routes through ReqLLM via Cantrip.LLM.from_env/1.
- Removed DETS and Auto loom storage. Supported storage is memory, JSONL, and Mnesia.
- Removed call_entity and call_entity_batch gates. Composition now uses Cantrip.new/1, Cantrip.cast/3, and Cantrip.cast_batch/2.
- Removed the bare read gate. Use read_file, which validates paths against the configured root.
- Reduced Mix task surface to mix cantrip.cast and mix cantrip.familiar.
- Made Familiar ACP the default ACP runtime.
- Made Familiar hot-loading opt-in with evolve: true.
- Replaced process/cutover docs with package docs: README, CONTRIBUTING, DEPLOYMENT, architecture, signer-key runbook, and changelog.
- Added public API and v1 migration guides to the packaged ExDoc extras.
- Added the safe port code medium. sandbox: :port evaluates LLM-written Elixir through Dune in a child BEAM process while gates, child cantrip API calls, stdio, loom grafting, telemetry, provider access, and hot-load policy stay in the parent.
- Added port_runner for launching that child through a deployment-provided OS/container sandbox.
- Made the Familiar default to the safe port code medium. Raw child-BEAM evaluation remains available as sandbox: :port_unrestricted; the old host-BEAM evaluator remains available as sandbox: :unrestricted for trusted local development.
- Added docs/port-isolated-runtime.md to document the implemented isolation boundary and remaining deployment responsibilities.
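The read_file entry above says paths are validated against the configured root. One way such a check can work is sketched below, with hypothetical names (PathSketch, resolve/2) and these assumptions: Unix-style separators, a root other than "/", and no symlink resolution. It is not Cantrip's validator.

```elixir
defmodule PathSketch do
  # A missing root fails closed rather than defaulting to the current directory.
  def resolve(nil, _relative), do: {:error, :missing_root}

  def resolve(root, relative) when is_binary(root) and is_binary(relative) do
    root = Path.expand(root)
    # Path.expand/2 collapses ".." segments and leaves absolute inputs absolute,
    # so both traversal and absolute-path escapes show up in `target`.
    target = Path.expand(relative, root)

    if target == root or String.starts_with?(target, root <> "/") do
      {:ok, target}
    else
      {:error, :outside_root}
    end
  end
end

IO.inspect(PathSketch.resolve("/srv/ws", "lib/a.ex"))
IO.inspect(PathSketch.resolve("/srv/ws", "../secrets"))
```

Comparing against root <> "/" rather than the bare root avoids accepting sibling directories such as /srv/ws-other that merely share a prefix.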