All notable changes to this project will be documented in this file.
The format follows Keep a Changelog and ExAthena adheres to Semantic Versioning.
v0.4.8 — Canonical %ReqLLM.ToolCall{} on assistant replay (llama.cpp HTTP 500 fix)
Fixed
ExAthena.Providers.ReqLLM.format_tool_calls/1now constructs%ReqLLM.ToolCall{}viaReqLLM.ToolCall.new/3instead of a flat%{id, name, arguments}map. The canonical struct nests the function payload undertype: "function"/function: %{name, arguments}, which is what llama.cpp's strict parser requires on assistant tool_call replay (iteration 2+ of a tool-calling loop). Without this, llama.cpp returned HTTP 500Failed to parse messages: Missing tool call typeon every multi-turn tool run; OpenAI/Anthropic/Ollama silently tolerated the looser shape and so the bug only surfaced against llama-server.- Arguments are JSON-encoded once at the provider boundary:
nil → "{}", pre-encoded binaries pass through, maps go throughJason.encode!/1. Internal ExAthena code continues to work with Elixir maps; the canonical wire shape is materialised only at the edge.
Why
req_llm 1.10/1.11's %ReqLLM.ToolCall{} mirrors the OpenAI Chat
Completions wire format and is what every supported backend expects.
Aligning to it fixes llama.cpp today and pre-empts breakage when
req_llm tightens validation in future releases. See ADR-0004.
v0.4.7 — :llamacpp placeholder api_key (parity with :ollama)
Fixed
ExAthena.Providers.ReqLLM.build_opts/2now substitutes a placeholderapi_key: "llamacpp"whenprovider: :llamacppis selected and no:api_keyis supplied. Mirrors the:ollamapath added in v0.4.1. Without this,req_llm1.10/1.11's openai adapter rejected the request withInvalid parameter: :api_key option, ...before any HTTP call left the BEAM — manifesting downstream asExAthena.Error{kind: :server_error, message: "{:http_streaming_failed, {:provider_build_failed, ... 'Failed to build streaming request: ...'}}"}, which retry/recovery layers then classified as:api_errorand hammered every 3 minutes until the recovery budget exhausted.ExAthena.Config.@local_openai_compatible_backendsnow includesllamacpp: :llamacppso the backend marker is threaded throughConfig.pop_provider!/1forprovider: :llamacppas it already was forprovider: :ollama.
Why
Local llama-server (the :llamacpp provider) accepts unauthenticated
HTTP just like Ollama; the OpenAI-compatible adapter in req_llm
nevertheless requires some non-nil :api_key. Both bugs (no backend
marker + no placeholder) had to land together for :llamacpp to work
end-to-end against an unauthenticated local server. Verified against
udin_code's failing planning run on
http://localhost:8080 with model auto-detected from /v1/models.
v0.4.6 — Weak-model reliability: raw-JSON tool calls, compact schemas, strict structured output
Added — ExAthena.ToolCalls.RawJson (ADR 0001)
- Third tool-call extraction tier behind
NativeandTextTagged. Recognises bare and```json-fenced JSON objects with aname/argumentsshape using a balanced-brace scanner that tracks in-string state and backslash escapes. Wired intoToolCalls.extract/2as the final fallback so weak open-weight models (e.g.qwen2.5-coder:14bon Ollama) that emit tool calls as bare JSON in assistant prose are no longer silently dropped. - A cheap pre-check requires both
"name"and"arguments"substrings before the scanner runs, so non-tool prose is rejected in O(n) without engaging the brace walker.
Added — per-request capability override (ADR 0001)
ExAthena.Loop.run/2now accepts a:capabilitiesopt that is merged on top ofprovider_mod.capabilities()for the duration of the run. Lets callers flipnative_tool_calls: falsefor a single request to forceModes.ReActinto fence-augmented prompting, with no provider-module fork required. The merge is shallow and forward-compatible: future capability flags slot in the same way.
Added — compact tool-schema mode (ADR 0002)
ExAthena.ToolCalls.augment_system_prompt/3(was/2) takes a newcompact: trueopt that emits one type-signature line per tool instead of dumping the full JSON schema, e.g.- read_file(path: string, content?: string) — read a file. Required vs optional marked with?; types compacted tostring|number|integer|boolean|array|object|null; scalar arrays rendered asT[]; nested object structure collapsed past depth 2; description truncated to first sentence and capped near 80 chars.- Default behaviour is byte-identical —
compact: trueis opt-in and the existing 2-arity call sites still work unchanged. - New static capability
compact_tool_schemas: trueonExAthena.Providers.ReqLLM.capabilities/0so downstream callers can gate per-model heuristics on it (e.g. AthenaRunner deciding to flip oncompact: trueonly for known-weak Ollama model ids).
Why (ADRs 0001 + 0002)
Weak Ollama models such as qwen2.5-coder:14b show measurable quality
degradation past ~4 KB of system prompt and don't reliably honour the
~~~tool_call fence when given a 5–10 KB schema dump from
augment_system_prompt/2. Together the two changes mean: (a) tool
calls those models do emit (often as bare JSON) are now caught by
RawJson, and (b) the prompt budget those models see drops by ~80%
on a 10-tool fixture, raising fence compliance. Strong hosted models
(Claude, GPT-4) are unaffected — they still receive tools structurally
via native tool_calls. Unblocks udin-io/udin_code#1268.
Added — strict structured-output pass-through (ADR 0003)
ExAthena.Providers.ReqLLM.build_opts/2now forwardsresponse_formatto req_llm, preferringopts[:response_format]overrequest.response_format. Accepts:json,"json", and JSON-schema maps verbatim — req_llm normalises before hitting the Ollama / OpenAI / etc. backend.- New
:structured_outputcapability key onExAthena.Capabilitiestypespec;Providers.ReqLLM.capabilities/0declaresstructured_output: true. - New module
ExAthena.StructuredOutputwithrequest/3. Strict one-shot variant: requires the resolved provider (after merging the per-request:capabilitiesoverride from ADR 0001) to declare:structured_output. Returns{:ok, decoded_map}on success,{:error, :no_structured_output}when the capability is absent,{:error, :invalid_json}on decode failure, and passes provider errors through. - The existing
ExAthena.Structured.extract/2retry/repair flavour is unchanged. The two helpers coexist by design —Structured.extract/2for best-effort retry-with-validation,StructuredOutput.request/3for the strict provider-enforced contract.
Why (ADR 0003)
Downstream udin_code flows that need schema-conforming JSON (scope
decisions, ExitPlanMode-style phase-end signals) currently rely on
brittle text extraction because weaker open-weight models don't emit
structured tool_calls. Now those flows can request provider-enforced
JSON when the resolved provider supports it, and get a loud
:no_structured_output error otherwise — never a silent text-parse
fallback.
v0.4.5 — Stream heartbeat + time-to-first-token (TTFT)
Added
ExAthena.Providers.ReqLLM.consume_stream/3now emits a heartbeat log every 10 s while no chunk has arrived yet, and a one-time TTFT line on the first content/tool_call chunk:[ExAthena.ReqLLM] ⋯ waiting on stream (Ns elapsed)(info, repeats every 10 s during the prompt-processing phase, stops automatically once chunks start flowing).[ExAthena.ReqLLM] ←first_chunk after Yms (TTFT)(info, fires exactly once per stream).
Why
Local Ollama / llama.cpp on a 14 B+ model regularly spends 30–120 s
processing the prompt before emitting the first chunk. Until 0.4.5
nothing surfaced on the wire during that wait — callers had no way
to distinguish "slow but alive" from "stalled and silent". The
heartbeat closes that gap with a single info-level line every 10 s,
and the TTFT log captures the latency once tokens start flowing so
operators can compare cold vs. warm runs. The heartbeat process
exits as soon as the reduce returns (success or crash) via
try/after, and self-checks Process.alive?(parent) before
emitting in case the calling process was killed mid-stream.
v0.4.4 — Adapter-boundary message logging (Claude Code-style)
Added
ExAthena.Providers.ReqLLMnow emits structured Logger messages at the adapter boundary, mirroring the Claude Code SDK's[SDK]log style for parity with consumers that already grep for that prefix:[ExAthena.ReqLLM] →query|→stream model=… msgs=N tools=K base_url=… backend=…(info level, one line per request).[ExAthena.ReqLLM] →query system_prompt=…\n msg[i] role: <preview>(debug level, full message body previews with whitespace collapsed and capped at 200B/message).[ExAthena.ReqLLM] ←text_delta NB: <preview>per streamed chunk (debug level).[ExAthena.ReqLLM] ←tool_call name=… args=…per tool call (debug level).[ExAthena.ReqLLM] ←meta finish_reason=…and←usage …(debug level).[ExAthena.ReqLLM] ←done finish_reason=… text_chars=N tool_calls=K usage=…on completion (info level).[ExAthena.ReqLLM] ←error …when req_llm returns an error (warning level).
- All debug-level lines wrap their message construction in
Logger.debug(fn -> … end)so log assembly is skipped at higher log levels — zero overhead in production.
Why
Until 0.4.4 the adapter forwarded request/response data to req_llm
without leaving any breadcrumb in the host application's log. When a
streaming run stalled or the LLM returned unexpected content, callers
had no way to see what was sent or what came back without attaching a
telemetry handler. The new lines give the same kind of visibility
ClaudeCode's SDK provides, so debugging Ollama/OpenAI/llama.cpp flows
is a tail -f phoenix_output.log | grep '\[ExAthena.ReqLLM\]' away.
v0.4.3 — Convert tools to %ReqLLM.Tool{} structs at the adapter boundary
Fixed
ExAthena.Providers.ReqLLM.build_opts/2now converts the modes' tool list (OpenAI-format maps fromTools.describe_for_provider/1) into%ReqLLM.Tool{}structs before forwarding to req_llm. Without this, req_llm 1.10's openai adapter raisedno function clause matching in ReqLLM.Tool.to_schema/2while building the streaming request —to_schema/2only matches%ReqLLM.Tool{}. The conversion sets a stub callback because ex_athena executes tools server-side via the loop, not via req_llm's client-side dispatch. Already-formed%ReqLLM.Tool{}structs and string-keyed maps are passed through unchanged for forward compatibility.
v0.4.2 — Ollama tagged model spec follow-up
Fixed
ExAthena.Providers.ReqLLM.resolve_model/2no longer skips the provider-tag prefix when the model id contains a colon. Ollama model ids legitimately use:as the version separator (qwen2.5-coder:14b,qwen3-coder:30b), so the previous "model contains:⇒ already tagged" heuristic shipped bare names like"qwen2.5-coder:14b"to reqllm, which then split on the first colon and tried to validate"qwen2.5-coder"as a provider name — failing the `^[a-z0-9-]+$regex (rejects.) with. Now always prepends the tag unless the model already starts with"<tag>:"` (caller passed a fully-formed spec).
v0.4.1 — Ollama via OpenAI-compatible adapter
Fixed
provider: :ollamanow talks to local Ollama throughreq_llm's OpenAI adapter instead of looking up an:ollamaprovider inllm_db's catalog.llm_db2026.4.x removed first-class local-Ollama support (it only catalogues:ollama_cloudnow), so"ollama:<model>"model specs were rejected with{:error, :unknown_provider}fromLLMDB.Spec. The fix routes:ollama(and:llamacpp) through the"openai:<model>"tag and threadsopenai_compatible_backend: :ollamasoreq_llm1.10's openai adapter tolerates the missing API key on unauthenticated local deployments. Mirrors the recipe inreq_llm/guides/ollama.md.base_urlfor Ollama now auto-appends/v1when callers pass the bare host (http://localhost:11434) — req_llm's openai adapter expects the prefix to already include/v1.- A placeholder
api_key("ollama") is substituted when the:ollamabackend marker is set and no key was supplied — Ollama ignores the Authorization header butreq_llm's HTTP layer still emits one, so a non-nil value is required.
Internal
ExAthena.Config.@req_llm_provider_tag[:ollama]now resolves to"openai"(was"ollama").- New
@local_openai_compatible_backendsmap drivesopenai_compatible_backendinjection inConfig.pop_provider!/1. ExAthena.Providers.ReqLLM.build_opts/2reads the backend marker, normalises base_url, and falls back to the placeholder api_key.
v0.4.0 — operational harness (memory, skills, hooks, modes, agents, storage)
The "1.6% reasoning, 98.4% harness" upgrade. Where v0.3 perfected the loop kernel, v0.4 builds the operational harness the Claude Code paper calls out as the bulk of a production agent's value: file-based memory
- skills, a five-stage compaction pipeline with reactive recovery, a
14-event hook surface with
{:inject, msg}/{:transform, prompt}returns, two new permission modes, structured tool results, custom agent definitions with optional git-worktree isolation, and append-only session storage with file-checkpointing +/rewind.
Landed as seven tightly-scoped commits (PR0 → PR5) — each one keeps the existing test suite passing and adds focused new tests on top.
PR0 — Foundation
Added
:session_idand:parent_session_idplumbed throughLoop.State,Loop.run/2opts, the resultingToolContext, and theSessionStarthook payload. TheSessionGenServer auto-generates a stable id at start_link, reuses it on every turn, and refuses to let per-callextra_optsredirect mid-conversation. PR4 + PR5 read these.:error_prompt_too_longfinish-reason inLoop.Terminationswith category:capacity. Modes signal context-window-exceeded uniformly; PR2's reactive compaction switches on this.- Doctest in
Permissions.check/4documenting and locking the deny-first ordering (:disallowed_toolssurvives:bypass_permissions,:allowed_toolssurvives a permissive callback).
PR1 — Memory + Skills (file-based context)
Added — ExAthena.Memory
- Loads
AGENTS.md(preferred) /CLAUDE.mdfrom a 3-level hierarchy: user (~/.config/ex_athena/) → project (<cwd>/) → local override (<cwd>/AGENTS.local.md). - Each file becomes a single user-role message tagged
name: "memory"placed at the front of the conversation. The Claude Code paper notes Claude Code uses user-context (not system) for probabilistic compliance — we copy the pattern. AGENTS.mdwins overCLAUDE.mdat the same level (matches opencode).
Added — ExAthena.Skills
- Claude Code-style progressive disclosure.
SKILL.mdfiles have YAML frontmatter (name,description,disable-model-invocation,allowed-tools) plus a markdown body. The frontmatter is auto-injected into the system prompt as a## Available Skillscatalog (~50 tokens per skill); bodies stay on disk until needed. - Two activation paths: a
[skill: name]sentinel the model writes in its response, or the new:preload_skillsopt for hosts that know up-front what's needed. - Loaded from
~/.config/ex_athena/skills/<name>/SKILL.mdand<cwd>/.exathena/skills/<name>/SKILL.md. Project overrides user.
Added — Loop.run/2 options
:memory—:auto(default),false, or explicit message list.:skills—:auto(default),false, or explicit map.:preload_skills— list of skill names to activate up-front.
Changed
Compactors.Summaryextends its effective pinned-prefix bymeta[:memory_count] + meta[:preloaded_skill_count]so memory + pre-loaded skills survive every compaction cycle.
PR2 — Five-layer compaction pipeline + reactive recovery
Added — pipeline architecture
ExAthena.Compactor.Pipeline— the new default compactor. Walks a configurable list ofCompactor.Stagemodules cheapest-first, short- circuiting once the conversation falls below target. Each stage runs inside its own[:ex_athena, :compaction, <:stage_name>, :start | :stop]telemetry span.ExAthena.Compactor.Stagebehaviour withcompact_stage/2+name/0callbacks. ExistingCompactors.Summarykeeps its legacyCompactor.compact/2callback AND now implementsStagevia a thin adapter — fully backward-compatible for direct callers.
Added — five built-in stages
Compactors.BudgetReduction— replaces oversized tool-result bodies (>16k chars by default) with a[truncated; ref=<id>]pointer. Full payload moves tostate.meta[:tool_result_archive]. Pure-Elixir.Compactors.Snip— drops stale tool-result bodies older than:snip_age_iterationswhose paired assistant turn already happened, replacing each with a<snipped: stale tool-result for call …>marker.Compactors.Microcompact— collapses runs of 3+ adjacent tool-result messages into a single elided summary taggedname: "microcompact". Pure-Elixir.Compactors.ContextCollapse— non-destructive view-time projection. Detects superseded reads (file later edited) and consecutive duplicate tool calls; writes the projection tostate.meta[:compact_view]for the next request to consume. The authoritativestate.messagesis never mutated, so resume / replay / rewind (PR5) stay correct.Compactors.Summary— existing LLM summary stage, refactored.
Added — reactive recovery
- When a mode returns
{:error, :error_prompt_too_long}(PR0 finish-reason), the loop runs the pipeline withforce: trueunconditionally and retries the same iteration once. Gated by:reactive_compactopt (defaulttrue).
Configuration
:compaction_pipeline— host-overridable stage list. Default is[BudgetReduction, Snip, Microcompact, ContextCollapse, Summary].
PR3a — Hooks expansion + permission modes
Added — hook events (14 total)
Hooks.events/0exposes the catalog:SessionStart,SessionEnd,UserPromptSubmit,ChatParams,Stop,StopFailure,PreToolUse,PostToolUse,PostToolUseFailure,PermissionRequest,PermissionDenied,SubagentStart,SubagentStop,PreCompact,PreCompactStage,PostCompact,Notification.- New return values for hook callbacks:
{:inject, message_or_messages}— append context to the conversation. opencode'sexperimental.chat.system.transformpattern.{:transform, prompt}— only meaningful fromUserPromptSubmit; rewrites the user prompt before it enters the loop.
run_lifecycle_with_outputs/3returns%{halt:, injects:, transform:}for callers that need the richer outputs.run_lifecycle/3keeps its:ok | {:halt, _}shape.
Newly fired events
Stop/StopFailure/SessionEndfromto_result/2.UserPromptSubmitfrombuild_initial_state/2.ChatParamsfromModes.ReAct.iterate/1, just before each provider call.PostToolUseFailurewhen a tool returns{:error, _}.PermissionDeniedwhenever the gate decides{:deny, _}.SubagentStart/SubagentStopfromTools.SpawnAgent.PreCompactStage/PostCompactfrom the compaction pipeline.
Added — permission modes
:accept_edits— auto-allow Read/Glob/Grep/WebFetch + Edit/Write/TodoWrite- plan_mode/spawn_agent. Bash + custom tools still consult
can_use_tool.
- plan_mode/spawn_agent. Bash + custom tools still consult
:trusted— skip thecan_use_toolcallback for every tool. Still respects the denylist by default; passrespect_denylist: falseto disable that. The:autoname is reserved for the future ML safety classifier.
:bypass_permissions continues to respect the denylist (deny-first
invariant from PR0's doctest is preserved).
PR3b — Tool-result split (LLM content + UI payload) ⚠️ Breaking
Tools may now return a 3-tuple {:ok, llm, ui} in addition to the
existing {:ok, text}. The llm is the LLM-facing string the model
sees on the next iteration; ui is a %{kind:, payload:} map hosts
(TUIs, Phoenix LiveView frontends) can render as rich content
(diffs, file previews, process output, match lists) without parsing
the text. This is the Pi-style split adapted to Elixir's pattern-match
idiom.
Added
Messages.ToolResultgrowsui_payload :: %{kind:, payload:} | nil.Loop.Eventsadds{:tool_ui, %{tool_call_id:, kind:, payload:}}.- New event emitted after
:tool_resultfor any tool result carrying a payload.
Built-in payload shapes
Read→:file{ path, content, line_range }Edit→:diff{ path, before, after, replacements }Bash→:process{ command, exit_code, stdout, duration_ms }Glob→:matches{ pattern, count, items }Grep→:matches{ pattern, count, items }WebFetch→:webpage{ url, status, truncated? }Write,TodoWrite,PlanMode— text-only, unchanged.SpawnAgent(PR4) →:subagent{ iterations, cost_usd, isolation, … }
Breaking change — direct tool callers
The 6 builtins listed above (Read, Edit, Bash, Glob, Grep,
WebFetch) now return {:ok, text, ui} 3-tuples instead of the
{:ok, text} 2-tuple. Callers using these tools through the loop are
unaffected — Result.text still surfaces the LLM-facing string. Code
that calls these tools' execute/2 directly needs to update its
pattern matches. The {:ok, text} 2-tuple remains a fully supported
return shape for custom and third-party tools.
PR4 — Subagents v2 (Agents.md + worktrees + sidechains)
Added — ExAthena.Agents
- File-based agent definitions in markdown + YAML frontmatter, loaded
from a 3-level hierarchy (builtin → user → project). Frontmatter
fields:
name,description,model,provider,tools,permissions,mode,isolation. Body becomes a system-prompt addendum. - Builtin definitions shipped in
priv/agents/:general— full-tool default (matches the prior SpawnAgent behaviour).explore— read-only fast investigation.plan— analysis only with writes restricted to.exathena/plans/.
Agents.apply_to_opts/2merges definition fields into spawn opts.
Added — worktree isolation
ExAthena.Agents.Worktree.resolve/3runs three safety checks before creating a git worktree (git on PATH, cwd inside repo, clean tree). If any check fails, the subagent transparently falls back to:in_process.- Worktrees live under
~/.cache/ex_athena/worktrees/<sess>/<name>-<n>, branched fromHEAD. After the subagent finishes:- Changes left → worktree is kept; path + branch surface in the spawn
result's
ui_payloadfor review/merge. - Clean →
git worktree remove --forcecleans up.
- Changes left → worktree is kept; path + branch surface in the spawn
result's
ExAthena.Agents.WorktreeSweeperis a one-shot at boot under the application supervisor that runsgit worktree pruneand removes cache entries older than 7 days.- All internal git invocations bypass the parent's permission gate via
System.cmd/3directly — otherwise a parent in:planmode could never spawn a worktree-isolated subagent.
Added — sidechain transcripts
ExAthena.Agents.Sidechain.write/1persists each subagent's full transcript to<cwd>/.exathena/sessions/<parent_session_id>/sidechains/<subagent_id>.jsonl. Parent only sees the subagent's finaltext; the full conversation lives here.
SpawnAgent updates
- New
agent: "<name>"arg resolves a named definition and applies its fields to the sub-loop opts. SubagentStartpayload now includes:agentand:isolation.SubagentStoppayload includes the finalized isolation state.- Spawn returns the
{:ok, llm, ui}3-tuple from PR3b with a:subagentUI payload carrying iterations / tool_calls_made / cost_usd / duration_ms / isolation.
PR5 — Append-only session storage + checkpointing + rewind
Added — ExAthena.Sessions.Store
- Behaviour for append-only event storage with
append/2,read/1,list/0,tail/2. Each event carries an ISO 8601 timestamp + uuid. Sessions.Stores.InMemory— ETS-backed default. The application supervisor keeps a single named GenServer alive so the table is shared across the BEAM.Sessions.Stores.Jsonl— ETS-buffered, periodic flush (default 250ms). Hot-path appends never block on I/O. Files at<root>/<session_id>.jsonl. Synchronousflush/1for tests + clean shutdown.
Session integration
Session.start_link/1accepts:storeopt::in_memory(default),:jsonl, or a custom module.- On every
send_message/2: emits:user_message, then walksresult.messagesafter the loop and emits:assistant_message/:tool_resultfor new entries. Session.resume/2reads events back, filters to user/assistant messages, and returns the reconstructed message list. Permissions deliberately don't survive resume (Claude Code's pattern: trust is re-established per session).
Added — ExAthena.Checkpoint
- File-history backups before each
Tools.Edit/Tools.Writeat<cwd>/.exathena/file-history/<session_id>/<sha>/<version>.bin. SHA-256 of the absolute path; versions are 0-indexed and idempotent. Tombstones (<v>.tombstone) mark "this file didn't exist at checkpoint time". Checkpoint.rewind/3modes::code_and_history— restore each file to its version-0 snapshot AND truncate the JSONL session log to the chosento_uuid.:history_only— only truncate the JSONL.
ExAthena.Checkpoint.Sweeper— startup task that GCs file-history directories older than 30 days.
Distribution
mix.exs:filesnow includespriv/so the builtin agent definitions ship with the package on Hex.
Tests
- 248 tests + 2 doctests, 0 failures (was 147 + 0 in v0.3.1).
- Backward-compatible by design: existing v0.3 tests untouched, except
for the 6 builtin tools whose return shape changed (PR3b — tightened
to
{:ok, text, ui}).
v0.3.1 — per-token streaming in the ReAct mode
Added
Modes.ReActnow dispatches toprovider_mod.stream/3(instead ofquery/2) whenever the caller registered anon_eventcallback onLoop.run/2. Every%Streaming.Event{type: :text_delta, data: ...}produced by the provider is forwarded toon_eventin real time, so consumers (e.g. a LiveView chat UI) get character-level deltas again without having to drive streaming themselves.- When no
on_eventis set the behaviour is unchanged — the mode uses the cheaper one-shotquery/2path. - When the provider module does not implement
stream/3(it is an optional callback) the mode transparently falls back toquery/2.
Changed
- Docstring on
Modes.ReActnow reflects the stream/query dispatch.
v0.3.0 — PR 4 (observability) landed; Phase 4 closed
PR 4 — Observability
Added — OpenTelemetry GenAI semconv telemetry
ExAthena.Telemetry— emits:telemetry-library events shaped to the OpenTelemetry GenAI semantic conventions. Consumers bridge to OTel viaopentelemetry_telemetry(no direct OTel dep). Events:[:ex_athena, :loop, :start | :stop | :exception][:ex_athena, :chat, :start | :stop][:ex_athena, :tool, :start | :stop][:ex_athena, :compaction, :stop][:ex_athena, :subagent, :spawn | :stop][:ex_athena, :structured_retry]
- GenAI semconv metadata keys:
gen_ai_operation_name,gen_ai_provider_name,gen_ai_request_model,gen_ai_agent_id,gen_ai_conversation_id,gen_ai_tool_name,gen_ai_tool_call_id,gen_ai_usage_input_tokens,gen_ai_usage_output_tokens,gen_ai_response_finish_reasons. - New
:conversation_id/:agent_idopts onLoop.run/2— threaded into every emitted event's metadata so OTel traces can stitch across turns. Telemetry.span/3helper wraps arbitrary work in a start/stop pair with duration measurement + exception re-raising.
Released
- Version bump
0.3.0-dev→0.3.0. Ready for Hex publish.
v0.3.0-dev — PR 3 landed
PR 3 — Reliability + intelligence
No additional breaking changes. New capabilities layer on top of PR 2.
Added — context compaction
ExAthena.Compactor— behaviour for context-window reduction. Called by the kernel before each iteration when the token estimate crosses:compact_at(default 60% of the provider'smax_tokens). Preserves a pinned prefix (system prompt + rules) and a live suffix (recent turns) while substituting the middle with a summary.ExAthena.Compactors.Summary— default implementation. Uses the session's own provider to generate a terse summary and replaces the dropped messages with a single assistant message taggedname: "compactor_summary". Cost counts against the run's budget.- New options:
:compact_at(default 0.6),:pinned_prefix_count(default 1),:live_suffix_count(default 6),:compactor(override module). - New events:
{:compaction, metadata}fires after a successful compaction with before/after token counts and dropped count. - New termination:
:error_compaction_failedwhen compaction errors. - New hook:
:PreCompactfires with%{estimate: …}before each compaction attempt.
Added — budget accounting from provider metadata
extract_cost/1inExAthena.Modes.ReActpulls:total_cost(or:input_cost + :output_cost) from provider usage metadata and folds it into the run's Budget. req_llm'smodels.dev-backed cost data flows straight through.ExAthena.Result.cost_usdis populated when the provider reports cost;nilotherwise.:max_budget_usd(introduced as a knob in PR 2) now genuinely trips:error_max_budget_usdwhen cumulative cost crosses the cap.
Added — structured-output repair loop (instructor-style)
ExAthena.Structured.extract/2now retries on validation failure by appending the failed response + a user message carrying the validation error and re-prompting. Default:max_retries: 2.- After retries exhaust, returns
{:error, {:error_max_structured_output_retries, last_validation_error}}. - New events:
{:structured_retry, %{attempt:, error:}}fires on each retry.
Added — Plan-and-Solve mode
ExAthena.Modes.PlanAndSolve— two-phase mode. First iteration is planning-only (no tools, plain-text plan following a structured prompt). Subsequent iterations delegate toReAct.- Rationale: smaller / local models produce better tool-calling behaviour when they articulate a plan first.
Added — Reflexion mode
ExAthena.Modes.Reflexion— after each ReAct iteration, injects a short self-critique pass and adds it to the conversation history. Capped at 3 reflections (per research — beyond that, degeneration-of-thought kicks in).- Triples per-loop cost; best reserved for correctness-sensitive tasks.
Added — subagent supervision upgrade
ExAthena.Tools.SpawnAgentnow runs sub-loops underTask.Supervisor.async_nolink(supervisor nameExAthena.Tasks, registered byExAthena.Application). Sub-agent crashes no longer propagate to the parent; timeouts are enforceable.- New events:
{:subagent_spawn, %{id:, prompt:}}and{:subagent_result, %{id:, text:}}fire around sub-loop execution. - New optional arg:
timeout_ms(default 300_000). - New error subtypes from SpawnAgent:
{:sub_agent_crashed, reason},{:sub_agent_timeout, ms}.
Tests
- 140 total (up from 126 in PR 2). 14 new cover compaction
(threshold detection, middle-replacement, error surfacing), budget
caps (cost-based termination,
cost_usdaccumulation, nil fallback), structured repair loop (retry success, retry exhaustion, retry events), Plan-and-Solve (planning turn assertion, execution-phase tool use), and Reflexion (reflection cap, history injection).
PR 2 — Kernel rewrite (breaking changes)
The return type of ExAthena.Loop.run/2 is now {:ok, %Result{}}
instead of the v0.2 {:ok, map()}. Consumers pattern-matching on the
old map shape must update.
Added — pluggable Mode behaviour
ExAthena.Loop.Mode— behaviour withinit/1+iterate/1. Drives the turn-by-turn control flow. Kernel handles caps, budget, hooks, counters, events, and Result construction.ExAthena.Modes.ReAct— default mode. ReAct cycle (reason → act → observe) with parallel tool execution, mistake counter, and typed terminations.ExAthena.Modes.PlanAndSolve+ExAthena.Modes.Reflexion— stubs returning:not_implemented. Full implementations land in PR 3.ExAthena.Loop.Mode.resolve/1translates atom shortcuts (:react,:plan_and_solve,:reflexion) to modules.
Added — reliability knobs
:max_consecutive_mistakes(default 3) — trips:error_consecutive_mistakesafter N consecutive tool errors. A successful tool call resets the counter. Prevents runaway loops (Cline pattern).:max_budget_usd— trips:error_max_budget_usdwhen the budget accumulator crosses the cap. PR 3 wires cost computation from provider metadata.:tool_timeout_ms(default 60_000) — per-call timeout for parallel execution.:max_concurrency(default 4) —Task.async_streamconcurrency cap.
Added — parallel tool execution
ExAthena.Loop.Parallel— classifies a single iteration's tool calls into parallel-safe (read-only) and serial (mutating) groups. Runs mutating calls first in order, then parallel-safe calls concurrently viaTask.async_stream/3. Result order always matches input call order so the model sees aligned results.ExAthena.Tool.parallel_safe?/0— optional behaviour callback. Defaults tofalse.- Read-only builtins (
Read,Glob,Grep,WebFetch) declareparallel_safe?: true. Mutating builtins default tofalse.
Changed — event shape (breaking change)
v0.2's %ExAthena.Streaming.Event{type:, data:, index:} struct is
replaced by flat pattern-matchable tuples modelled on ash_ai's
ToolLoop.stream/2:
{:content, text}
{:tool_call, ToolCall.t()}
{:tool_result, ToolResult.t()}
{:iteration, integer()}
{:usage, usage_map}
{:error, term()}
{:done, Result.t()}Consumers subscribing via :on_event need to update their handlers.
OTel span emission in PR 4 consumes the same tuples.
Changed — error handling
Tool errors use the is_error: true tool-result convention (Cline
pattern). The model sees its mistake and self-corrects; the mistake
counter advances; a streak hits the cap.
Unknown tools + parse failures flow as error tool-results rather than
halting the run. Hook-driven halts produce :error_halted. Provider
errors produce :error_during_execution.
Tests
126 total (up from 116 in PR 1). 10 new cover Result shape, termination
subtypes, max_iterations → :error_max_turns, mistake counter + reset,
parallel tool ordering, flat event tuples, Mode resolve/1.
PR 1 — Foundation (already landed, unchanged)
PR 1 lays the foundation: canonical types, typed terminations, budget accounting, and a single req_llm-backed provider adapter that replaces the three hand-written provider modules.
Added — Result, Terminations, Budget
ExAthena.Result— canonical run outcome struct. Every run (success or error) returns a%Result{}carrying final text, message history, finish_reason, iterations, tool_calls_made, aggregated usage, cost in USD, duration, model, provider, and telemetry metadata. Replaces the loose map v0.2 returned.ExAthena.Loop.Terminations— typed finish_reason subtypes inspired by the Claude Agent SDK. Each run ends with exactly one of::stop,:error_max_turns,:error_max_budget_usd,:error_during_execution,:error_max_structured_output_retries,:error_consecutive_mistakes,:error_halted,:error_compaction_failed.Terminations.category/1classifies each as:success | :retryable | :capacity | :fatalfor retry-decision logic.ExAthena.Budget— usage + cost accumulator. Aggregates token usage across iterations, computes cost from provider metadata (req_llm + models.dev), and supports:max_budget_usdcaps.
Added — req_llm provider adapter
ExAthena.Providers.ReqLLM— single adapter that delegates toreq_llm's 18+ providers (OpenAI, Anthropic, Ollama, OpenRouter, Groq, Together, DeepInfra, Vercel, LM Studio, vLLM, llama.cpp, Mistral, Gemini, Cohere, Bedrock, …). Model names resolve through themodels.devregistry for cost + context-window metadata.ExAthena.Config.pop_provider!/1now threads areq_llm_provider_tagkey through opts so baremodel: "llama3.1"+provider: :ollamaauto-expands to the full"ollama:llama3.1"spec req_llm expects.Config.req_llm_provider_tag/1— translate an ExAthena provider atom into the req_llm"tag:model-id"prefix.
Removed — hand-written provider modules
ExAthena.Providers.OllamaExAthena.Providers.OpenAICompatibleExAthena.Providers.ClaudeAll three were direct HTTP clients (Ollama + OpenAICompatible) or SDK wrappers (Claude). req_llm does this work across more providers and maintains the catalogs. The provider atoms:ollama,:openai,:openai_compatible,:llamacpp,:claude,:anthropiccontinue to work — they now all resolve toExAthena.Providers.ReqLLM.
Added — dep
{:req_llm, "~> 1.10"}.
Breaking change — none yet (visible)
Consumer-visible API unchanged in this PR. Every existing call
(ExAthena.query/2, ExAthena.stream/3, ExAthena.Loop.run/2,
ExAthena.Session.start_link/1) works identically. The provider-module
change is internal.
Breaking API changes land in PR 2 (Kernel) alongside the new Mode behaviour and the new stream event shape.
Tests
- 116 tests passing (up from 91 baseline). 25 new covering Terminations, Result, Budget, and the req_llm adapter routing.
v0.2.0 — unreleased
Phase 2 of the agent-loop roadmap: ex_athena is now feature-complete for multi-turn tool-using work. Drop-in replacement for the Claude Code SDK.
Added — Agent loop
ExAthena.Loop— multi-turn loop. Infer → parse tool calls → permissions → PreToolUse hooks → execute → PostToolUse hooks → replay → repeat. Bounded by:max_iterations(default 25). Auto-falls-back between native and text-tagged tool-call protocols viaExAthena.ToolCalls.extract/2.ExAthena.Session— GenServer owning multi-turn conversation state. Appends to message history on every turn, resumable, supervised.ExAthena.run/2+ExAthena.extract_structured/2on the facade.
Added — Tool behaviour + builtins
ExAthena.Toolbehaviour (name,description,schema,execute).ExAthena.ToolContext—:cwd,:phase,:session_id,:tool_call_id,:assigns, plusresolve_path/2that rejects traversal + null bytes.ExAthena.Toolsregistry — resolves user tool lists and constructs the provider-facing + prompt-facing schemas.- Ten builtin tools:
Read(with line numbering + offset/limit)Glob(wildcard listing with max cap)Grep(rgwhen available, pure-Elixir fallback)Write(creates parent dirs)Edit(strict exact-string replacement, ambiguity-rejecting)Bash(port-based, configurable timeout, kills on timeout)WebFetch(http/https only, 1 MB cap)TodoWrite(validates statuses, optional notifier callback viaassigns)PlanMode(phase transition request — loop consumes the sentinel)SpawnAgent(synchronous sub-loop, inherits ctx, filters meta-tools)
Added — Permissions
ExAthena.Permissionswith three modes (:plan,:default,:bypass_permissions),allowed_tools/disallowed_toolslists, and acan_use_toolcallback for interactive approval.:planmode blocks mutation tools (write,edit,bash,todo_write) by default; read-only tools always permitted.
Added — Hooks
ExAthena.Hookslifecycle matching Claude Code's shape:PreToolUse,PostToolUse,Stop,Notification,PreCompact,SessionStart,SessionEnd. Matcher groups (regex or string) select which tools fire. Hook crashes are caught and become:haltreturns.
Added — Structured extraction
ExAthena.Structured.extract/2— one-shot JSON extraction with schema validation. Uses JSON mode when the provider supports it; falls back to a fenced~~~jsonblock for providers that don't.:validatoropt for custom validation.
Test surface
- 95 tests (up from 43 in Phase 1). Coverage per tool, permission modes, hook lifecycle, loop end-to-end driven by the Mock provider, structured extraction both JSON-mode and fenced.
Phase 3 roadmap (next PR)
Start migrating udin_code off direct claude_code calls. Route ticket
work (SdkRunner, GenericRunner, Orchestrator) through ExAthena.Session
so picking :ollama in the ModelProvider UI begins actually running tasks
on Ollama.
v0.1.0 — unreleased
Initial public release. Phase 1 of the agent-loop roadmap: pure inference across any provider, with the canonical message/request/response shapes and tool-call parsing infrastructure in place for Phase 2's agent loop.
Added — Core API
ExAthena.query/2— one-shot inference.ExAthena.stream/3— streaming inference with per-event callback.ExAthena.capabilities/1— static provider-capability lookup.ExAthena.Config— tiered resolver (per-call → provider env → top-level env → default).ExAthena.Error— canonical error struct with:kindatoms (:unauthorized,:not_found,:rate_limited,:timeout,:context_length_exceeded,:bad_request,:server_error,:transport,:capability,:unknown).
Added — Canonical shapes
ExAthena.Request— normalised inference request consumed by every provider.ExAthena.Response— normalised response with:text,:tool_calls,:finish_reason,:usage,:model,:provider,:raw.ExAthena.Messages.Message/.ToolCall/.ToolResult— conversation primitives.Messages.from_map/1tolerates both atom and string keys for easy interop with provider JSON.ExAthena.Streaming.Event— canonical streaming events (:start,:text_delta,:tool_call_start,:tool_call_delta,:tool_call_end,:usage,:stop,:error).
Added — Provider contract
ExAthena.Providerbehaviour withquery/2,stream/3(optional),capabilities/0.ExAthena.Capabilitiestype declaring features a provider supports.
Added — Providers
ExAthena.Providers.Ollama— local Ollama via/api/chat(native tool-calls on supported models, SSE-style newline-delimited streaming).ExAthena.Providers.OpenAICompatible—/v1/chat/completionsfor OpenAI, OpenRouter, LM Studio, vLLM, llama.cpp server, Together, Groq, etc. SSE streaming.ExAthena.Providers.Claude— wraps theclaude_codeSDK.claude_codeis declared optional so consumers that don't use Claude aren't forced to install it. (Streaming via this provider lands in Phase 2 with sessions.)ExAthena.Providers.Mock— in-memory test double with scripted responses and event lists.
Added — Tool-call parsing
ExAthena.ToolCalls.Native— parses OpenAI-styletool_callsand Claudetool_useblocks. Tolerant of atom/string keys and JSON-string arguments.ExAthena.ToolCalls.TextTagged— parses~~~tool_callfenced blocks out of assistant prose for models without native tool-call support.ExAthena.ToolCalls.extract/2— dispatch-and-fallback between the two protocols based on provider capabilities.ExAthena.ToolCalls.augment_system_prompt/2— appends text-tagged instructions to a system prompt for non-native-capable providers.
Added — Igniter installer
mix ex_athena.install— writes sensibleconfig :ex_athenadefaults, idempotent. Picks Ollama as the default provider. Requires theigniterdep (declared optional).
Phase 2 roadmap
Still to land: ExAthena.Tool behaviour + builtins (Read, Glob, Grep, Write,
Edit, Bash, WebFetch, TodoWrite, PlanMode, SpawnAgent), ExAthena.Loop
(multi-turn agent loop), ExAthena.Session GenServer, ExAthena.Hooks
(PreToolUse/PostToolUse/Stop lifecycle), ExAthena.Permissions
(:plan / :default / :bypass + can_use_tool callback), and
ExAthena.extract_structured/2 (JSON-schema-validated output).
Phase 3+ roadmap
Migrate udin_code off the claude_code direct dep: route every call through
ExAthena.*, delete UdinCode.Claude.GenericRunner, make picking :ollama
in the ModelProvider UI actually run the whole task lifecycle on Ollama.