All notable changes to this project will be documented in this file.
The format follows Keep a Changelog and ExAthena adheres to Semantic Versioning.
v0.4.1 — Ollama via OpenAI-compatible adapter
Fixed
- provider: :ollama now talks to local Ollama through req_llm's OpenAI adapter instead of looking up an :ollama provider in llm_db's catalog. llm_db 2026.4.x removed first-class local-Ollama support (it only catalogues :ollama_cloud now), so "ollama:<model>" model specs were rejected with {:error, :unknown_provider} from LLMDB.Spec. The fix routes :ollama (and :llamacpp) through the "openai:<model>" tag and threads openai_compatible_backend: :ollama so req_llm 1.10's openai adapter tolerates the missing API key on unauthenticated local deployments. Mirrors the recipe in req_llm/guides/ollama.md.
- base_url for Ollama now auto-appends /v1 when callers pass the bare host (http://localhost:11434); req_llm's openai adapter expects the prefix to already include /v1.
- A placeholder api_key ("ollama") is substituted when the :ollama backend marker is set and no key was supplied. Ollama ignores the Authorization header but req_llm's HTTP layer still emits one, so a non-nil value is required.
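The base_url and api_key handling described above can be sketched in a few lines. This is a minimal illustration, not ExAthena's actual build_opts/2: the module name OllamaOptsSketch and the function normalise/1 are hypothetical, and stripping a trailing slash before the /v1 check is an assumption.

```elixir
defmodule OllamaOptsSketch do
  # Hypothetical helper; the real logic lives in ExAthena.Providers.ReqLLM.
  @placeholder_key "ollama"

  def normalise(opts) do
    opts
    |> normalise_base_url()
    # Substitute the placeholder when the key is missing or nil.
    |> Keyword.update(:api_key, @placeholder_key, fn
      nil -> @placeholder_key
      key -> key
    end)
  end

  defp normalise_base_url(opts) do
    case Keyword.fetch(opts, :base_url) do
      {:ok, url} when is_binary(url) -> Keyword.put(opts, :base_url, ensure_v1(url))
      _ -> opts
    end
  end

  # Assumption: a trailing slash is dropped before checking for the /v1 suffix.
  defp ensure_v1(url) do
    trimmed = String.trim_trailing(url, "/")
    if String.ends_with?(trimmed, "/v1"), do: trimmed, else: trimmed <> "/v1"
  end
end
```

Passing the bare host yields a base_url ending in /v1, while a caller-supplied key is left untouched.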
Internal
- ExAthena.Config.@req_llm_provider_tag[:ollama] now resolves to "openai" (was "ollama").
- New @local_openai_compatible_backends map drives openai_compatible_backend injection in Config.pop_provider!/1.
- ExAthena.Providers.ReqLLM.build_opts/2 reads the backend marker, normalises base_url, and falls back to the placeholder api_key.
v0.4.0 — operational harness (memory, skills, hooks, modes, agents, storage)
The "1.6% reasoning, 98.4% harness" upgrade. Where v0.3 perfected the loop kernel, v0.4 builds the operational harness the Claude Code paper calls out as the bulk of a production agent's value: file-based memory + skills, a five-stage compaction pipeline with reactive recovery, a 14-event hook surface with {:inject, msg} / {:transform, prompt} returns, two new permission modes, structured tool results, custom agent definitions with optional git-worktree isolation, and append-only session storage with file-checkpointing + /rewind.
Landed as seven tightly-scoped commits (PR0 → PR5) — each one keeps the existing test suite passing and adds focused new tests on top.
PR0 — Foundation
Added
- :session_id and :parent_session_id plumbed through Loop.State, Loop.run/2 opts, the resulting ToolContext, and the SessionStart hook payload. The Session GenServer auto-generates a stable id at start_link, reuses it on every turn, and refuses to let per-call extra_opts redirect mid-conversation. PR4 + PR5 read these.
- :error_prompt_too_long finish-reason in Loop.Terminations with category :capacity. Modes signal context-window-exceeded uniformly; PR2's reactive compaction switches on this.
- Doctest in Permissions.check/4 documenting and locking the deny-first ordering (:disallowed_tools survives :bypass_permissions, :allowed_tools survives a permissive callback).
PR1 — Memory + Skills (file-based context)
Added — ExAthena.Memory
- Loads AGENTS.md (preferred) / CLAUDE.md from a 3-level hierarchy: user (~/.config/ex_athena/) → project (<cwd>/) → local override (<cwd>/AGENTS.local.md).
- Each file becomes a single user-role message tagged name: "memory" placed at the front of the conversation. The Claude Code paper notes Claude Code uses user-context (not system) for probabilistic compliance; we copy the pattern.
- AGENTS.md wins over CLAUDE.md at the same level (matches opencode).
Added — ExAthena.Skills
- Claude Code-style progressive disclosure. SKILL.md files have YAML frontmatter (name, description, disable-model-invocation, allowed-tools) plus a markdown body. The frontmatter is auto-injected into the system prompt as a ## Available Skills catalog (~50 tokens per skill); bodies stay on disk until needed.
- Two activation paths: a [skill: name] sentinel the model writes in its response, or the new :preload_skills opt for hosts that know up-front what's needed.
- Loaded from ~/.config/ex_athena/skills/<name>/SKILL.md and <cwd>/.exathena/skills/<name>/SKILL.md. Project overrides user.
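The sentinel activation path amounts to scanning the model's response for [skill: name] markers. A minimal sketch follows, assuming skill names consist of letters, digits, hyphens, and underscores; that grammar and the module name SkillSentinelSketch are illustrative assumptions, not the library's parser.

```elixir
defmodule SkillSentinelSketch do
  # Assumed sentinel grammar: "[skill: name]" with optional whitespace after the colon.
  defp sentinel, do: ~r/\[skill:\s*([A-Za-z0-9_-]+)\]/

  # Returns the distinct skill names requested in `text`, in first-seen order.
  def requested(text) when is_binary(text) do
    sentinel()
    |> Regex.scan(text, capture: :all_but_first)
    |> List.flatten()
    |> Enum.uniq()
  end
end
```

A host could then look each returned name up in the loaded catalog and append the matching SKILL.md body to the conversation.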
Added — Loop.run/2 options
- :memory accepts :auto (default), false, or an explicit message list.
- :skills accepts :auto (default), false, or an explicit map.
- :preload_skills takes a list of skill names to activate up-front.
Changed
- Compactors.Summary extends its effective pinned-prefix by meta[:memory_count] + meta[:preloaded_skill_count] so memory + pre-loaded skills survive every compaction cycle.
PR2 — Five-layer compaction pipeline + reactive recovery
Added — pipeline architecture
- ExAthena.Compactor.Pipeline: the new default compactor. Walks a configurable list of Compactor.Stage modules cheapest-first, short-circuiting once the conversation falls below target. Each stage runs inside its own [:ex_athena, :compaction, <:stage_name>, :start | :stop] telemetry span.
- ExAthena.Compactor.Stage behaviour with compact_stage/2 + name/0 callbacks. Existing Compactors.Summary keeps its legacy Compactor.compact/2 callback AND now implements Stage via a thin adapter, so it stays fully backward-compatible for direct callers.
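The cheapest-first walk with short-circuiting can be illustrated with a plain reduce. This sketch is not the real pipeline: stages are {name, fun} pairs rather than Stage modules, messages are bare strings, and size is measured in characters instead of tokens. All of those are simplifying assumptions.

```elixir
defmodule PipelineSketch do
  # Runs stages in order and stops as soon as the conversation fits the target.
  # Returns the compacted messages plus the names of the stages that actually ran.
  def run(messages, stages, target_chars) do
    {final, ran} =
      Enum.reduce_while(stages, {messages, []}, fn {name, stage_fun}, {msgs, ran} ->
        if size(msgs) <= target_chars do
          {:halt, {msgs, ran}}
        else
          {:cont, {stage_fun.(msgs), [name | ran]}}
        end
      end)

    {final, Enum.reverse(ran)}
  end

  # Stand-in for a token estimate: total character count.
  defp size(msgs), do: msgs |> Enum.map(&String.length/1) |> Enum.sum()
end
```

With a truncating stage followed by a dropping stage, the second stage never runs if truncation alone brings the conversation under the target.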
Added — five built-in stages
- Compactors.BudgetReduction: replaces oversized tool-result bodies (>16k chars by default) with a [truncated; ref=<id>] pointer. Full payload moves to state.meta[:tool_result_archive]. Pure-Elixir.
- Compactors.Snip: drops stale tool-result bodies older than :snip_age_iterations whose paired assistant turn already happened, replacing each with a <snipped: stale tool-result for call …> marker.
- Compactors.Microcompact: collapses runs of 3+ adjacent tool-result messages into a single elided summary tagged name: "microcompact". Pure-Elixir.
- Compactors.ContextCollapse: non-destructive view-time projection. Detects superseded reads (file later edited) and consecutive duplicate tool calls; writes the projection to state.meta[:compact_view] for the next request to consume. The authoritative state.messages is never mutated, so resume / replay / rewind (PR5) stay correct.
- Compactors.Summary: existing LLM summary stage, refactored.
Added — reactive recovery
- When a mode returns {:error, :error_prompt_too_long} (PR0 finish-reason), the loop runs the pipeline with force: true unconditionally and retries the same iteration once. Gated by :reactive_compact opt (default true).
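The retry-once behaviour can be sketched as a small wrapper around an iteration. Here iterate and compact are caller-supplied functions standing in for a mode's iterate step and a forced pipeline run; both, along with the module name, are hypothetical.

```elixir
defmodule ReactiveRetrySketch do
  # Runs one iteration; on a prompt-too-long error, compacts and retries exactly once.
  def run_iteration(state, iterate, compact, opts \\ []) do
    reactive? = Keyword.get(opts, :reactive_compact, true)

    case iterate.(state) do
      {:error, :error_prompt_too_long} when reactive? ->
        # A second failure is returned to the caller as-is; there is no further retry.
        state |> compact.() |> iterate.()

      other ->
        other
    end
  end
end
```

Because the retry is not itself wrapped, a provider that still rejects the compacted prompt surfaces the error instead of looping.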
Configuration
- :compaction_pipeline is a host-overridable stage list. Default is [BudgetReduction, Snip, Microcompact, ContextCollapse, Summary].
PR3a — Hooks expansion + permission modes
Added — hook events (14 total)
- Hooks.events/0 exposes the catalog: SessionStart, SessionEnd, UserPromptSubmit, ChatParams, Stop, StopFailure, PreToolUse, PostToolUse, PostToolUseFailure, PermissionRequest, PermissionDenied, SubagentStart, SubagentStop, PreCompact, PreCompactStage, PostCompact, Notification.
- New return values for hook callbacks:
  - {:inject, message_or_messages}: append context to the conversation. opencode's experimental.chat.system.transform pattern.
  - {:transform, prompt}: only meaningful from UserPromptSubmit; rewrites the user prompt before it enters the loop.
- run_lifecycle_with_outputs/3 returns %{halt:, injects:, transform:} for callers that need the richer outputs. run_lifecycle/3 keeps its :ok | {:halt, _} shape.
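One way to picture the richer output shape is as a fold over individual hook return values. The sketch below is an assumption about semantics, not the library's implementation: it stops at the first {:halt, _}, lets the last {:transform, _} win, and accumulates injects in order.

```elixir
defmodule HookOutputsSketch do
  # Folds hook return values into the %{halt:, injects:, transform:} shape.
  def collect(returns) do
    Enum.reduce_while(returns, %{halt: nil, injects: [], transform: nil}, fn
      :ok, acc ->
        {:cont, acc}

      {:inject, msgs}, acc ->
        # Accepts a single message or a list of messages.
        {:cont, %{acc | injects: acc.injects ++ List.wrap(msgs)}}

      {:transform, prompt}, acc ->
        {:cont, %{acc | transform: prompt}}

      {:halt, reason}, acc ->
        {:halt, %{acc | halt: reason}}
    end)
  end
end
```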
Newly fired events
- Stop / StopFailure / SessionEnd from to_result/2.
- UserPromptSubmit from build_initial_state/2.
- ChatParams from Modes.ReAct.iterate/1, just before each provider call.
- PostToolUseFailure when a tool returns {:error, _}.
- PermissionDenied whenever the gate decides {:deny, _}.
- SubagentStart / SubagentStop from Tools.SpawnAgent.
- PreCompactStage / PostCompact from the compaction pipeline.
Added — permission modes
- :accept_edits: auto-allow Read/Glob/Grep/WebFetch + Edit/Write/TodoWrite + plan_mode/spawn_agent. Bash + custom tools still consult can_use_tool.
- :trusted: skip the can_use_tool callback for every tool. Still respects the denylist by default; pass respect_denylist: false to disable that. The :auto name is reserved for the future ML safety classifier.
:bypass_permissions continues to respect the denylist (deny-first
invariant from PR0's doctest is preserved).
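The deny-first ordering across these modes can be sketched as a single cond. Everything here is illustrative: the lowercase tool names, the :allow return value, the {:deny, :disallowed_tool} reason, and the one-argument can_use_tool callback are assumptions rather than the signatures of Permissions.check/4.

```elixir
defmodule PermissionGateSketch do
  # Tools that :accept_edits auto-allows (names assumed lowercase for illustration).
  @auto_allowed ~w(read glob grep web_fetch edit write todo_write plan_mode spawn_agent)

  def check(tool, mode, opts \\ []) do
    denied = Keyword.get(opts, :disallowed_tools, [])
    callback = Keyword.get(opts, :can_use_tool, fn _tool -> :allow end)
    # Only :trusted may opt out of the denylist, via respect_denylist: false.
    respect_denylist? = mode != :trusted or Keyword.get(opts, :respect_denylist, true)

    cond do
      # Deny-first: the denylist is consulted before any mode shortcut.
      respect_denylist? and tool in denied -> {:deny, :disallowed_tool}
      mode in [:trusted, :bypass_permissions] -> :allow
      mode == :accept_edits and tool in @auto_allowed -> :allow
      true -> callback.(tool)
    end
  end
end
```

Putting the denylist check first is what keeps :bypass_permissions from overriding an explicit deny.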
PR3b — Tool-result split (LLM content + UI payload) ⚠️ Breaking
Tools may now return a 3-tuple {:ok, llm, ui} in addition to the
existing {:ok, text}. The llm is the LLM-facing string the model
sees on the next iteration; ui is a %{kind:, payload:} map hosts
(TUIs, Phoenix LiveView frontends) can render as rich content
(diffs, file previews, process output, match lists) without parsing
the text. This is the Pi-style split adapted to Elixir's pattern-match
idiom.
Added
- Messages.ToolResult grows ui_payload :: %{kind:, payload:} | nil.
- Loop.Events adds {:tool_ui, %{tool_call_id:, kind:, payload:}}.
- New event emitted after :tool_result for any tool result carrying a payload.
Built-in payload shapes
- Read → :file { path, content, line_range }
- Edit → :diff { path, before, after, replacements }
- Bash → :process { command, exit_code, stdout, duration_ms }
- Glob → :matches { pattern, count, items }
- Grep → :matches { pattern, count, items }
- WebFetch → :webpage { url, status, truncated? }
- Write, TodoWrite, PlanMode: text-only, unchanged.
- SpawnAgent (PR4) → :subagent { iterations, cost_usd, isolation, … }
Breaking change — direct tool callers
The 6 builtins listed above (Read, Edit, Bash, Glob, Grep,
WebFetch) now return {:ok, text, ui} 3-tuples instead of the
{:ok, text} 2-tuple. Callers using these tools through the loop are
unaffected — Result.text still surfaces the LLM-facing string. Code
that calls these tools' execute/2 directly needs to update its
pattern matches. The {:ok, text} 2-tuple remains a fully supported
return shape for custom and third-party tools.
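Code that calls a builtin's execute/2 directly could absorb both return shapes with a small normaliser. This is a hedged sketch: ToolResultSketch and normalise/1 are hypothetical names, and the sample payload fields only follow the shapes listed above.

```elixir
defmodule ToolResultSketch do
  # Legacy 2-tuple: no UI payload.
  def normalise({:ok, text}) when is_binary(text), do: {:ok, text, nil}

  # New 3-tuple: LLM-facing text plus a %{kind:, payload:} map for the host UI.
  def normalise({:ok, text, %{kind: _, payload: _} = ui}) when is_binary(text),
    do: {:ok, text, ui}

  # Errors pass through untouched.
  def normalise({:error, _reason} = error), do: error
end
```

Callers can then match on a single {:ok, text, ui} shape and treat a nil ui as text-only.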
PR4 — Subagents v2 (Agents.md + worktrees + sidechains)
Added — ExAthena.Agents
- File-based agent definitions in markdown + YAML frontmatter, loaded from a 3-level hierarchy (builtin → user → project). Frontmatter fields: name, description, model, provider, tools, permissions, mode, isolation. Body becomes a system-prompt addendum.
- Builtin definitions shipped in priv/agents/:
  - general: full-tool default (matches the prior SpawnAgent behaviour).
  - explore: read-only fast investigation.
  - plan: analysis only with writes restricted to .exathena/plans/.
- Agents.apply_to_opts/2 merges definition fields into spawn opts.
Added — worktree isolation
- ExAthena.Agents.Worktree.resolve/3 runs three safety checks before creating a git worktree (git on PATH, cwd inside repo, clean tree). If any check fails, the subagent transparently falls back to :in_process.
- Worktrees live under ~/.cache/ex_athena/worktrees/<sess>/<name>-<n>, branched from HEAD. After the subagent finishes:
  - Changes left → worktree is kept; path + branch surface in the spawn result's ui_payload for review/merge.
  - Clean → git worktree remove --force cleans up.
- ExAthena.Agents.WorktreeSweeper is a one-shot at boot under the application supervisor that runs git worktree prune and removes cache entries older than 7 days.
- All internal git invocations bypass the parent's permission gate via System.cmd/3 directly; otherwise a parent in :plan mode could never spawn a worktree-isolated subagent.
Added — sidechain transcripts
- ExAthena.Agents.Sidechain.write/1 persists each subagent's full transcript to <cwd>/.exathena/sessions/<parent_session_id>/sidechains/<subagent_id>.jsonl. Parent only sees the subagent's final text; the full conversation lives here.
SpawnAgent updates
- New agent: "<name>" arg resolves a named definition and applies its fields to the sub-loop opts.
- SubagentStart payload now includes :agent and :isolation. SubagentStop payload includes the finalized isolation state.
- Spawn returns the {:ok, llm, ui} 3-tuple from PR3b with a :subagent UI payload carrying iterations / tool_calls_made / cost_usd / duration_ms / isolation.
PR5 — Append-only session storage + checkpointing + rewind
Added — ExAthena.Sessions.Store
- Behaviour for append-only event storage with append/2, read/1, list/0, tail/2. Each event carries an ISO 8601 timestamp + uuid.
- Sessions.Stores.InMemory: ETS-backed default. The application supervisor keeps a single named GenServer alive so the table is shared across the BEAM.
- Sessions.Stores.Jsonl: ETS-buffered, periodic flush (default 250ms). Hot-path appends never block on I/O. Files at <root>/<session_id>.jsonl. Synchronous flush/1 for tests + clean shutdown.
Session integration
- Session.start_link/1 accepts :store opt: :in_memory (default), :jsonl, or a custom module.
- On every send_message/2: emits :user_message, then walks result.messages after the loop and emits :assistant_message / :tool_result for new entries.
- Session.resume/2 reads events back, filters to user/assistant messages, and returns the reconstructed message list. Permissions deliberately don't survive resume (Claude Code's pattern: trust is re-established per session).
Added — ExAthena.Checkpoint
- File-history backups before each Tools.Edit / Tools.Write at <cwd>/.exathena/file-history/<session_id>/<sha>/<version>.bin. SHA-256 of the absolute path; versions are 0-indexed and idempotent. Tombstones (<v>.tombstone) mark "this file didn't exist at checkpoint time".
- Checkpoint.rewind/3 modes:
  - :code_and_history: restore each file to its version-0 snapshot AND truncate the JSONL session log to the chosen to_uuid.
  - :history_only: only truncate the JSONL.
- ExAthena.Checkpoint.Sweeper: startup task that GCs file-history directories older than 30 days.
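The backup location follows directly from the path layout above. The sketch below derives it with :crypto and Path; encoding the digest as lowercase hex is an assumption, as are the module and function names.

```elixir
defmodule CheckpointPathSketch do
  # Builds <cwd>/.exathena/file-history/<session_id>/<sha>/<version>.bin,
  # where <sha> is the SHA-256 of the file's absolute path.
  def backup_path(cwd, session_id, file, version) when is_integer(version) and version >= 0 do
    absolute = Path.expand(file, cwd)

    # Assumption: the digest is rendered as lowercase hex.
    sha =
      :sha256
      |> :crypto.hash(absolute)
      |> Base.encode16(case: :lower)

    Path.join([cwd, ".exathena", "file-history", session_id, sha, "#{version}.bin"])
  end
end
```

Hashing the absolute path means a relative and an absolute reference to the same file land in the same history directory.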
Distribution
- mix.exs :files now includes priv/ so the builtin agent definitions ship with the package on Hex.
Tests
- 248 tests + 2 doctests, 0 failures (was 147 + 0 in v0.3.1).
- Backward-compatible by design: existing v0.3 tests untouched, except for the 6 builtin tools whose return shape changed (PR3b, tightened to {:ok, text, ui}).
v0.3.1 — per-token streaming in the ReAct mode
Added
- Modes.ReAct now dispatches to provider_mod.stream/3 (instead of query/2) whenever the caller registered an on_event callback on Loop.run/2. Every %Streaming.Event{type: :text_delta, data: ...} produced by the provider is forwarded to on_event in real time, so consumers (e.g. a LiveView chat UI) get character-level deltas again without having to drive streaming themselves.
- When no on_event is set the behaviour is unchanged: the mode uses the cheaper one-shot query/2 path.
- When the provider module does not implement stream/3 (it is an optional callback) the mode transparently falls back to query/2.
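The dispatch rule reduces to two checks: is a callback registered, and does the provider export stream/3. A minimal sketch, assuming on_event is a one-argument function; the provider modules and their argument names are made up for the example.

```elixir
defmodule DispatchSketch do
  # Picks :stream only when a callback is registered AND the provider exports stream/3.
  def pick(provider_mod, opts) do
    streaming? = is_function(opts[:on_event], 1)

    if streaming? and Code.ensure_loaded?(provider_mod) and
         function_exported?(provider_mod, :stream, 3) do
      :stream
    else
      :query
    end
  end
end

# Hypothetical providers used only to exercise the dispatch.
defmodule QueryOnlyProvider do
  def query(_request, _opts), do: {:ok, "done"}
end

defmodule StreamingProvider do
  def query(_request, _opts), do: {:ok, "done"}
  def stream(_request, _opts, _callback), do: {:ok, "done"}
end
```

Code.ensure_loaded?/1 runs before function_exported?/3 because the latter returns false for modules that have not been loaded yet.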
Changed
- Docstring on Modes.ReAct now reflects the stream/query dispatch.
v0.3.0 — PR 4 (observability) landed; Phase 4 closed
PR 4 — Observability
Added — OpenTelemetry GenAI semconv telemetry
- ExAthena.Telemetry: emits :telemetry-library events shaped to the OpenTelemetry GenAI semantic conventions. Consumers bridge to OTel via opentelemetry_telemetry (no direct OTel dep). Events:
  - [:ex_athena, :loop, :start | :stop | :exception]
  - [:ex_athena, :chat, :start | :stop]
  - [:ex_athena, :tool, :start | :stop]
  - [:ex_athena, :compaction, :stop]
  - [:ex_athena, :subagent, :spawn | :stop]
  - [:ex_athena, :structured_retry]
- GenAI semconv metadata keys: gen_ai_operation_name, gen_ai_provider_name, gen_ai_request_model, gen_ai_agent_id, gen_ai_conversation_id, gen_ai_tool_name, gen_ai_tool_call_id, gen_ai_usage_input_tokens, gen_ai_usage_output_tokens, gen_ai_response_finish_reasons.
- New :conversation_id / :agent_id opts on Loop.run/2, threaded into every emitted event's metadata so OTel traces can stitch across turns.
- Telemetry.span/3 helper wraps arbitrary work in a start/stop pair with duration measurement + exception re-raising.
Released
- Version bump 0.3.0-dev → 0.3.0. Ready for Hex publish.
v0.3.0-dev — PR 3 landed
PR 3 — Reliability + intelligence
No additional breaking changes. New capabilities layer on top of PR 2.
Added — context compaction
- ExAthena.Compactor: behaviour for context-window reduction. Called by the kernel before each iteration when the token estimate crosses :compact_at (default 60% of the provider's max_tokens). Preserves a pinned prefix (system prompt + rules) and a live suffix (recent turns) while substituting the middle with a summary.
- ExAthena.Compactors.Summary: default implementation. Uses the session's own provider to generate a terse summary and replaces the dropped messages with a single assistant message tagged name: "compactor_summary". Cost counts against the run's budget.
- New options: :compact_at (default 0.6), :pinned_prefix_count (default 1), :live_suffix_count (default 6), :compactor (override module).
- New events: {:compaction, metadata} fires after a successful compaction with before/after token counts and dropped count.
- New termination: :error_compaction_failed when compaction errors.
- New hook: :PreCompact fires with %{estimate: …} before each compaction attempt.
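The pinned-prefix / live-suffix split can be shown with two Enum.split/2 calls. In this sketch the summarise function is supplied by the caller (the real stage asks the session's provider), and the message maps are simplified stand-ins for the library's message structs.

```elixir
defmodule CompactionSplitSketch do
  # Keeps the first `pinned` and last `live` messages, replacing the middle
  # with one assistant message tagged name: "compactor_summary".
  def compact(messages, summarise, opts \\ []) do
    pinned = Keyword.get(opts, :pinned_prefix_count, 1)
    live = Keyword.get(opts, :live_suffix_count, 6)

    {prefix, rest} = Enum.split(messages, pinned)
    {middle, suffix} = Enum.split(rest, max(length(rest) - live, 0))

    case middle do
      # Nothing between prefix and suffix: leave the conversation untouched.
      [] ->
        messages

      _ ->
        summary = %{role: :assistant, name: "compactor_summary", content: summarise.(middle)}
        prefix ++ [summary] ++ suffix
    end
  end
end
```

With the defaults, a conversation of seven messages or fewer is returned unchanged because the middle is empty.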
Added — budget accounting from provider metadata
- extract_cost/1 in ExAthena.Modes.ReAct pulls :total_cost (or :input_cost + :output_cost) from provider usage metadata and folds it into the run's Budget. req_llm's models.dev-backed cost data flows straight through.
- ExAthena.Result.cost_usd is populated when the provider reports cost; nil otherwise.
- :max_budget_usd (introduced as a knob in PR 2) now genuinely trips :error_max_budget_usd when cumulative cost crosses the cap.
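The fallback order for cost extraction can be expressed as three function clauses. This is a sketch under the assumption that usage metadata arrives as a map with atom keys; the real function's input shape is not shown in this changelog.

```elixir
defmodule CostSketch do
  # Prefer the provider's total; otherwise sum the input and output parts.
  def extract_cost(%{total_cost: cost}) when is_number(cost), do: cost

  def extract_cost(%{input_cost: input, output_cost: output})
      when is_number(input) and is_number(output),
      do: input + output

  # No cost reported: nil, mirroring Result.cost_usd's documented fallback.
  def extract_cost(_usage), do: nil
end
```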
Added — structured-output repair loop (instructor-style)
- ExAthena.Structured.extract/2 now retries on validation failure by appending the failed response + a user message carrying the validation error and re-prompting. Default :max_retries: 2.
- After retries exhaust, returns {:error, {:error_max_structured_output_retries, last_validation_error}}.
- New events: {:structured_retry, %{attempt:, error:}} fires on each retry.
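The repair loop can be sketched as a bounded recursion. Here ask and validate are caller-supplied functions standing in for the provider call and the schema check; the wording of the repair message is invented for the example.

```elixir
defmodule RepairLoopSketch do
  def extract(ask, validate, messages, max_retries \\ 2),
    do: attempt(ask, validate, messages, 0, max_retries)

  defp attempt(ask, validate, messages, retries_used, max_retries) do
    response = ask.(messages)

    case validate.(response) do
      {:ok, value} ->
        {:ok, value}

      {:error, reason} when retries_used < max_retries ->
        # Replay the failed response plus the validation error, then re-prompt.
        repaired =
          messages ++
            [
              %{role: :assistant, content: response},
              %{role: :user, content: "Validation failed: #{inspect(reason)}"}
            ]

        attempt(ask, validate, repaired, retries_used + 1, max_retries)

      {:error, reason} ->
        {:error, {:error_max_structured_output_retries, reason}}
    end
  end
end
```

With the default of two retries, the provider is asked at most three times before the error tuple is returned.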
Added — Plan-and-Solve mode
- ExAthena.Modes.PlanAndSolve: two-phase mode. First iteration is planning-only (no tools, plain-text plan following a structured prompt). Subsequent iterations delegate to ReAct.
- Rationale: smaller / local models produce better tool-calling behaviour when they articulate a plan first.
Added — Reflexion mode
- ExAthena.Modes.Reflexion: after each ReAct iteration, injects a short self-critique pass and adds it to the conversation history. Capped at 3 reflections (per research, degeneration-of-thought kicks in beyond that).
- Triples per-loop cost; best reserved for correctness-sensitive tasks.
Added — subagent supervision upgrade
- ExAthena.Tools.SpawnAgent now runs sub-loops under Task.Supervisor.async_nolink (supervisor name ExAthena.Tasks, registered by ExAthena.Application). Sub-agent crashes no longer propagate to the parent; timeouts are enforceable.
- New events: {:subagent_spawn, %{id:, prompt:}} and {:subagent_result, %{id:, text:}} fire around sub-loop execution.
- New optional arg: timeout_ms (default 300_000).
- New error subtypes from SpawnAgent: {:sub_agent_crashed, reason}, {:sub_agent_timeout, ms}.
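The supervision pattern named above is standard Elixir: start the work with Task.Supervisor.async_nolink/2, wait with Task.yield/2, and kill the task on timeout. The sketch below maps the outcomes onto the error subtypes from this entry; the module name and the choice of :brutal_kill are assumptions.

```elixir
defmodule SubagentSketch do
  # Runs `fun` under `supervisor` without linking, so a crash cannot take the caller down.
  def run(supervisor, fun, timeout_ms \\ 300_000) do
    task = Task.Supervisor.async_nolink(supervisor, fun)

    # Task.yield/2 returns nil on timeout; Task.shutdown/2 then stops the task
    # and returns a result only if one arrived in the meantime.
    case Task.yield(task, timeout_ms) || Task.shutdown(task, :brutal_kill) do
      {:ok, result} -> {:ok, result}
      {:exit, reason} -> {:error, {:sub_agent_crashed, reason}}
      nil -> {:error, {:sub_agent_timeout, timeout_ms}}
    end
  end
end
```

A caller would typically start the supervisor once, for example with Task.Supervisor.start_link/0 or under an application supervision tree, and reuse it for every spawn.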
Tests
- 140 total (up from 126 in PR 2). 14 new cover compaction (threshold detection, middle-replacement, error surfacing), budget caps (cost-based termination, cost_usd accumulation, nil fallback), structured repair loop (retry success, retry exhaustion, retry events), Plan-and-Solve (planning turn assertion, execution-phase tool use), and Reflexion (reflection cap, history injection).
PR 2 — Kernel rewrite (breaking changes)
The return type of ExAthena.Loop.run/2 is now {:ok, %Result{}}
instead of the v0.2 {:ok, map()}. Consumers pattern-matching on the
old map shape must update.
Added — pluggable Mode behaviour
- ExAthena.Loop.Mode: behaviour with init/1 + iterate/1. Drives the turn-by-turn control flow. Kernel handles caps, budget, hooks, counters, events, and Result construction.
- ExAthena.Modes.ReAct: default mode. ReAct cycle (reason → act → observe) with parallel tool execution, mistake counter, and typed terminations.
- ExAthena.Modes.PlanAndSolve + ExAthena.Modes.Reflexion: stubs returning :not_implemented. Full implementations land in PR 3.
- ExAthena.Loop.Mode.resolve/1 translates atom shortcuts (:react, :plan_and_solve, :reflexion) to modules.
Added — reliability knobs
- :max_consecutive_mistakes (default 3): trips :error_consecutive_mistakes after N consecutive tool errors. A successful tool call resets the counter. Prevents runaway loops (Cline pattern).
- :max_budget_usd: trips :error_max_budget_usd when the budget accumulator crosses the cap. PR 3 wires cost computation from provider metadata.
- :tool_timeout_ms (default 60_000): per-call timeout for parallel execution.
- :max_concurrency (default 4): Task.async_stream concurrency cap.
Added — parallel tool execution
- ExAthena.Loop.Parallel: classifies a single iteration's tool calls into parallel-safe (read-only) and serial (mutating) groups. Runs mutating calls first in order, then parallel-safe calls concurrently via Task.async_stream/3. Result order always matches input call order so the model sees aligned results.
- ExAthena.Tool.parallel_safe?/0: optional behaviour callback. Defaults to false.
- Read-only builtins (Read, Glob, Grep, WebFetch) declare parallel_safe?: true. Mutating builtins default to false.
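The split-then-reorder approach can be sketched with Enum.split_with/2 and Task.async_stream/3. Tool calls are plain maps with a parallel_safe? flag here, which is a simplification; the real classifier asks each tool module through its callback.

```elixir
defmodule ParallelSketch do
  # Runs mutating calls serially first, then parallel-safe calls concurrently,
  # and returns results in the original call order.
  def run(calls, execute, opts \\ []) do
    indexed = Enum.with_index(calls)
    {safe, serial} = Enum.split_with(indexed, fn {call, _i} -> call.parallel_safe? end)

    serial_results = Enum.map(serial, fn {call, i} -> {i, execute.(call)} end)

    safe_results =
      safe
      |> Task.async_stream(fn {call, i} -> {i, execute.(call)} end,
        max_concurrency: Keyword.get(opts, :max_concurrency, 4),
        timeout: Keyword.get(opts, :tool_timeout_ms, 60_000)
      )
      # With the default on_timeout: :exit, a timed-out call exits the caller,
      # so only {:ok, _} needs handling in this sketch.
      |> Enum.map(fn {:ok, pair} -> pair end)

    (serial_results ++ safe_results)
    |> Enum.sort_by(fn {i, _result} -> i end)
    |> Enum.map(fn {_i, result} -> result end)
  end
end
```

Tagging each call with its index before splitting is what lets the results be re-aligned with the input order afterwards.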
Changed — event shape (breaking change)
v0.2's %ExAthena.Streaming.Event{type:, data:, index:} struct is
replaced by flat pattern-matchable tuples modelled on ash_ai's
ToolLoop.stream/2:
{:content, text}
{:tool_call, ToolCall.t()}
{:tool_result, ToolResult.t()}
{:iteration, integer()}
{:usage, usage_map}
{:error, term()}
{:done, Result.t()}

Consumers subscribing via :on_event need to update their handlers.
OTel span emission in PR 4 consumes the same tuples.
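A handler for the flat tuples can be written as one function clause per event. The sketch below only formats each event as a string; the module name and output wording are illustrative, and the payloads are inspected rather than assumed to have a particular struct shape.

```elixir
defmodule EventHandlerSketch do
  # One clause per event tuple; payloads are rendered with inspect/1.
  def describe({:content, text}) when is_binary(text), do: "content: " <> text
  def describe({:tool_call, call}), do: "tool call: " <> inspect(call)
  def describe({:tool_result, result}), do: "tool result: " <> inspect(result)
  def describe({:iteration, n}) when is_integer(n), do: "iteration #{n}"
  def describe({:usage, usage}), do: "usage: " <> inspect(usage)
  def describe({:error, reason}), do: "error: " <> inspect(reason)
  def describe({:done, _result}), do: "done"
  # Unknown tuples are ignored so newer event types do not crash older handlers.
  def describe(_other), do: nil
end
```

A function such as `&EventHandlerSketch.describe/1` could then be passed as the :on_event callback, with the catch-all clause absorbing later additions like {:compaction, metadata}.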
Changed — error handling
Tool errors use the is_error: true tool-result convention (Cline
pattern). The model sees its mistake and self-corrects; the mistake
counter advances; a streak hits the cap.
Unknown tools + parse failures flow as error tool-results rather than
halting the run. Hook-driven halts produce :error_halted. Provider
errors produce :error_during_execution.
Tests
126 total (up from 116 in PR 1). 10 new cover Result shape, termination
subtypes, max_iterations → :error_max_turns, mistake counter + reset,
parallel tool ordering, flat event tuples, Mode resolve/1.
PR 1 — Foundation (already landed, unchanged)
PR 1 lays the foundation: canonical types, typed terminations, budget accounting, and a single req_llm-backed provider adapter that replaces the three hand-written provider modules.
Added — Result, Terminations, Budget
- ExAthena.Result: canonical run outcome struct. Every run (success or error) returns a %Result{} carrying final text, message history, finish_reason, iterations, tool_calls_made, aggregated usage, cost in USD, duration, model, provider, and telemetry metadata. Replaces the loose map v0.2 returned.
- ExAthena.Loop.Terminations: typed finish_reason subtypes inspired by the Claude Agent SDK. Each run ends with exactly one of: :stop, :error_max_turns, :error_max_budget_usd, :error_during_execution, :error_max_structured_output_retries, :error_consecutive_mistakes, :error_halted, :error_compaction_failed. Terminations.category/1 classifies each as :success | :retryable | :capacity | :fatal for retry-decision logic.
- ExAthena.Budget: usage + cost accumulator. Aggregates token usage across iterations, computes cost from provider metadata (req_llm + models.dev), and supports :max_budget_usd caps.
Added — req_llm provider adapter
- ExAthena.Providers.ReqLLM: single adapter that delegates to req_llm's 18+ providers (OpenAI, Anthropic, Ollama, OpenRouter, Groq, Together, DeepInfra, Vercel, LM Studio, vLLM, llama.cpp, Mistral, Gemini, Cohere, Bedrock, …). Model names resolve through the models.dev registry for cost + context-window metadata.
- ExAthena.Config.pop_provider!/1 now threads a req_llm_provider_tag key through opts so bare model: "llama3.1" + provider: :ollama auto-expands to the full "ollama:llama3.1" spec req_llm expects.
- Config.req_llm_provider_tag/1: translate an ExAthena provider atom into the req_llm "tag:model-id" prefix.
Removed — hand-written provider modules
- ExAthena.Providers.Ollama
- ExAthena.Providers.OpenAICompatible
- ExAthena.Providers.Claude

All three were direct HTTP clients (Ollama + OpenAICompatible) or SDK wrappers (Claude). req_llm does this work across more providers and maintains the catalogs. The provider atoms :ollama, :openai, :openai_compatible, :llamacpp, :claude, :anthropic continue to work; they now all resolve to ExAthena.Providers.ReqLLM.
Added — dep
{:req_llm, "~> 1.10"}.
Breaking change — none yet (visible)
Consumer-visible API unchanged in this PR. Every existing call
(ExAthena.query/2, ExAthena.stream/3, ExAthena.Loop.run/2,
ExAthena.Session.start_link/1) works identically. The provider-module
change is internal.
Breaking API changes land in PR 2 (Kernel) alongside the new Mode behaviour and the new stream event shape.
Tests
- 116 tests passing (up from 91 baseline). 25 new covering Terminations, Result, Budget, and the req_llm adapter routing.
v0.2.0 — unreleased
Phase 2 of the agent-loop roadmap: ex_athena is now feature-complete for multi-turn tool-using work. Drop-in replacement for the Claude Code SDK.
Added — Agent loop
- ExAthena.Loop: multi-turn loop. Infer → parse tool calls → permissions → PreToolUse hooks → execute → PostToolUse hooks → replay → repeat. Bounded by :max_iterations (default 25). Auto-falls-back between native and text-tagged tool-call protocols via ExAthena.ToolCalls.extract/2.
- ExAthena.Session: GenServer owning multi-turn conversation state. Appends to message history on every turn, resumable, supervised.
- ExAthena.run/2 + ExAthena.extract_structured/2 on the facade.
Added — Tool behaviour + builtins
- ExAthena.Tool behaviour (name, description, schema, execute).
- ExAthena.ToolContext: :cwd, :phase, :session_id, :tool_call_id, :assigns, plus resolve_path/2 that rejects traversal + null bytes.
- ExAthena.Tools registry: resolves user tool lists and constructs the provider-facing + prompt-facing schemas.
- Ten builtin tools:
  - Read (with line numbering + offset/limit)
  - Glob (wildcard listing with max cap)
  - Grep (rg when available, pure-Elixir fallback)
  - Write (creates parent dirs)
  - Edit (strict exact-string replacement, ambiguity-rejecting)
  - Bash (port-based, configurable timeout, kills on timeout)
  - WebFetch (http/https only, 1 MB cap)
  - TodoWrite (validates statuses, optional notifier callback via assigns)
  - PlanMode (phase transition request; loop consumes the sentinel)
  - SpawnAgent (synchronous sub-loop, inherits ctx, filters meta-tools)
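The traversal and null-byte rejection performed by resolve_path/2 can be approximated with Path.expand/2 and a prefix check. This sketch assumes a (cwd, path) argument order and invents the error atoms; it also does not follow symlinks, which a hardened implementation might need to consider.

```elixir
defmodule ResolvePathSketch do
  # Resolves `path` against `cwd` and rejects anything that escapes it.
  def resolve_path(cwd, path) when is_binary(cwd) and is_binary(path) do
    if String.contains?(path, <<0>>) do
      {:error, :null_byte}
    else
      root = Path.expand(cwd)
      # Path.expand/2 collapses ".." segments, so an escaping path ends up outside root.
      full = Path.expand(path, root)

      if full == root or String.starts_with?(full, root <> "/") do
        {:ok, full}
      else
        {:error, :path_traversal}
      end
    end
  end
end
```

Checking for the null byte before expanding keeps the path functions from ever seeing the malformed input.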
Added — Permissions
- ExAthena.Permissions with three modes (:plan, :default, :bypass_permissions), allowed_tools / disallowed_tools lists, and a can_use_tool callback for interactive approval.
- :plan mode blocks mutation tools (write, edit, bash, todo_write) by default; read-only tools always permitted.
Added — Hooks
- ExAthena.Hooks lifecycle matching Claude Code's shape: PreToolUse, PostToolUse, Stop, Notification, PreCompact, SessionStart, SessionEnd. Matcher groups (regex or string) select which tools fire. Hook crashes are caught and become :halt returns.
Added — Structured extraction
- ExAthena.Structured.extract/2: one-shot JSON extraction with schema validation. Uses JSON mode when the provider supports it; falls back to a fenced ~~~json block for providers that don't. :validator opt for custom validation.
Test surface
- 95 tests (up from 43 in Phase 1). Coverage per tool, permission modes, hook lifecycle, loop end-to-end driven by the Mock provider, structured extraction both JSON-mode and fenced.
Phase 3 roadmap (next PR)
Start migrating udin_code off direct claude_code calls. Route ticket
work (SdkRunner, GenericRunner, Orchestrator) through ExAthena.Session
so picking :ollama in the ModelProvider UI begins actually running tasks
on Ollama.
v0.1.0 — unreleased
Initial public release. Phase 1 of the agent-loop roadmap: pure inference across any provider, with the canonical message/request/response shapes and tool-call parsing infrastructure in place for Phase 2's agent loop.
Added — Core API
- ExAthena.query/2: one-shot inference.
- ExAthena.stream/3: streaming inference with per-event callback.
- ExAthena.capabilities/1: static provider-capability lookup.
- ExAthena.Config: tiered resolver (per-call → provider env → top-level env → default).
- ExAthena.Error: canonical error struct with :kind atoms (:unauthorized, :not_found, :rate_limited, :timeout, :context_length_exceeded, :bad_request, :server_error, :transport, :capability, :unknown).
Added — Canonical shapes
- ExAthena.Request: normalised inference request consumed by every provider.
- ExAthena.Response: normalised response with :text, :tool_calls, :finish_reason, :usage, :model, :provider, :raw.
- ExAthena.Messages.Message / .ToolCall / .ToolResult: conversation primitives. Messages.from_map/1 tolerates both atom and string keys for easy interop with provider JSON.
- ExAthena.Streaming.Event: canonical streaming events (:start, :text_delta, :tool_call_start, :tool_call_delta, :tool_call_end, :usage, :stop, :error).
Added — Provider contract
- ExAthena.Provider behaviour with query/2, stream/3 (optional), capabilities/0.
- ExAthena.Capabilities type declaring features a provider supports.
Added — Providers
- ExAthena.Providers.Ollama: local Ollama via /api/chat (native tool-calls on supported models, SSE-style newline-delimited streaming).
- ExAthena.Providers.OpenAICompatible: /v1/chat/completions for OpenAI, OpenRouter, LM Studio, vLLM, llama.cpp server, Together, Groq, etc. SSE streaming.
- ExAthena.Providers.Claude: wraps the claude_code SDK. claude_code is declared optional so consumers that don't use Claude aren't forced to install it. (Streaming via this provider lands in Phase 2 with sessions.)
- ExAthena.Providers.Mock: in-memory test double with scripted responses and event lists.
Added — Tool-call parsing
- ExAthena.ToolCalls.Native: parses OpenAI-style tool_calls and Claude tool_use blocks. Tolerant of atom/string keys and JSON-string arguments.
- ExAthena.ToolCalls.TextTagged: parses ~~~tool_call fenced blocks out of assistant prose for models without native tool-call support.
- ExAthena.ToolCalls.extract/2: dispatch-and-fallback between the two protocols based on provider capabilities.
- ExAthena.ToolCalls.augment_system_prompt/2: appends text-tagged instructions to a system prompt for non-native-capable providers.
Added — Igniter installer
- mix ex_athena.install: writes sensible config :ex_athena defaults, idempotent. Picks Ollama as the default provider. Requires the igniter dep (declared optional).
Phase 2 roadmap
Still to land: ExAthena.Tool behaviour + builtins (Read, Glob, Grep, Write,
Edit, Bash, WebFetch, TodoWrite, PlanMode, SpawnAgent), ExAthena.Loop
(multi-turn agent loop), ExAthena.Session GenServer, ExAthena.Hooks
(PreToolUse/PostToolUse/Stop lifecycle), ExAthena.Permissions
(:plan / :default / :bypass + can_use_tool callback), and
ExAthena.extract_structured/2 (JSON-schema-validated output).
Phase 3+ roadmap
Migrate udin_code off the claude_code direct dep: route every call through
ExAthena.*, delete UdinCode.Claude.GenericRunner, make picking :ollama
in the ModelProvider UI actually run the whole task lifecycle on Ollama.