5. Agent ergonomics: dry-run, self-describing help, structured errors, I/O discipline
Copy Markdown View SourceDate: 2026-05-29 Status: Accepted
Context
Pixir is built by agents as much as it is for them. The model drives tools, and agents (including a fresh one with no prior context) drive the CLI and the dev scripts. When a command's behaviour, arguments, or failure mode is opaque, the agent guesses — it runs destructive actions to "see what happens", invents flags, or pastes raw, unbounded output back into the model where it burns tokens and derails the Turn.
Two recurring failure modes motivate this decision:
- No safe preview. A tool/command with side effects can only be understood by running it. There is no way to ask "what would you do?" first.
- Ambiguous I/O. It is unclear what text is meant for the model, what is meant for the human at the terminal, and what is pure diagnostics. Everything collapses onto stdout, so the model ingests prompts/spinners/log noise and the human sees raw JSON.
This is a cross-cutting contract over the Tool behaviour (ADR 0001's Executor), the Event/Log envelope (ADR 0004), and the CLI (build step 8). It costs us nothing to adopt now and is expensive to retrofit.
Decision
Every Pixir tool, CLI command, and dev script obeys four rules.
1. Dry-runnable
Anything with side effects (writes, shell, network, file mutation) supports a dry-run that reports the planned action and the resources it would touch, performs no mutation, and returns success.
- CLI / scripts: accept
--dry-run. - LLM-facing tools: the Executor honours a
dry_runflag (per-call arg and/or a Turn-level mode). In dry-run a tool returns a structured plan (e.g.%{would: :write, path: ..., bytes: ...}) instead of acting. - Read-only tools are trivially dry-run (they have no effect to suppress).
2. Self-describing for a fresh agent
A caller with zero context can discover capabilities without reading source.
- CLI:
pixir helpandpixir <cmd> --helpprint usage, every flag, and at least one example. Subcommands are individually discoverable. - Tools:
__tool__/0is the help — a complete name, description, and JSON schema (required/optional args, types). Descriptions are written for a fresh agent. - Help text is plain and example-led; no hidden flags.
3. Structured errors
Failures are data, not just prose. Internally functions return {:error, reason};
at every boundary that an agent or the model reads, the error serialises to a stable
envelope:
%{ok: false, error: %{kind: atom_or_string, message: human_string, details: map}}kind is a small, stable, documented vocabulary (e.g. :outside_workspace,
:not_found, :invalid_args, :timeout) so callers can branch on it. The canonical
enumeration is Pixir.Tool.kind/0 — the single source of truth; adding a kind is a
deliberate change there. CLI commands exit non-zero on error and emit the same envelope
on the diagnostic channel; successful machine output stays clean.
Deliberate divergence — a nonzero shell exit is not an error. A bash command that
runs and exits nonzero (e.g. grep finding nothing → exit 1) returns a successful tool
result %{"output", "exit_code", "ok" => false}, not an error envelope — the tool did its
job, and the model needs the output and code to reason (treating a no-match as a failure
would derail Turns). So there is no :nonzero_exit kind; branch on exit_code/ok within
the result instead.
4. I/O channel discipline — be deliberate about what reaches the LLM
Three distinct channels, never conflated:
- Model channel — the
tool_resultpayload folded into History and sent to the model (ADR 0004). It is curated and token-bounded: large output is truncated with an explicit marker (e.g.…[truncated N of M lines]), never raw-dumped. No ANSI, spinners, or prompts. - Human channel (stdout) — rendered, human-facing output produced by the Renderer (build step 9) from bus events. Formatting lives here.
- Diagnostic channel (stderr / Logger) — debug and operational noise. Never seen by the model and not part of any tool result.
Input is equally explicit: a one-shot prompt comes from argv or, when piped, from
stdin; a tool never silently reads stdin. Each tool documents which streams it
consumes and produces.
Consequences
- Safer autonomy. Agents can preview destructive actions; dry-run is the natural
first probe and pairs with the deferred
:askpermission gate when it lands. - Faster cold starts. A fresh agent orients via
--help/__tool__/0instead of reading the codebase or trial-and-error. - Branchable failures + cheaper Turns. Stable
kinds let the loop react programmatically; bounded model-channel output keeps Turns from blowing the context window on log spew. - Cost / obligations.
- Every new tool and command must implement dry-run, help, the error envelope, and declare its streams — enforced in review (and ideally a shared test helper).
- The
kindvocabulary is a curated, documented set; adding one is a deliberate change. - Truncation policy (limits, marker format) is a tunable knob, decided in code alongside the Executor.
- The
Pixir.Toolbehaviour and Executor (build step 6) and the CLI (step 8) are the enforcement points; this ADR is their acceptance criteria.