The v0.1 walking skeleton is implemented and verified live end-to-end (see
AGENTS.md status). This roadmap tracks what's next: features the plan/grill consciously
deferred (A), knobs still to tune (B), hardening gaps found while building the skeleton
(C), and the open beta release gate (D). Rationale lives in docs/adr/; vocabulary in
CONTEXT.md.

## A. Deferred features (from PLAN-v0.1 / the grill)

| Area | Notes | Source |
|---|---|---|
| Permission `:ask` gate | Done (v1): `:auto` default · `:ask`/`:read_only` · safe-command list. Session-scoped "remember/allowlist" deferred. | ADR 0006 |
| Conversational driver | Done — Pixir.Conversation (ADR 0008): UI-agnostic multi-turn loop; CLI refactored onto it. Terminal REPL is now just an optional presenter (deferred — target UI is non-Elixir). | ADR 0008 |
| ACP transport | Done (ADR 0009) — verified live. pixir acp: Pixir.ACP.{Server,Protocol,Translate} over Conversation, an ACP agent over stdio. Drives any ACP client. T3Code is currently dogfood through a local adapter, not an upstreamed public install path. | ADR 0009 |
| T3Code dogfood adapter | Local-only adapter used to validate ACP behavior, projection issues, and UX. It is not upstreamed and is not part of the beta packaging contract. | ADR 0009, ADR 0016 |
| Provider usage / prompt-cache / WebSocket transport | Done for durable provider_usage and cache/WebSocket smokes. Pixir now treats WebSocket as the preferred Provider transport direction with HTTP/SSE fallback, while continuing to tune continuation, fallback recovery, and cache-key evidence. | ADR 0019, ADR 0020 |
| `req_llm` / multi-provider | Anthropic, local models: a second dialect behind the Provider seam | ADR 0002 (deferred) |
| Skills (`SKILL.md`) | Done (ADR 0010): progressive discovery, `skills_list`/`skill_view`, durable `skill_activation` snapshots, Provider replay, no-network smoke. | ADR 0010 |
| Subagents | Done (ADR 0011) — supervised child Sessions with explicit spawn/wait/send/close/list tools, lifecycle subagent_events, compact replay summaries, isolated workspaces, no-network stress smoke. | ADR 0011 |
| Workflows / subagent scheduler | Done (ADR 0012) for v1 structural Workflows. Pixir.Workflows validates dependency edges, runs through Subagents.Manager, fans out read-only steps, serializes overlapping writer write-sets, exposes run_workflow, and ships no-network tests/smoke. Still deferred: typed outputs, automatic merge-back from isolated writer snapshots, path-level tool-derived write-sets, and canonical workflow-level Events. | ADR 0012, docs/design/0001-subagent-scheduler-write-set-orchestration.md |
| Session Resources / Image Attachments | Initial image slice done (ADR 0021). Pixir ingests attachments and ACP resource_link blocks as durable Session Resources, projects images to the Provider when attached, and keeps later replay descriptor/digest-first unless resource_view explicitly rehydrates. Subagent inheritance and non-image Provider projection remain deferred. | ADR 0021 |
| Provider-hosted Web Search | Deterministic slice done (ADR 0022). Web Search is a Provider-hosted Responses tool, not a Pixir local Tool. Dry-run smoke and parser/request-shape tests are in place; live smoke remains opt-in. | ADR 0022 |
| Skill Context Hydration | Design accepted (ADR 0023); implementation follow-up. Hydrated Skill context should be explicit, canonical, permissioned, and late-bound, not hidden SKILL.md interpolation. | ADR 0023 |
| Subagents benchmark | Done for first verifiable suite + real-network capability matrix V0 — Pixir stress adapter covers N = 1,5,10,25,50; paired T3 harnesses observe Pixir spawn_agent/wait_agent and Codex collabAgentToolCall spawnAgent/wait; mix pixir.bench.real_subagents records the cheap provider/model capability matrix. Remaining work is the seeded fixture, benchctl, strict scoring, usage reconciliation, and T3-visible non-blocking status/result retrieval for long-lived child Sessions. | docs/benchmarks/subagents.md, docs/benchmarks/subagents-report.md, docs/benchmarks/real-network-subagents.md |
| Branching / fork | Fork a Session by replaying its Log to a chosen point; may build on Subagent parent-child metadata but remains a separate product surface. | CONTEXT |
| Web + LiveView | Web front-end — the trigger to adopt Phoenix.PubSub | PLAN |
| OAuth browser flow | The 127.0.0.1:1455 callback (device-code already shipped) | ADR 0002 (fast-follow) |
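
The Workflows row above describes fanning out read-only steps while serializing writers whose write-sets overlap. A minimal sketch of that idea follows, assuming each step declares its write-set up front as a list of paths. The module name, the step shape, and the wave-packing strategy are hypothetical and not taken from `Pixir.Workflows`; dependency edges are ignored here.

```elixir
defmodule WriteSetSketch do
  @moduledoc "Hypothetical sketch: packs steps into sequential waves by write-set overlap."

  # A step is assumed to be a map with :id and :writes (a list of paths).
  # An empty :writes list marks a read-only step.
  def waves(steps) do
    steps
    |> Enum.reduce([], &place/2)
    |> Enum.reverse()
    |> Enum.map(fn wave -> wave |> Enum.reverse() |> Enum.map(& &1.id) end)
  end

  # Waves are accumulated newest-first; the head is the wave still being filled.
  defp place(step, []), do: [[step]]

  defp place(step, [current | done]) do
    if Enum.any?(current, &conflict?(&1, step)) do
      # Overlapping writer: close the current wave and start a new one.
      [[step], current | done]
    else
      # No overlap: the step can run alongside the current wave.
      [[step | current] | done]
    end
  end

  # Two steps conflict when their declared write-sets share at least one path.
  defp conflict?(a, b) do
    not MapSet.disjoint?(MapSet.new(a.writes), MapSet.new(b.writes))
  end
end

steps = [
  %{id: :scan, writes: []},
  %{id: :fix_a, writes: ["lib/a.ex"]},
  %{id: :fix_a_again, writes: ["lib/a.ex"]},
  %{id: :report, writes: []}
]

IO.inspect(WriteSetSketch.waves(steps))
# => [[:scan, :fix_a], [:fix_a_again, :report]]
```

Steps inside one wave have pairwise disjoint write-sets and could run concurrently; waves run one after another, so the two writers touching `lib/a.ex` never overlap. Read-only steps never conflict, which is what lets them fan out.
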

## B. Open knobs (decide/tune in code)

- Default `build` system prompt (basic one exists)
- Tool-loop iteration cap (currently 12)
- `bash` timeout: resolved as `bash_timeout_ms` (default 120s); stream-idle timeout still open
- Provider retries: `max_retries` (default 2, capped exponential backoff)
- Device-code / `resume` UX copy
- Model-channel truncation policy (currently 16 KB, ADR 0005)
- Model id: resolved as `config :pixir, :model` → `PIXIR_MODEL` → `~/.pixir/config.json` → `gpt-5.5`

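
The model id chain above is a first-match-wins lookup. A minimal sketch of that precedence follows, assuming each source has already been read into a map; the module name, the `sources` shape, and the rule that blank strings count as unset are illustrative assumptions, not Pixir's actual code.

```elixir
defmodule ModelIdSketch do
  @moduledoc "Hypothetical sketch of the model id precedence chain."

  @default_model "gpt-5.5"

  # `sources` is an assumed map with optional keys:
  #   :app_config - value of `config :pixir, :model`
  #   :env        - value of the PIXIR_MODEL environment variable
  #   :file       - model value read from ~/.pixir/config.json
  # The first non-blank value wins; otherwise the default applies.
  def resolve(sources) do
    [:app_config, :env, :file]
    |> Enum.map(&Map.get(sources, &1))
    |> Enum.find(@default_model, &present?/1)
  end

  # Assumption: nil and whitespace-only strings are treated as "not set".
  defp present?(value), do: is_binary(value) and String.trim(value) != ""
end

IO.inspect(ModelIdSketch.resolve(%{env: "model-from-env", file: "model-from-file"}))
# => "model-from-env"
IO.inspect(ModelIdSketch.resolve(%{}))
# => "gpt-5.5"
```
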

## B2. Pi-inspired harness ergonomics

ADR 0017 locks the product boundary: Pixir should borrow Pi's useful minimal-core shape without moving interaction glue into the core Turn loop.
Near-term slices:
- Presenter preflight for interactive commands. Keep `/skill`, prompt template expansion, model selection, and adapter UX outside the core unless they create canonical History.
- Session tree projection. Expose a read-only tree/fork projection over Logs, parent-child Session ids, Subagent lifecycle events, and future branch summaries. Do not create a second message store.
- Compaction and replay repair. Initial slice done (ADR 0018). Pixir records canonical `history_compaction` checkpoints, replays latest checkpoint plus tail, and reconciles pending `tool_call` Events before a new Turn or interrupt. The model-assisted compaction contract is in code as a short developer instruction plus strict schema, but networked/model-assisted compaction is still deferred alongside automatic thresholds and UX around compaction previews.
- Installable practice boundary. Treat Skills, Workflow Templates, and PATCHMD patcher repos as the growth mechanism before considering package-catalog behavior.
- Adapter safety rails. Put T3-specific doctors, repairs, and projection checks in the T3 adapter/patcher repo, not Pixir core.

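
The "latest checkpoint plus tail" replay rule in the compaction slice can be illustrated as follows. This is a sketch under stated assumptions: events are modelled as plain maps with `:seq` and `:type`, and the module and function names are hypothetical; the real Event shape and replay code are not shown in this roadmap.

```elixir
defmodule ReplaySketch do
  @moduledoc "Hypothetical sketch: select the newest checkpoint and everything after it."

  # Returns the events to replay, in ascending :seq order.
  # With no checkpoint in the Log, the whole Log is replayed.
  def replay_window(events) do
    sorted = Enum.sort_by(events, & &1.seq)

    # Walk from the newest event backwards until a checkpoint is reached.
    {tail_newest_first, from_checkpoint} =
      sorted
      |> Enum.reverse()
      |> Enum.split_while(&(&1.type != :history_compaction))

    case from_checkpoint do
      [checkpoint | _older] -> [checkpoint | Enum.reverse(tail_newest_first)]
      [] -> sorted
    end
  end
end

log = [
  %{seq: 1, type: :user_message},
  %{seq: 2, type: :history_compaction},
  %{seq: 3, type: :tool_call},
  %{seq: 4, type: :history_compaction},
  %{seq: 5, type: :user_message}
]

IO.inspect(Enum.map(ReplaySketch.replay_window(log), & &1.seq))
# => [4, 5]
```

Everything older than the newest checkpoint stays in the Log but is skipped on replay, which is why the checkpoint has to be a canonical event and not a side table.
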

## D. Open beta scope

ADR 0016 locks the first open beta as a terminal/ACP-first developer preview: source install remains the baseline, Hex is not a beta prerequisite, the T3Code dogfood adapter is not upstreamed, telemetry is off by default, and the release gate focuses on first-run UX, diagnostics, docs, CI/CD, and honest Subagent/T3 limitations. ADR 0025 separately allows Hex only as a CLI/ACP distribution path.

## C. Hardening gaps (found while building, beyond the formal plan)

What separates the walking skeleton from a daily-usable MVP:
1. ~~Reasoning items not persisted/replayed.~~ Done (2026-05-29, ADR 0007), verified live. The encrypted reasoning item (`rs_…`) is now a canonical `reasoning` event, recorded before its paired `tool_call` (so `seq` keeps `rs_` < `fc_`) and re-injected as `input` by the Provider, dropping items captured under a different model (the `model`-guard mirrors Pi's `isDifferentModel`). Arrival-order capture preserves intra-turn interleaving for free. A live 2-Turn session persisted four `rs_` items (with `encrypted_content`) and the Responses API accepted them on `resume`. Deferred: strict id-based `fc_`/`rs_` pairing (Pixir sends no `fc_` ids; order suffices today) and any `encrypted_content` staleness handling (no evidence it expires; resolve empirically if ever 400'd).
2. ~~No retry/backoff.~~ Done: `Provider.stream` retries network/`:rate_limited`/5xx with capped exponential backoff (`max_retries`); terminal errors aren't retried.
   - Stream-idle timeout still open.
3. ~~`bash` can hang.~~ Done: `bash` runs via a `Port` and is killed on `bash_timeout_ms` (default 120s).
4. ~~Token refresh only stub-tested.~~ Done (2026-05-29), verified live. Forced a stale `expires_at` and ran a real Turn: `Pixir.Auth` refreshed against `auth.openai.com`, rotated both access and refresh tokens, and re-persisted the fresh credential (0600) before returning, closing the refresh-token-rotation hazard. A refresh failure no longer "kills the session": a rejected refresh token (4xx) is re-mapped to an actionable `:not_authenticated` ("run `pixir login`"); a transient failure keeps its retryable `:network` kind. Failure-path tests in `auth_test.exs`.
5. ~~Missing `edit` tool.~~ Done: `Pixir.Tools.Edit` (exact match, unique-unless-`replace_all`, dry-run, atomic write).
6. ~~`bash` + `resume` not yet verified live.~~ Done (2026-05-29), verified live end-to-end on `gpt-5.3-codex-spark`: a 2-Turn session (write+bash, then `resume` → edit+bash; the model correctly recalled the pre-edit value from folded History). Closing this surfaced and fixed a real bug: the Log type decoder used `String.to_existing_atom`, so every cold `resume` crashed (`:unknown_event_type` on `user_message`) because the writer's atoms weren't loaded in the fresh process. Resume was in fact broken, not just unverified. The decoder now validates against `Event.canonical_types/0`; CLI resume also folds `start_session` into its `with` so a bad fold prints a structured error instead of a `MatchError`. Regression tests in `log_test.exs`.
7. ~~No CLI interrupt (Ctrl-C).~~ Done: the escript traps SIGINT and routes it to `Session.interrupt/1` through the CLI presenter path.
8. ~~`config.json` reads only `model`.~~ Done: `~/.pixir/config.json` now covers model, compaction, timeout, permission, and transport-related knobs.
9. T3-visible Subagent lifecycle UX is not yet complete. Pixir can spawn and wait on supervised child Sessions, and Workflows can run structural dependency graphs, but a T3 user should eventually be able to launch long-running Subagents, keep the root Turn non-blocking, inspect live child status on demand, and retrieve durable terminal summaries after reconnect/reload. This is distinct from the assistant-message projection fix: the projection contract is now documented in ADR 0009; lifecycle UX needs its own implementation pass and may deserve a future ADR if it becomes a release contract.

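
The resume bug in C6 comes from how `String.to_existing_atom/1` behaves: it raises `ArgumentError` when the atom is not already present in the running VM, so decoding depended on which modules happened to be loaded. A minimal sketch of the allowlist approach follows. The type list below is a hypothetical subset standing in for whatever `Event.canonical_types/0` returns, and the error tuple shape is an assumption.

```elixir
defmodule EventTypeSketch do
  @moduledoc "Hypothetical sketch: decode event type strings against a fixed allowlist."

  # Assumed subset; the real list would come from Event.canonical_types/0.
  @canonical_types [:user_message, :tool_call, :reasoning, :history_compaction]

  # The string -> atom table is built at compile time, so decoding never
  # depends on which atoms exist in the VM at the moment of a cold resume.
  @by_name Map.new(@canonical_types, fn type -> {Atom.to_string(type), type} end)

  def decode_type(name) when is_binary(name) do
    case Map.fetch(@by_name, name) do
      {:ok, type} -> {:ok, type}
      :error -> {:error, {:unknown_event_type, name}}
    end
  end
end

IO.inspect(EventTypeSketch.decode_type("user_message"))
# => {:ok, :user_message}
IO.inspect(EventTypeSketch.decode_type("not_a_type"))
# => {:error, {:unknown_event_type, "not_a_type"}}
```

A lookup table also avoids creating atoms from untrusted Log content, since unknown strings are returned as strings inside the error.
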

## Suggested order toward an MVP

1. ~~Permission `:ask` gate (ADR 0006)~~ Done (v1).
2. ~~`edit` tool + retry/backoff + `bash` timeout~~ Done.
3. ~~Close live verification of `bash` + `resume`~~ Done (and fixed a resume-breaking bug; see C6).
4. ~~Persist/replay reasoning items (correctness on multi-tool turns)~~ Done (ADR 0007).
5. ~~Conversational driver (`Pixir.Conversation`, ADR 0008)~~ Done; CLI refactored onto it.
6. ~~Live token refresh hardening~~ Done (C4).
7. ~~Design the non-Elixir UI transport~~ Done (ADR 0009): it's ACP, not a bespoke HTTP/WS tier. Any ACP client can drive Pixir over stdio; T3Code remains local dogfood.
8. ~~Build the ACP agent (Piece A)~~ Done (2026-05-30), verified live. `pixir acp`: `Pixir.ACP.{Server,Protocol,Translate}` over `Conversation`. v1 minimal (`initialize`/`session/new`/`session/prompt`/`session/cancel` + `session/update`), `:auto` only. Verified live by piping JSON-RPC on stdin: full handshake, a real turn that wrote a file via the tool (executed internally, reported as `tool_call`/`tool_call_update`), mid-turn `session/cancel` → `cancelled`, unknown method → -32601, malformed JSON → -32700 without crashing. The adversarial pass caught and fixed a stdout-pollution bug (the OTP-28 Logger redirect was a no-op → logs leaked to stdout, corrupting ndjson) with a regression test that exercises the real escript. 162 tests green.
9. T3Code dogfood adapter: local adapter for ACP pressure testing; not upstreamed and not packaged. (Terminal REPL stays deferred; ACP keeps it a thin optional presenter.)
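
The error behaviour verified for the ACP agent (unknown method → -32601, malformed JSON → -32700) matches the standard JSON-RPC 2.0 error codes. A minimal sketch of that mapping for a single ndjson line follows. It assumes Elixir 1.18 or later for the built-in `JSON.decode/1`; the module name, the stubbed result, and the helper functions are hypothetical and are not `Pixir.ACP.Protocol`.

```elixir
defmodule JsonRpcSketch do
  @moduledoc "Hypothetical sketch of JSON-RPC error mapping for one ndjson line."

  # The ACP v1 request methods named in the roadmap; handlers are stubbed out.
  @methods ["initialize", "session/new", "session/prompt", "session/cancel"]

  # Returns a response map; encoding it and writing it to stdout is left out.
  # Notifications (requests without an id) are not special-cased in this sketch.
  def handle_line(line) do
    case JSON.decode(line) do
      {:ok, %{"method" => method} = request} when method in @methods ->
        reply(request["id"], %{"result" => %{"stub" => method}})

      {:ok, %{"method" => _unknown} = request} ->
        reply(request["id"], error_body(-32601, "Method not found"))

      {:ok, _not_a_request} ->
        reply(nil, error_body(-32600, "Invalid Request"))

      {:error, _reason} ->
        # Malformed JSON becomes a structured error instead of a crash.
        reply(nil, error_body(-32700, "Parse error"))
    end
  end

  defp reply(id, body), do: Map.merge(%{"jsonrpc" => "2.0", "id" => id}, body)

  defp error_body(code, message) do
    %{"error" => %{"code" => code, "message" => message}}
  end
end

IO.inspect(JsonRpcSketch.handle_line(~s({"jsonrpc":"2.0","id":7,"method":"nope"})))
IO.inspect(JsonRpcSketch.handle_line("{not json"))
```

Because the agent speaks ndjson over stdio, every line on stdout has to be a well-formed response like the maps above, which is why stray log output on stdout corrupts the stream.
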