Pixir — Roadmap (post-v0.1)

Copy Markdown View Source

The v0.1 walking skeleton is implemented and verified live end-to-end (see AGENTS.md status). This roadmap tracks what's next: features the plan/grill consciously deferred (A), knobs still to tune (B), hardening gaps found while building the skeleton (C), and the open beta release gate (D). Rationale lives in docs/adr/; vocabulary in CONTEXT.md.

A. Deferred features (from PLAN-v0.1 / the grill)

AreaNotesSource
Permission :ask gateDone (v1):auto default · :ask/:read_only · safe-command list. Session-scoped "remember/allowlist" deferredADR 0006
Conversational driverDone — Pixir.Conversation (ADR 0008): UI-agnostic multi-turn loop; CLI refactored onto it. Terminal REPL is now just an optional presenter (deferred — target UI is non-Elixir).ADR 0008
ACP transportDone (ADR 0009) — verified live. pixir acp: Pixir.ACP.{Server,Protocol,Translate} over Conversation, an ACP agent over stdio. Drives any ACP client. T3Code is currently dogfood through a local adapter, not an upstreamed public install path.ADR 0009
T3Code dogfood adapterLocal-only adapter used to validate ACP behavior, projection issues, and UX. It is not upstreamed and is not part of the beta packaging contract.ADR 0009, ADR 0016
Provider usage / prompt-cache / WebSocket transportDone for durable provider_usage and cache/WebSocket smokes. Pixir now treats WebSocket as the preferred Provider transport direction with HTTP/SSE fallback, while continuing to tune continuation, fallback recovery, and cache-key evidence.ADR 0019, ADR 0020
req_llm / multi-providerAnthropic, local models — a 2nd dialect behind the Provider seamADR 0002 (deferred)
Skills (SKILL.md)Done (ADR 0010) — progressive discovery, skills_list/skill_view, durable skill_activation snapshots, Provider replay, no-network smoke.ADR 0010
SubagentsDone (ADR 0011) — supervised child Sessions with explicit spawn/wait/send/close/list tools, lifecycle subagent_events, compact replay summaries, isolated workspaces, no-network stress smoke.ADR 0011
Workflows / subagent schedulerDone (ADR 0012) for v1 structural Workflows. Pixir.Workflows validates dependency edges, runs through Subagents.Manager, fans out read-only steps, serializes overlapping writer write-sets, exposes run_workflow, and ships no-network tests/smoke. Still deferred: typed outputs, automatic merge-back from isolated writer snapshots, path-level tool-derived write-sets, and canonical workflow-level Events.ADR 0012, docs/design/0001-subagent-scheduler-write-set-orchestration.md
Session Resources / Image AttachmentsInitial image slice done (ADR 0021). Pixir ingests attachments and ACP resource_link blocks as durable Session Resources, projects images to the Provider when attached, and keeps later replay descriptor/digest-first unless resource_view explicitly rehydrates. Subagent inheritance and non-image Provider projection remain deferred.ADR 0021
Provider-hosted Web SearchDeterministic slice done (ADR 0022). Web Search is a Provider-hosted Responses tool, not a Pixir local Tool. Dry-run smoke and parser/request-shape tests are in place; live smoke remains opt-in.ADR 0022
Skill Context HydrationDesign accepted (ADR 0023); implementation follow-up. Hydrated Skill context should be explicit, canonical, permissioned, and late-bound, not hidden SKILL.md interpolation.ADR 0023
Subagents benchmarkDone for first verifiable suite + real-network capability matrix V0 — Pixir stress adapter covers N = 1,5,10,25,50; paired T3 harnesses observe Pixir spawn_agent/wait_agent and Codex collabAgentToolCall spawnAgent/wait; mix pixir.bench.real_subagents records the cheap provider/model capability matrix. Remaining work is the seeded fixture, benchctl, strict scoring, usage reconciliation, and T3-visible non-blocking status/result retrieval for long-lived child Sessions.docs/benchmarks/subagents.md, docs/benchmarks/subagents-report.md, docs/benchmarks/real-network-subagents.md
Branching / forkFork a Session by replaying its Log to a chosen point; may build on Subagent parent-child metadata but remains a separate product surface.CONTEXT
Web + LiveViewWeb front-end — the trigger to adopt Phoenix.PubSubPLAN
OAuth browser flowThe 127.0.0.1:1455 callback (device-code already shipped)ADR 0002 (fast-follow)

B. Open knobs (decide/tune in code)

  • Default build system prompt (basic one exists)
  • Tool-loop iteration cap (currently 12)
  • bash timeout — resolved: bash_timeout_ms (default 120s); stream-idle timeout still open
  • Provider retries — max_retries (default 2, capped exponential backoff)
  • Device-code / resume UX copy
  • Model-channel truncation policy (currently 16 KB, ADR 0005)
  • Model id — resolved: config :pixir, :modelPIXIR_MODEL~/.pixir/config.jsongpt-5.5

B2. Pi-inspired harness ergonomics

ADR 0017 locks the product boundary: Pixir should borrow Pi's useful minimal-core shape without moving interaction glue into the core Turn loop.

Near-term slices:

  1. Presenter preflight for interactive commands. Keep /skill, prompt template expansion, model selection, and adapter UX outside the core unless they create canonical History.
  2. Session tree projection. Expose a read-only tree/fork projection over Logs, parent-child Session ids, Subagent lifecycle events, and future branch summaries. Do not create a second message store.
  3. Compaction and replay repair. Initial slice done (ADR 0018). Pixir records canonical history_compaction checkpoints, replays latest checkpoint plus tail, and reconciles pending tool_call Events before a new Turn or interrupt. The model-assisted compaction contract is in code as a short developer instruction plus strict schema, but networked/model-assisted compaction is still deferred alongside automatic thresholds and UX around compaction previews.
  4. Installable practice boundary. Treat Skills, Workflow Templates, and PATCHMD patcher repos as the growth mechanism before considering package-catalog behavior.
  5. Adapter safety rails. Put T3-specific doctors, repairs, and projection checks in the T3 adapter/patcher repo, not Pixir core.

D. Open beta scope

ADR 0016 locks the first open beta as terminal/ACP-first developer preview: source install remains the baseline, Hex is not a beta prerequisite, T3Code dogfood adapter is not upstreamed, telemetry is off by default, and the release gate focuses on first-run UX, diagnostics, docs, CI/CD, and honest Subagent/T3 limitations. ADR 0025 separately allows Hex only as a CLI/ACP distribution path.

C. Hardening gaps (found while building — beyond the formal plan)

What separates the walking skeleton from a daily-usable MVP:

  1. Reasoning items not persisted/replayed. Done (2026-05-29, ADR 0007) — verified live. The encrypted reasoning item (rs_…) is now a canonical reasoning event, recorded before its paired tool_call (so seq keeps rs_<fc_) and re-injected as input by the Provider, dropping items captured under a different model (the model-guard mirrors Pi's isDifferentModel). Arrival-order capture preserves intra-turn interleaving for free. A live 2-Turn session persisted four rs_ items (with encrypted_content) and the Responses API accepted them on resume. Deferred: strict id-based fc_/rs_ pairing (Pixir sends no fc_ ids; order suffices today) and any encrypted_content staleness handling (no evidence it expires; resolve empirically if ever 400'd).
  2. No retry/backoff. DoneProvider.stream retries network/:rate_limited/5xx with capped exponential backoff (max_retries); terminal errors aren't retried.
  3. Stream-idle timeout still open. bash can hang Donebash runs via a Port and is killed on bash_timeout_ms (default 120s).
  4. Token refresh only stub-tested. Done (2026-05-29) — verified live. Forced a stale expires_at and ran a real Turn: Pixir.Auth refreshed against auth.openai.com, rotated both access and refresh tokens, and re-persisted the fresh credential (0600) before returning — closing the refresh-token-rotation hazard. A refresh failure no longer "kills the session": a rejected refresh token (4xx) is re-mapped to an actionable :not_authenticated ("run pixir login"); a transient failure keeps its retryable :network kind. Failure-path tests in auth_test.exs.
  5. Missing edit tool. DonePixir.Tools.Edit (exact match, unique-unless replace_all, dry-run, atomic write).
  6. bash + resume not yet verified live. Done (2026-05-29) — verified live end-to-end on gpt-5.3-codex-spark: a 2-Turn session (write+bash, then resume → edit+bash; the model correctly recalled the pre-edit value from folded History). Closing this surfaced and fixed a real bug: the Log type decoder used String.to_existing_atom, so every cold resume crashed (:unknown_event_type on user_message) because the writer's atoms weren't loaded in the fresh process — resume was in fact broken, not just unverified. Now validates against Event.canonical_types/0; CLI resume also folds start_session into its with so a bad fold prints a structured error instead of a MatchError. Regression tests in log_test.exs.
  7. No CLI interrupt (Ctrl-C). Done — the escript traps SIGINT and routes it to Session.interrupt/1 through the CLI presenter path.
  8. config.json reads only model. Done~/.pixir/config.json now covers model, compaction, timeout, permission, and transport-related knobs.
  9. T3-visible Subagent lifecycle UX is not yet complete. Pixir can spawn and wait on supervised child Sessions, and Workflows can run structural dependency graphs, but a T3 user should eventually be able to launch long-running Subagents, keep the root Turn non-blocking, inspect live child status on demand, and retrieve durable terminal summaries after reconnect/reload. This is distinct from the assistant-message projection fix: the projection contract is now documented in ADR 0009; lifecycle UX needs its own implementation pass and may deserve a future ADR if it becomes a release contract.

Suggested order toward an MVP

  1. Permission :ask gate (ADR 0006)done (v1)
  2. edit tool + retry/backoff + bash timeoutdone
  3. Close live verification of bash + resumedone (and fixed a resume-breaking bug; see C6)
  4. Persist/replay reasoning items (correctness on multi-tool turns)done (ADR 0007)
  5. Conversational driver (Pixir.Conversation, ADR 0008)done; CLI refactored onto it
  6. Live token refresh hardeningdone (C4)
  7. Design the non-Elixir UI transportdone (ADR 0009): it's ACP, not a bespoke HTTP/WS tier. Any ACP client can drive Pixir over stdio; T3Code remains local dogfood.
  8. Build the ACP agent (Piece A)done (2026-05-30) — verified live. pixir acp: Pixir.ACP.{Server,Protocol,Translate} over Conversation. v1 minimal (initialize/session/new/session/prompt/session/cancel + session/update), :auto only. Verified live by piping JSON-RPC on stdin: full handshake, a real turn that wrote a file via the tool (executed internally, reported as tool_call/tool_call_update), mid-turn session/cancelcancelled, unknown method → -32601, malformed JSON → -32700 without crashing. The adversarial pass caught + fixed a stdout-pollution bug (the OTP-28 Logger redirect was a no-op → logs leaked to stdout, corrupting ndjson) with a regression test that exercises the real escript. 162 tests green.
  9. T3Code dogfood adapter — local adapter for ACP pressure testing; not upstreamed and not packaged. (Terminal REPL stays deferred — ACP keeps it a thin optional presenter.)