Jidoka's test suite is designed to stay deterministic by default, with a
narrow opt-in path for live provider runs. Contributors who add a feature
must also extend the deterministic surface (unit, runtime, golden, and
integration tests) and only optionally extend the live tests. This guide
documents the test layout, the fake-LLM and local-operation patterns, the
golden-file contract, and the mix quality gates that every change must
clear. It is written for people contributing to the jidoka package, not
for application authors.
When To Use This
- Use this guide before writing a new test in test/. The folder structure and helper modules are not obvious from the file tree alone.
- Use this guide when adding a feature that should appear in golden coverage (any change to Agent.Spec, Turn.Plan, or the import path).
- Use this guide when changing default mix quality gates or when adding a new opt-in test category.
- Do not use this guide for application-level testing. Application authors should follow Testing And Evals.
Prerequisites
- Elixir ~> 1.18 and a checkout of the jidoka package.
- mix deps.get has been run.
- For the live opt-in path: one provider key in scope (OPENAI_API_KEY or ANTHROPIC_API_KEY).
mix deps.get
mix test
Quick Example
A deterministic test only needs a fake LLM function and (optionally) an
injected operation capability. Both go through Jidoka.turn/3:
defmodule MyContributorTest do
  use ExUnit.Case, async: true
  import TestSupport

  defmodule TimeAgent do
    use Jidoka.Agent

    agent :contributor_time do
      model %{provider: :test, id: "deterministic"}
      instructions "Call local_time when asked for the time."
    end

    tools do
      local_operations do
        operation :local_time do
          description "Returns a fixed time for a city."
          handler fn %{"city" => city} -> {:ok, %{city: city, time: "09:30"}} end
        end
      end
    end
  end

  test "returns the canned answer with no provider key" do
    llm = operation_then_final_llm("local_time", %{"city" => "Chicago"}, "Chicago: 09:30")

    assert {:ok, %Jidoka.Turn.Result{content: "Chicago: 09:30"}} =
             Jidoka.turn(TimeAgent, "What time is it in Chicago?", llm: llm)
  end
end

No environment variable is required. The test runs in async: true because
both capabilities are pure functions.
Concepts
Three ideas define the contributor test surface.
- Deterministic by default, live by opt-in. test/test_helper.exs excludes the :live tag by default. Live tests must be tagged @moduletag :live and run with mix test --include live.
- Two injection seams replace every external dependency. The llm: keyword option supplies a fake Jidoka.Runtime.Capabilities.llm_capability/0 function, and operations: supplies a fake Jidoka.Runtime.Capabilities.operation_capability/0 function. Together they make the runtime fully data-driven.
- Golden tests pin the public projection. Any change to a struct that escapes the package boundary must update the matching golden expectation.
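The first idea can be tried in a standalone script. The sketch below assumes the exclusion is configured through ExUnit's own `:exclude` option; the real test/test_helper.exs may set more than this, and the module name `TagExclusionSketchTest` is hypothetical.

```elixir
# Sketch only: assumes the exclusion is done through ExUnit's :exclude option.
# autorun: false lets the script call ExUnit.run/0 itself and inspect the result.
ExUnit.start(exclude: [:live], autorun: false)

defmodule TagExclusionSketchTest do
  # Hypothetical module, not part of the jidoka test suite.
  use ExUnit.Case, async: true

  test "deterministic test runs by default" do
    assert 1 + 1 == 2
  end

  @tag :live
  test "live test is left out by the default exclusion" do
    flunk("only reached when :live is included")
  end
end

# The returned map reports counts for the run, including excluded tests.
result = ExUnit.run()
IO.inspect(result, label: "run summary")
```

Inside the package, the same effect comes from running plain mix test; passing --include live re-enables the tagged tests.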
               mix test
                   │
      ╭────────────┼─────────────╮
      ▼            ▼             ▼
 unit tests  runtime tests golden tests
(per module) (capabilities, (DSL->spec,
 pure data)   interpreter,   import->spec)
              runner)
      │            │             │
      ╰────────────┼─────────────╯
                   ▼
           integration tests
          (test/integration/,
           scenario-shaped)
                   │
                   ▼
        mix test --include live
        (opt-in real provider)

How To
Step 1: Pick The Right Test Folder
The repo's test layout has four buckets. New tests go in the bucket that matches their scope:
| Folder | Scope | Conventions |
|---|---|---|
| test/jidoka/ | Unit and per-module tests | async: true, one module per file, no provider keys. |
| test/jidoka/runtime/ | Runtime kernel tests | Exercise Capabilities, EffectInterpreter, TurnRunner, adapters. Inject fake capabilities. |
| test/jidoka/golden/ | DSL/import projection golden files | Jidoka.project/1 output pinned verbatim; update in the same commit as the change. |
| test/integration/ | End-to-end scenarios | Mirror an author's flow (controls, memory, structured results, idempotency). Still deterministic. |
Tests that need shared agents, actions, or controls go under
test/support/integration/{agents,actions,controls}/. The
test/support/integration/README.md file documents what lives there.
Step 2: Write A Fake LLM Capability
The shared helpers in test/support/test_support.ex cover the common
shapes. The three building blocks are final_llm/2, operation_llm/2,
and operation_then_final_llm/3:
def final_llm(content, opts \\ []) when is_binary(content) do
  result = Keyword.get(opts, :result)

  fn _intent, _journal ->
    {:ok, %{type: :final, content: content, result: result}}
  end
end

def operation_llm(name, arguments \\ %{}) when is_binary(name) and is_map(arguments) do
  fn _intent, _journal ->
    {:ok, %{type: :operation, name: name, arguments: arguments}}
  end
end

def operation_then_final_llm(name, arguments, content) do
  fn _intent, %Effect.Journal{} = journal ->
    case count_results(journal, :llm) do
      0 -> {:ok, %{type: :operation, name: name, arguments: arguments}}
      _count -> {:ok, %{type: :final, content: content}}
    end
  end
end

The contract:
- The function takes (Effect.Intent.t(), Effect.Journal.t()) and returns {:ok, decision_map_or_struct} | {:error, term}.
- The decision can be a Jidoka.Effect.LLMDecision struct or a plain map matching the JSON decision shape.
- The function is called once per loop iteration. Use count_results/2 on the journal to branch by iteration number.
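The contract above can be illustrated without Jidoka loaded. The sketch below is not the package's helper: the journal is modeled as a plain map with an `:entries` list, and `ScriptedLLMSketch`, its `count_results/2`, and `scripted_llm/1` are hypothetical names. It replays a list of decisions by counting prior `:llm` results and returns an error tuple once the script runs out.

```elixir
defmodule ScriptedLLMSketch do
  # Hypothetical stand-in: the journal here is a plain map with an :entries
  # list of {kind, result} tuples, not the real Jidoka.Effect.Journal struct.
  def count_results(%{entries: entries}, kind) do
    Enum.count(entries, fn {entry_kind, _result} -> entry_kind == kind end)
  end

  # Returns a two-arity function that replays `decisions` in order and
  # reports an error once the script is exhausted.
  def scripted_llm(decisions) when is_list(decisions) do
    fn _intent, journal ->
      case Enum.at(decisions, count_results(journal, :llm)) do
        nil -> {:error, :script_exhausted}
        decision -> {:ok, decision}
      end
    end
  end
end

llm =
  ScriptedLLMSketch.scripted_llm([
    %{type: :operation, name: "step_a", arguments: %{}},
    %{type: :final, content: "done"}
  ])

# No prior :llm results, so the first scripted decision is returned.
first = llm.(:intent, %{entries: []})
# One prior :llm result (plus an operation result), so the second is returned.
second = llm.(:intent, %{entries: [{:llm, :ok}, {:operation, :ok}]})
# Two prior :llm results exceed the script, so an error tuple comes back.
third = llm.(:intent, %{entries: [{:llm, :ok}, {:operation, :ok}, {:llm, :ok}]})
```

Returning an explicit error for an exhausted script is a design choice of this sketch; it makes an unexpected extra loop iteration visible instead of raising a CaseClauseError inside the fake.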
For multi-step loops, write a small reduction directly inline rather than a helper:
llm = fn _intent, %Effect.Journal{} = journal ->
  case TestSupport.count_results(journal, :llm) do
    0 -> {:ok, %{type: :operation, name: "step_a", arguments: %{}}}
    1 -> {:ok, %{type: :operation, name: "step_b", arguments: %{}}}
    2 -> {:ok, %{type: :final, content: "done"}}
  end
end

Step 3: Write A Local Operation Capability
For tests that exercise tool calls, use
Jidoka.Operation.Source.Local when the
agent is defined through the DSL, or
Jidoka.Runtime.LocalOperations.operations/1
when you want a bare capability function:
operations =
  Jidoka.Runtime.LocalOperations.operations(%{
    "local_time" => fn %{"city" => city} -> {:ok, %{city: city, time: "09:30"}} end
  })

Jidoka.turn(MyAgent, "input", llm: llm, operations: operations)

When the test agent declares operations through DSL, prefer
Jidoka.Operation.Source.Local inside the DSL itself so the spec is
self-contained and golden-testable.
The handler signatures:
| Arity | Receives | Use when |
|---|---|---|
| 1 | request.arguments (a map) | The test only cares about input/output. |
| 2 | (Effect.Intent.t(), Effect.Journal.t()) | The test needs the full intent (idempotency key, metadata) or to branch on prior results. |
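To make the two handler shapes concrete, here is a minimal sketch of dispatching on arity with `is_function/2`. It is an illustration only: `HandlerAritySketch` is a hypothetical module, the intent and journal are plain maps, and Jidoka's own dispatch logic may differ.

```elixir
defmodule HandlerAritySketch do
  # Hypothetical dispatcher; Jidoka's own dispatch logic may differ.
  # The intent is modeled as a plain map with an :arguments key.

  # Arity-1 handlers only see the arguments map.
  def call(handler, %{arguments: arguments}, _journal) when is_function(handler, 1) do
    handler.(arguments)
  end

  # Arity-2 handlers see the whole intent and the journal.
  def call(handler, intent, journal) when is_function(handler, 2) do
    handler.(intent, journal)
  end
end

intent = %{arguments: %{"city" => "Chicago"}, idempotency_key: "k-1"}
journal = %{entries: []}

# Input/output only: matches on the arguments map.
simple = fn %{"city" => city} -> {:ok, %{city: city, time: "09:30"}} end
# Needs intent metadata: reads the (assumed) idempotency key field.
full = fn %{idempotency_key: key}, _journal -> {:ok, %{seen_key: key}} end

simple_result = HandlerAritySketch.call(simple, intent, journal)
full_result = HandlerAritySketch.call(full, intent, journal)
```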
Step 4: Author A Golden Test
Golden tests live in test/jidoka/golden/. The canonical shape is in
test/jidoka/golden/dsl_to_spec_test.exs:
defmodule Jidoka.GoldenTest.Support.MinimalAgent do
  use Jidoka.Agent

  agent :golden_minimal_agent do
    model %{provider: :test, id: "golden-minimal-model"}
  end
end

defmodule Jidoka.Golden.DslToSpecTest do
  use ExUnit.Case, async: true
  alias Jidoka.GoldenTest.Support.MinimalAgent

  test "minimal DSL agent compiles to the expected Agent.Spec projection" do
    assert Jidoka.project(MinimalAgent.spec()) == %{
             id: "golden_minimal_agent",
             instructions: Jidoka.Agent.default_instructions(),
             model: "test:golden-minimal-model",
             generation: %{
               params: %{temperature: 0.0, max_tokens: 500},
               provider_options: %{},
               extra: %{}
             },
             context_schema?: false,
             result: nil,
             memory: nil,
             operations: [],
             controls: %{
               max_turns: nil,
               timeout_ms: nil,
               inputs: [],
               outputs: [],
               operations: [],
               metadata: %{}
             },
             runtime_defaults: %{},
             metadata: %{...}
           }
  end
end

Three rules for golden tests:
- Use == not =~. The whole point is to detect any drift.
- Co-locate the support modules in the same file. Each golden test module owns its fixtures so cross-file moves are obvious.
- Update the expected map in the same commit as the change. A green golden test after a struct change usually means you forgot to assert the new field.
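The first rule is about strictness. The sketch below extends that point to partial map patterns, a case the rules above do not mention: a pattern only checks the keys it names, while == compares the whole map. The projection here is a trimmed-down, hypothetical one.

```elixir
# Hypothetical, trimmed-down projection; the real one has more keys.
expected = %{id: "golden_minimal_agent", operations: [], result: nil}

# Simulate drift: the projection gains a key the golden map does not list.
drifted = Map.put(expected, :new_field, true)

# A map pattern only checks the keys it names, so the extra key slips through.
partial_match? = match?(%{id: "golden_minimal_agent", operations: []}, drifted)

# Strict equality compares every key and value, so the drift is caught.
strictly_equal? = drifted == expected

IO.inspect({partial_match?, strictly_equal?}, label: "partial vs strict")
```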
Step 5: Author An Integration Test
Integration tests live in test/integration/ and mirror an author flow.
They are still deterministic; they just exercise more than one module per
test. Folder conventions:
| Test file | Scenario |
|---|---|
| controls_integration_test.exs | Input/operation/output controls |
| harness_session_integration_test.exs | Jidoka.Session lifecycle |
| human_in_the_loop_integration_test.exs | Review interrupt + resume |
| memory_integration_test.exs | Recall/capture flow |
| multi_turn_integration_test.exs | Multiple turns in one session |
| observability_integration_test.exs | Trace and event emission |
| operation_idempotency_integration_test.exs | :unsafe_once and replay |
| operation_source_integration_test.exs | Local/jido/mcp operation sources |
| structured_result_integration_test.exs | Typed Turn.Result.value |
Reuse the shared agents under test/support/integration/agents/ whenever a
scenario fits one of them (MinimalChatAgent, AccountAgent,
ControlledLookupAgent).
Step 6: Add A Live Test (Opt-In)
The live test pattern is documented in test/jidoka/live_req_llm_test.exs:
defmodule Jidoka.LiveReqLLMTest do
  use ExUnit.Case, async: false

  @moduletag :live
  @moduletag timeout: 120_000

  @live_enabled? not is_nil(System.get_env("OPENAI_API_KEY") || System.get_env("ANTHROPIC_API_KEY"))

  if @live_enabled? do
    # ... test bodies referencing real providers ...
  end
end

Three rules for live tests:
- Always tag @moduletag :live. The test/test_helper.exs excludes :live so default mix test stays fast.
- Guard with @live_enabled?. A live test without a key should compile but contain no test cases.
- Set a generous @moduletag timeout. Real providers vary; 120s is the current default.
Run live tests with mix test --include live.
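The key check behind the guard can also be written as a small function that is testable on its own. This is a sketch under assumptions: `LiveGuardSketch` is a hypothetical helper, not part of jidoka, and unlike the is_nil/1 check shown earlier it also treats an empty string as a missing key.

```elixir
defmodule LiveGuardSketch do
  # Hypothetical helper, not part of jidoka. Unlike the is_nil/1 check in the
  # guide, this variant also treats an empty string as "no key".
  @provider_keys ["OPENAI_API_KEY", "ANTHROPIC_API_KEY"]

  # `env` defaults to the real environment; callers can pass a plain map.
  def live_enabled?(env \\ System.get_env()) when is_map(env) do
    Enum.any?(@provider_keys, fn key -> Map.get(env, key) not in [nil, ""] end)
  end
end

with_key = LiveGuardSketch.live_enabled?(%{"OPENAI_API_KEY" => "sk-test"})
without_key = LiveGuardSketch.live_enabled?(%{})
blank_key = LiveGuardSketch.live_enabled?(%{"ANTHROPIC_API_KEY" => ""})
```

Passing the environment as an argument keeps the check deterministic, which fits the rest of the suite.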
Step 7: Clear mix quality Before You Push
The mix quality alias (also aliased as mix q) runs the gates defined in
mix.exs:
quality: [
  "format --check-formatted",
  "compile --warnings-as-errors",
  "credo",
  "dialyzer",
  "doctor --raise"
]

Each step is non-negotiable:
| Gate | Why |
|---|---|
| format --check-formatted | Keeps diffs minimal; mix format should be run before commit. |
| compile --warnings-as-errors | Warnings are real bugs; treat them like failing tests. |
| credo | Style and idiom enforcement. Refactor; do not add # credo:disable lightly. |
| dialyzer | Catches contract drift in the Zoi-backed structs and capability functions. |
| doctor --raise | Documentation coverage. New public functions need @spec and @doc. |
Run mix q after every meaningful change. Do not push a branch that fails
any of these.
Common Patterns
- Inject capabilities at the top of the test. A test that fakes the LLM inside a helper deep in the call chain is hard to follow. Keep the seam visible.
- Branch on the journal, not on test state. count_results(journal, :llm) is the canonical way to "do this on the first call, that on the second".
- Prefer Jidoka.project/1 over deep struct assertions. Asserting on a projection survives implementation churn that does not change the public shape.
- Use Jidoka.Trace.timeline/1 for event assertions. It shrinks event details to the stable trace shape.
- Group integration helpers in test/support/integration/. Per-file one-off agents accumulate noise.
Testing
The package itself is the test bed. Three cross-cutting commands matter:
# Fast, deterministic, default. Excludes :live.
mix test
# Include live tests. Requires a provider key.
mix test --include live
# Full quality bar.
mix quality
For a single contributor change, the loop is usually:
mix test path/to/test_file.exs
mix format
mix q
Troubleshooting
| Symptom | Likely Cause | Fix |
|---|---|---|
| Test passes locally, fails in CI with provider error | Test missed @moduletag :live and made a live call | Add the tag and rerun with mix test --include live. |
| mix test is slow | A test forgot async: true or held a Jido agent process open | Make the test async; teardown processes with start_supervised/1. |
| Golden test fails after a struct change | Projection drifted | Update the expected map in the golden file in the same commit. |
| Fake LLM returns wrong shape | Decision map missing :type | Use one of the shared test helpers or set type: :final/:operation. |
| mix dialyzer complains about an opaque type | A capability function was typed too loosely | Add @spec matching Jidoka.Runtime.Capabilities.llm_capability/0 or Jidoka.Runtime.Capabilities.operation_capability/0. |
| mix doctor --raise fails on a new function | Missing @doc or @spec on a public function | Document the function and add a spec. Hide internal helpers with @doc false. |
| credo flags Credo.Check.Refactor.PipeChainStart on a fixture | Helper builds a struct in one line | Wrap in Map.new/2 or split into a named step. |
| mix q fails on compile --warnings-as-errors for unused alias | Test or module aliases a struct it does not use | Remove the alias or suppress with _ = SomeModule. |
| Test agent process leaks between tests | Jidoka.start_agent/2 not torn down | Use start_supervised!(MyApp.TimeAgent) or call Jidoka.stop_agent/2 in on_exit/1. |
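The process-leak fixes in the table rely on ExUnit owning the process lifetime. The sketch below shows the pattern with a plain Agent standing in for a Jidoka agent process; the module name is hypothetical, and it assumes the agent module can be started from a child spec the same way.

```elixir
# autorun: false lets the script run the suite explicitly and inspect the result.
ExUnit.start(autorun: false)

defmodule SupervisedCleanupSketchTest do
  # Hypothetical test; Agent stands in for a Jidoka agent process.
  use ExUnit.Case, async: true

  test "process started under the test supervisor" do
    # start_supervised!/1 ties the process to this test, so ExUnit stops it
    # when the test exits and nothing leaks into the next test.
    pid = start_supervised!({Agent, fn -> 0 end})
    assert Agent.get(pid, & &1) == 0
  end
end

result = ExUnit.run()
IO.inspect(result, label: "run summary")
```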
Reference
- test/support/test_support.ex - shared helpers: final_llm/2, operation_llm/2, operation_then_final_llm/3, timeline/1, event_index/2, operation_control_index/2, operation_capability_index/2.
- Jidoka.Runtime.LocalOperations - function-backed operation capability for tests.
- Jidoka.Operation.Source.Local - DSL surface that wraps LocalOperations for self-contained test agents.
- Jidoka.Runtime.Capabilities - typed bundle that the runner consumes; the llm_capability/0 and operation_capability/0 types are the test contract.
- Jidoka.Effect.Journal - the journal the fake LLM inspects to branch on iteration.
- Jidoka.Projection - target of golden assertions.
- Jidoka.Trace - source of the timeline/1 helper used in event assertions.
Related Guides
- Testing And Evals - author-facing test patterns.
- Turn Runner And Effect Interpreter - how the loop the tests exercise actually runs.
- Runtime Capabilities Internals - the adapters the test capabilities mirror.
- Projection Internals - what golden tests are pinning.
- Troubleshooting - error reference for failures that surface during tests.