Contributor Testing

Jidoka's test suite is designed to stay deterministic by default, with a narrow opt-in path for live provider runs. Contributors who add a feature must also extend the deterministic surface (unit, runtime, golden, and integration tests) and only optionally extend the live tests. This guide documents the test layout, the fake-LLM and local-operation patterns, the golden-file contract, and the mix quality gates that every change must clear. It is written for people contributing to the jidoka package, not for application authors.

When To Use This

  • Use this guide before writing a new test in test/. The folder structure and helper modules are not obvious from the file tree alone.
  • Use this guide when adding a feature that should appear in golden coverage (any change to Agent.Spec, Turn.Plan, or the import path).
  • Use this guide when changing default mix quality gates or when adding a new opt-in test category.
  • Do not use this guide for application-level testing. Application authors should follow Testing And Evals.

Prerequisites

  • Elixir ~> 1.18 and a checkout of the jidoka package.
  • mix deps.get has been run.
  • For the live opt-in path: one provider key in scope (OPENAI_API_KEY or ANTHROPIC_API_KEY).
mix deps.get
mix test

Quick Example

A deterministic test only needs a fake LLM function and (optionally) an injected operation capability. Both go through Jidoka.turn/3:

defmodule MyContributorTest do
  use ExUnit.Case, async: true

  import TestSupport

  defmodule TimeAgent do
    use Jidoka.Agent

    agent :contributor_time do
      model %{provider: :test, id: "deterministic"}
      instructions "Call local_time when asked for the time."
    end

    tools do
      local_operations do
        operation :local_time do
          description "Returns a fixed time for a city."
          handler fn %{"city" => city} -> {:ok, %{city: city, time: "09:30"}} end
        end
      end
    end
  end

  test "returns the canned answer with no provider key" do
    llm = operation_then_final_llm("local_time", %{"city" => "Chicago"}, "Chicago: 09:30")

    assert {:ok, %Jidoka.Turn.Result{content: "Chicago: 09:30"}} =
             Jidoka.turn(TimeAgent, "What time is it in Chicago?", llm: llm)
  end
end

No environment variable is required. The test can run with async: true because both capabilities are pure functions with no shared state.

Concepts

Three ideas define the contributor test surface.

  1. Deterministic by default, live by opt-in. test/test_helper.exs excludes the :live tag by default. Live tests must be tagged @moduletag :live and run with mix test --include live.
  2. Two injection seams replace every external dependency. The llm: keyword option supplies a fake Jidoka.Runtime.Capabilities.llm_capability/0 function, and operations: supplies a fake Jidoka.Runtime.Capabilities.operation_capability/0 function. Together they make the runtime fully data-driven.
  3. Golden tests pin the public projection. Any change to a struct that escapes the package boundary must update the matching golden expectation.
                  mix test
                     │
        ╭────────────┼─────────────╮
        ▼            ▼             ▼
   unit tests   runtime tests  golden tests
  (per module) (capabilities,    (DSL->spec,
   pure data)   interpreter,      import->spec)
                runner)
        │            │             │
        ╰────────────┼─────────────╯
                     ▼
            integration tests
            (test/integration/,
            scenario-shaped)
                     │
                     ▼
            mix test --include live
            (opt-in real provider)
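
The first idea usually comes down to a single line in test/test_helper.exs. The following is a minimal sketch of what that file could contain, assuming the exclusion is configured through ExUnit.start/1. The actual helper in the repository may set further options.

```elixir
# Sketch of test/test_helper.exs (assumed contents, not copied from the repo).
# Tests tagged :live are skipped unless `mix test --include live` is used.
ExUnit.start(exclude: [:live])
```

With this in place, mix test --include live re-enables the tagged tests without any change to the helper.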

How To

Step 1: Pick The Right Test Folder

The repo's test layout has four buckets. New tests go in the bucket that matches their scope:

| Folder | Scope | Conventions |
| --- | --- | --- |
| test/jidoka/ | Unit and per-module tests | async: true, one module per file, no provider keys. |
| test/jidoka/runtime/ | Runtime kernel tests | Exercise Capabilities, EffectInterpreter, TurnRunner, adapters. Inject fake capabilities. |
| test/jidoka/golden/ | DSL/import projection golden files | Jidoka.project/1 output pinned verbatim; update in the same commit as the change. |
| test/integration/ | End-to-end scenarios | Mirror an author's flow (controls, memory, structured results, idempotency). Still deterministic. |

Tests that need shared agents, actions, or controls go under test/support/integration/{agents,actions,controls}/. The test/support/integration/README.md file documents what lives there.

Step 2: Write A Fake LLM Capability

The shared helpers in test/support/test_support.ex cover the common shapes. The three building blocks are final_llm/2, operation_llm/2, and operation_then_final_llm/3:

def final_llm(content, opts \\ []) when is_binary(content) do
  result = Keyword.get(opts, :result)

  fn _intent, _journal ->
    {:ok, %{type: :final, content: content, result: result}}
  end
end

def operation_llm(name, arguments \\ %{}) when is_binary(name) and is_map(arguments) do
  fn _intent, _journal ->
    {:ok, %{type: :operation, name: name, arguments: arguments}}
  end
end

def operation_then_final_llm(name, arguments, content) do
  fn _intent, %Effect.Journal{} = journal ->
    case count_results(journal, :llm) do
      0 -> {:ok, %{type: :operation, name: name, arguments: arguments}}
      _count -> {:ok, %{type: :final, content: content}}
    end
  end
end

The contract:

  • The function takes (Effect.Intent.t(), Effect.Journal.t()) and returns {:ok, decision_map_or_struct} | {:error, term}.
  • The decision can be a Jidoka.Effect.LLMDecision struct or a plain map matching the JSON decision shape.
  • The function is called once per loop iteration. Use count_results/2 on the journal to branch by iteration number.

For multi-step loops, write a small case expression inline, keyed on the journal count, rather than adding a helper:

llm = fn _intent, %Effect.Journal{} = journal ->
  case TestSupport.count_results(journal, :llm) do
    0 -> {:ok, %{type: :operation, name: "step_a", arguments: %{}}}
    1 -> {:ok, %{type: :operation, name: "step_b", arguments: %{}}}
    2 -> {:ok, %{type: :final, content: "done"}}
  end
end
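
To see how journal-based branching plays out across iterations, the sketch below drives the same three-step function through a toy loop. JournalSketch and LoopSketch are hypothetical stand-ins written for this illustration; they are not Jidoka's Effect.Journal or its runner, whose internals this guide does not show.

```elixir
defmodule JournalSketch do
  # Hypothetical stand-in for Jidoka.Effect.Journal; the real struct differs.
  defstruct results: []

  def new, do: %__MODULE__{}

  # Stand-in for TestSupport.count_results/2: counts results of one kind.
  def count_results(%__MODULE__{results: results}, kind) do
    Enum.count(results, fn {result_kind, _payload} -> result_kind == kind end)
  end

  # Appends one recorded result of the given kind.
  def record(%__MODULE__{results: results} = journal, kind, payload) do
    %{journal | results: results ++ [{kind, payload}]}
  end
end

defmodule LoopSketch do
  # Calls the fake LLM until it returns a :final decision and
  # returns every decision in call order.
  def run(llm, journal \\ JournalSketch.new(), seen \\ []) do
    {:ok, decision} = llm.(:ignored_intent, journal)
    journal = JournalSketch.record(journal, :llm, decision)

    case decision do
      %{type: :final} -> Enum.reverse([decision | seen])
      %{type: :operation} -> run(llm, journal, [decision | seen])
    end
  end
end

# Same shape as the inline fake above, but reading the stand-in journal.
llm = fn _intent, journal ->
  case JournalSketch.count_results(journal, :llm) do
    0 -> {:ok, %{type: :operation, name: "step_a", arguments: %{}}}
    1 -> {:ok, %{type: :operation, name: "step_b", arguments: %{}}}
    2 -> {:ok, %{type: :final, content: "done"}}
  end
end

decisions = LoopSketch.run(llm)
```

Because the function keeps no state of its own, the same fake can be reused across tests, and an unexpected fourth call fails loudly with a CaseClauseError instead of silently looping.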

Step 3: Write A Local Operation Capability

For tests that exercise tool calls, use Jidoka.Operation.Source.Local when the agent is defined through the DSL, or Jidoka.Runtime.LocalOperations.operations/1 when you want a bare capability function:

operations =
  Jidoka.Runtime.LocalOperations.operations(%{
    "local_time" => fn %{"city" => city} -> {:ok, %{city: city, time: "09:30"}} end
  })

Jidoka.turn(MyAgent, "input", llm: llm, operations: operations)

When the test agent declares operations through DSL, prefer Jidoka.Operation.Source.Local inside the DSL itself so the spec is self-contained and golden-testable.

The handler signatures:

| Arity | Receives | Use when |
| --- | --- | --- |
| 1 | request.arguments (a map) | The test only cares about input/output. |
| 2 | (Effect.Intent.t(), Effect.Journal.t()) | The test needs the full intent (idempotency key, metadata) or to branch on prior results. |
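
The difference between the two arities can be pictured with a small dispatcher. This is a sketch under stated assumptions: HandlerDispatchSketch is hypothetical, the intent is a plain map whose request.arguments and idempotency_key fields are guessed from the descriptions above, and Jidoka's own dispatch code may look quite different.

```elixir
defmodule HandlerDispatchSketch do
  # Hypothetical dispatcher, not Jidoka's implementation.

  # Arity-1 handlers only see the request arguments.
  def call(handler, intent, _journal) when is_function(handler, 1) do
    handler.(intent.request.arguments)
  end

  # Arity-2 handlers see the whole intent and the journal.
  def call(handler, intent, journal) when is_function(handler, 2) do
    handler.(intent, journal)
  end
end

# Plain data standing in for Effect.Intent.t() and Effect.Journal.t().
intent = %{idempotency_key: "key-1", request: %{arguments: %{"city" => "Chicago"}}}
journal = []

simple = fn %{"city" => city} -> {:ok, %{city: city, time: "09:30"}} end
full = fn seen_intent, _journal -> {:ok, %{key: seen_intent.idempotency_key}} end

simple_result = HandlerDispatchSketch.call(simple, intent, journal)
full_result = HandlerDispatchSketch.call(full, intent, journal)
```

Starting with the arity-1 form keeps fixtures short; the arity-2 form is worth the extra ceremony only when the assertion depends on intent metadata or prior results.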

Step 4: Author A Golden Test

Golden tests live in test/jidoka/golden/. The canonical shape is in test/jidoka/golden/dsl_to_spec_test.exs:

defmodule Jidoka.GoldenTest.Support.MinimalAgent do
  use Jidoka.Agent

  agent :golden_minimal_agent do
    model %{provider: :test, id: "golden-minimal-model"}
  end
end

defmodule Jidoka.Golden.DslToSpecTest do
  use ExUnit.Case, async: true

  alias Jidoka.GoldenTest.Support.MinimalAgent

  test "minimal DSL agent compiles to the expected Agent.Spec projection" do
    assert Jidoka.project(MinimalAgent.spec()) == %{
             id: "golden_minimal_agent",
             instructions: Jidoka.Agent.default_instructions(),
             model: "test:golden-minimal-model",
             generation: %{params: %{temperature: 0.0, max_tokens: 500},
                           provider_options: %{},
                           extra: %{}},
             context_schema?: false,
             result: nil,
             memory: nil,
             operations: [],
             controls: %{max_turns: nil, timeout_ms: nil,
                         inputs: [], outputs: [], operations: [],
                         metadata: %{}},
             runtime_defaults: %{},
             metadata: %{...}
           }
  end
end

Three rules for golden tests:

  • Use == not =~. The whole point is to detect any drift.
  • Co-locate the support modules in the same file. Each golden test module owns its fixtures so cross-file moves are obvious.
  • Update the expected map in the same commit as the change. A green golden test after a struct change usually means you forgot to assert the new field.
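
To illustrate why exact equality matters, the sketch below compares a partial pattern match with == on a projection that has gained a field. Plain maps stand in for Jidoka.project/1 output, and the field names are illustrative only.

```elixir
# Plain maps stand in for a Jidoka.project/1 result.
expected = %{id: "golden_minimal_agent", operations: [], result: nil}

# Simulate drift: the projection gains a field the test does not know about.
drifted = Map.put(expected, :new_field, :surprise)

# A partial pattern match ignores extra keys, so the drift goes unnoticed.
partial_match_passes? = match?(%{id: "golden_minimal_agent", operations: []}, drifted)

# Exact equality compares every key and value, so the drift is detected.
exact_equality_passes? = drifted == expected
```

Here the partial match still succeeds while the equality check fails, which is the behavior a golden test relies on.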

Step 5: Author An Integration Test

Integration tests live in test/integration/ and mirror an author flow. They are still deterministic; they just exercise more than one module per test. Folder conventions:

| Test file | Scenario |
| --- | --- |
| controls_integration_test.exs | Input/operation/output controls |
| harness_session_integration_test.exs | Jidoka.Session lifecycle |
| human_in_the_loop_integration_test.exs | Review interrupt + resume |
| memory_integration_test.exs | Recall/capture flow |
| multi_turn_integration_test.exs | Multiple turns in one session |
| observability_integration_test.exs | Trace and event emission |
| operation_idempotency_integration_test.exs | :unsafe_once and replay |
| operation_source_integration_test.exs | Local/jido/mcp operation sources |
| structured_result_integration_test.exs | Typed Turn.Result.value |

Reuse the shared agents under test/support/integration/agents/ whenever a scenario fits one of them (MinimalChatAgent, AccountAgent, ControlledLookupAgent).

Step 6: Add A Live Test (Opt-In)

The live test pattern is documented in test/jidoka/live_req_llm_test.exs:

defmodule Jidoka.LiveReqLLMTest do
  use ExUnit.Case, async: false

  @moduletag :live
  @moduletag timeout: 120_000

  @live_enabled? not is_nil(System.get_env("OPENAI_API_KEY") || System.get_env("ANTHROPIC_API_KEY"))

  if @live_enabled? do
    # ... test bodies referencing real providers ...
  end
end

Three rules for live tests:

  • Always tag @moduletag :live. test/test_helper.exs excludes :live, so the default mix test run stays fast and never needs a provider key.
  • Guard with @live_enabled?. A live test without a key should compile but contain no test cases.
  • Set a generous @moduletag timeout. Real providers vary; 120s is the current default.

Run live tests with mix test --include live.

Step 7: Clear mix quality Before You Push

The mix quality alias (also aliased as mix q) runs the gates defined in mix.exs:

quality: [
  "format --check-formatted",
  "compile --warnings-as-errors",
  "credo",
  "dialyzer",
  "doctor --raise"
]

Each step is non-negotiable:

| Gate | Why |
| --- | --- |
| format --check-formatted | Keeps diffs minimal; mix format should be run before commit. |
| compile --warnings-as-errors | Warnings are real bugs; treat them like failing tests. |
| credo | Style and idiom enforcement. Refactor; do not add # credo:disable lightly. |
| dialyzer | Catches contract drift in the Zoi-backed structs and capability functions. |
| doctor --raise | Documentation coverage. New public functions need @spec and @doc. |

Run mix q after every meaningful change. Do not push a branch that fails any of these.

Common Patterns

  • Inject capabilities at the top of the test. A test that fakes the LLM inside a helper deep in the call chain is hard to follow. Keep the seam visible.
  • Branch on the journal, not on test state. count_results(journal, :llm) is the canonical way to "do this on the first call, that on the second".
  • Prefer Jidoka.project/1 over deep struct assertions. Asserting on a projection survives implementation churn that does not change the public shape.
  • Use Jidoka.Trace.timeline/1 for event assertions. It shrinks event details to the stable trace shape.
  • Group integration helpers in test/support/integration/. Per-file one-off agents accumulate noise.

Testing

The package itself is the test bed. Three cross-cutting commands matter:

# Fast, deterministic, default. Excludes :live.
mix test

# Include live tests. Requires a provider key.
mix test --include live

# Full quality bar.
mix quality

For a single contributor change, the loop is usually:

mix test path/to/test_file.exs
mix format
mix q

Troubleshooting

| Symptom | Likely Cause | Fix |
| --- | --- | --- |
| Test passes locally, fails in CI with provider error | Test missed @moduletag :live and made a live call | Add the tag and rerun with mix test --include live. |
| mix test is slow | A test forgot async: true or held a Jido agent process open | Make the test async; tear down processes with start_supervised/1. |
| Golden test fails after a struct change | Projection drifted | Update the expected map in the golden file in the same commit. |
| Fake LLM returns wrong shape | Decision map missing :type | Use one of the shared test helpers or set type: :final/:operation. |
| mix dialyzer complains about an opaque type | A capability function was typed too loosely | Add @spec matching Jidoka.Runtime.Capabilities.llm_capability/0 or Jidoka.Runtime.Capabilities.operation_capability/0. |
| mix doctor --raise fails on a new function | Missing @doc or @spec on a public function | Document the function and add a spec. Hide internal helpers with @doc false. |
| credo flags Credo.Check.Refactor.PipeChainStart on a fixture | Helper builds a struct in one line | Wrap in Map.new/2 or split into a named step. |
| mix q fails on compile --warnings-as-errors for unused alias | Test or module aliases a struct it does not use | Remove the alias or suppress with _ = SomeModule. |
| Test agent process leaks between tests | Jidoka.start_agent/2 not torn down | Use start_supervised!(MyApp.TimeAgent) or call Jidoka.stop_agent/2 in on_exit/1. |
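
The process-leak fix relies on ExUnit owning the process. The sketch below shows the start_supervised!/1 pattern with a plain Agent standing in for a Jidoka agent; whether a Jidoka agent module exposes a compatible child spec is an assumption to verify against the package. ExUnit is started with autorun: false and run manually only so the sketch works as a standalone script.

```elixir
ExUnit.start(autorun: false)

defmodule ProcessTeardownSketchTest do
  use ExUnit.Case, async: true

  test "supervised process is alive during the test" do
    # A plain Agent stands in for a Jidoka agent process.
    # ExUnit stops it automatically when the test exits.
    pid = start_supervised!({Agent, fn -> %{calls: 0} end})

    assert Process.alive?(pid)
    assert Agent.get(pid, & &1.calls) == 0
  end
end

# Run the registered test modules and capture the summary map.
result = ExUnit.run()
```

Processes started this way are shut down before the next test starts, so no explicit on_exit/1 cleanup is needed for them.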

Reference

  • test/support/test_support.ex - shared helpers: final_llm/2, operation_llm/2, operation_then_final_llm/3, timeline/1, event_index/2, operation_control_index/2, operation_capability_index/2.
  • Jidoka.Runtime.LocalOperations - function-backed operation capability for tests.
  • Jidoka.Operation.Source.Local - DSL surface that wraps LocalOperations for self-contained test agents.
  • Jidoka.Runtime.Capabilities - typed bundle that the runner consumes; the llm_capability/0 and operation_capability/0 types are the test contract.
  • Jidoka.Effect.Journal - the journal the fake LLM inspects to branch on iteration.
  • Jidoka.Projection - target of golden assertions.
  • Jidoka.Trace - source of the timeline/1 helper used in event assertions.