Error Model And Recovery


You need a consistent error taxonomy and consistent retry behavior across provider, tool, and validation failures.

After this guide, you can classify failures and pick the right recovery path.

Error Types

Jido.AI.Error uses Splode classes:

Recovery Strategy

  • RateLimit: retry with backoff and respect any provider hints
  • Request timeout/network: retry with a capped number of attempts
  • Auth: fail fast, then rotate credentials or fix configuration
  • Validation: fail fast and return actionable messages
  • Unknown: return a sanitized response to the user and log the full detail
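The strategy list above can be sketched as a small lookup from failure class to recovery action. This is a minimal, standalone illustration: the module name, the class atoms, and the `max_attempts: 3` value are assumptions made for the example, not Jido.AI.Error types or defaults.

```elixir
defmodule RecoveryPolicySketch do
  @moduledoc """
  Illustrative mapping from a failure class to a recovery action.
  The class atoms are hypothetical labels, not Jido.AI.Error types.
  """

  @type action :: {:retry, keyword()} | :fail_fast | :sanitize_and_log

  @spec recovery_for(atom()) :: action()
  # Rate limits: back off, and prefer a provider hint when one is available.
  def recovery_for(:rate_limit), do: {:retry, backoff: :exponential, honor_provider_hint: true}

  # Timeouts and network errors are transient: retry, but only a bounded number of times.
  # The cap of 3 is an assumed value for illustration.
  def recovery_for(class) when class in [:timeout, :network], do: {:retry, max_attempts: 3}

  # Auth and validation failures will not succeed on retry.
  def recovery_for(class) when class in [:auth, :validation], do: :fail_fast

  # Anything else: show a sanitized message to the user and log the full detail.
  def recovery_for(_unknown), do: :sanitize_and_log
end
```

Keeping the decision in one function makes the policy easy to review and test separately from the code that performs the retry.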

Package Boundary

jido_ai owns the AI runtime error envelope used in signals, tool results, and telemetry-facing payloads.

Use Jido.AI.Error.normalize/4 to adapt arbitrary runtime errors into the canonical envelope, Jido.AI.Error.normalize_result/3 for result tuples, and Jido.AI.Error.retryable?/1 for retry policy decisions.

Upstream packages such as jido_action should stay generic. They can expose error type/message/details and retryability, but they should not define AI-specific contracts.

Jido.AI.Error.normalize/4 adapts upstream Jido.Action.Error, Jido.Signal.Error, and Jido.Error structs through the generic Jido.Error map contract. Plain Elixir exceptions use the caller-provided fallback type while preserving the exception message and sanitized struct fields.

At this boundary, envelope details are normalized to JSON-safe values. Raw runtime terms (for example tuples, pids, refs) are stringified so signal and telemetry payload encoding stays reliable.
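To make the idea of JSON-safe details concrete, here is a minimal sketch of one way such a conversion could work. It is not the library's implementation; the module and function names are hypothetical, and the choice of `inspect/1` for stringification is an assumption.

```elixir
defmodule JsonSafeSketch do
  @moduledoc "Illustrative only: converts arbitrary terms into JSON-encodable values."

  # Numbers, booleans, and nil are already JSON-safe and pass through unchanged.
  def to_json_safe(value) when is_number(value) or is_boolean(value) or is_nil(value),
    do: value

  # Binaries are kept only if they are valid UTF-8; otherwise they are stringified.
  def to_json_safe(value) when is_binary(value) do
    if String.valid?(value), do: value, else: inspect(value)
  end

  # Remaining atoms become strings.
  def to_json_safe(value) when is_atom(value), do: Atom.to_string(value)

  # Lists are converted element by element.
  def to_json_safe(value) when is_list(value), do: Enum.map(value, &to_json_safe/1)

  # Plain maps: keys become strings, values are converted recursively.
  def to_json_safe(value) when is_map(value) and not is_struct(value) do
    Map.new(value, fn {k, v} -> {key_to_string(k), to_json_safe(v)} end)
  end

  # Tuples, pids, refs, functions, structs, and anything else are stringified.
  def to_json_safe(value), do: inspect(value)

  defp key_to_string(k) when is_binary(k), do: k
  defp key_to_string(k) when is_atom(k), do: Atom.to_string(k)
  defp key_to_string(k), do: inspect(k)
end
```

For example, `JsonSafeSketch.to_json_safe(%{reason: {:error, :timeout}})` yields a map whose `"reason"` value is the string `"{:error, :timeout}"`, which any JSON encoder can handle.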

Already-decoded maps with string keys are accepted when their type/code atom already exists, so JSON round-trips do not silently lose known error types or retry hints.
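The "atom already exists" condition matches a common Elixir pattern built on `String.to_existing_atom/1`. The sketch below shows that pattern in isolation; the module and function names are hypothetical, and whether Jido.AI.Error uses this exact approach is an assumption.

```elixir
defmodule ExistingAtomSketch do
  @moduledoc "Illustrative only: restores an atom from a decoded string without creating new atoms."

  @spec restore_atom(term()) :: {:ok, atom()} | :error
  # Values that are already atoms need no conversion.
  def restore_atom(value) when is_atom(value), do: {:ok, value}

  def restore_atom(value) when is_binary(value) do
    # String.to_existing_atom/1 raises ArgumentError if the atom was never created,
    # so untrusted input cannot grow the atom table.
    {:ok, String.to_existing_atom(value)}
  rescue
    ArgumentError -> :error
  end

  # Any other term cannot name an error type.
  def restore_atom(_other), do: :error
end
```

With this approach, a decoded `"ok"` is restored to `:ok`, while a string that names no known atom returns `:error` and can be handled as an unknown type rather than being converted blindly.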

Example: Sanitized User Message + Full Log

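require Logger

# A raw error term containing internal details that must not reach the user.
err = %{file: "/srv/app/lib/secret.ex", line: 18}

# Split the error into a safe user-facing message and a detailed log message.
%{user_message: user_message, log_message: log_message} =
  Jido.AI.Error.Sanitize.sanitize_error_for_display(err)

IO.puts(user_message)
Logger.error(log_message)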

Failure Mode: Retrying Non-Retryable Errors

Symptom:

  • repeated failures with no chance of success

Fix:

  • do not retry auth or validation errors
  • retry only transient transport or provider failures
  • cap retries and emit a terminal error signal once the cap is reached
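The fix above can be sketched as a retry loop that checks the failure class before every retry and stops at a cap. This is a standalone illustration, not ToolExec's implementation: the module name, the transient class atoms, the default values, and the `:sleep` option are assumptions. The option names `max_retries` and `retry_backoff_ms` echo the ToolExec fields mentioned in this guide, but their semantics here are assumed.

```elixir
defmodule CappedRetrySketch do
  @moduledoc "Illustrative only: retries transient failures up to a cap, then returns a terminal error."

  # Hypothetical class labels for failures worth retrying.
  @transient [:rate_limit, :timeout, :network]

  @doc """
  Calls `fun` until it returns `{:ok, value}`, a non-transient error, or the cap is hit.
  `fun` must return `{:ok, value}` or `{:error, class}` where `class` is an atom.
  """
  def run(fun, opts \\ []) do
    max_retries = Keyword.get(opts, :max_retries, 3)
    backoff_ms = Keyword.get(opts, :retry_backoff_ms, 100)
    # The sleep function is injectable so tests can skip real delays.
    sleep = Keyword.get(opts, :sleep, &Process.sleep/1)
    attempt(fun, 0, max_retries, backoff_ms, sleep)
  end

  defp attempt(fun, retries_done, max_retries, backoff_ms, sleep) do
    case fun.() do
      {:ok, value} ->
        {:ok, value}

      {:error, class} when class in @transient and retries_done < max_retries ->
        # Wait, then retry with the delay doubled (exponential backoff).
        sleep.(backoff_ms)
        attempt(fun, retries_done + 1, max_retries, backoff_ms * 2, sleep)

      {:error, class} ->
        # Non-retryable class or retries exhausted: stop and report a terminal error.
        {:error, {:terminal, class, retries_done}}
    end
  end
end
```

The caller would translate the `{:error, {:terminal, class, retries_done}}` result into whatever terminal error signal the agent emits. An auth failure returns immediately with zero retries, so it never consumes the retry budget.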

Defaults You Should Know

  • ToolExec supports explicit retry fields (max_retries, retry_backoff_ms)
  • request timeout and strategy iteration limits are separate controls

When To Use / Not Use

Use this guide when:

  • defining error handling policy in agents, directives, or actions

Do not use this guide when:

  • debugging one-off local failures without operational impact

Next