This document describes the internal architecture of llm_core, a provider-agnostic LLM orchestration library for Elixir.
## Overview
llm_core provides shared LLM infrastructure:
```
┌─────────────────────────────────────────────────────────────┐
│                          llm_core                           │
│                                                             │
│  ┌─────────────┐  ┌─────────────┐  ┌─────────────────────┐  │
│  │  Providers  │  │   Router    │  │      Hindsight      │  │
│  │    (ALF)    │  │    (ALF)    │  │     (Resilient)     │  │
│  └─────────────┘  └─────────────┘  └─────────────────────┘  │
│                                                             │
│  ┌─────────────┐  ┌─────────────┐  ┌─────────────────────┐  │
│  │ Structured  │  │   Config    │  │      Telemetry      │  │
│  │   Output    │  │    (Hot)    │  │    (Observable)     │  │
│  └─────────────┘  └─────────────┘  └─────────────────────┘  │
└─────────────────────────────────────────────────────────────┘
```

## ALF Pipelines
The library uses ALF (Antonmi's Flow-based Framework) for composable, observable data pipelines. ALF provides:

- Composable stages - Each transformation is isolated
- Streaming support - `stream/2` for lazy evaluation
- Observability - Built-in telemetry hooks
- Testability - `sync: true` for deterministic testing
- Backpressure - GenStage-based flow control
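As a minimal sketch of this style (the stage names below are illustrative, not llm_core's actual pipeline stages; consult the ALF documentation for the exact API), a pipeline is declared with `use ALF.DSL` and a list of stages:

```elixir
defmodule DemoPipeline do
  use ALF.DSL

  # Stages run in order; each is an isolated transformation
  @components [
    stage(:validate),
    stage(:transform)
  ]

  # A stage is a plain function: (event, opts) -> event
  def validate(event, _opts) when is_binary(event), do: String.trim(event)
  def transform(event, _opts), do: String.upcase(event)
end

# sync: true runs the pipeline synchronously for deterministic tests
DemoPipeline.start(sync: true)
DemoPipeline.call("  hello ")
```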
### Inference Pipeline
The main pipeline for sending prompts to LLM providers:
```
validate_request
        ↓
resolve_provider      ←── Router (task_type → provider)
        ↓
check_availability    ←── Provider.available?()
        ↓
┌─[switch]────────────────────────────────────┐
│ :streaming → stream_stage (yield chunks)    │
│ :blocking  → send_stage (wait for full)     │
└─────────────────────────────────────────────┘
        ↓
normalize_response    ←── Provider-specific → Response.t()
        ↓
[optional] extract_structured ←── Schema validation
        ↓
emit_telemetry
```

### Routing Pipeline
Resolves which provider and model to use for a given task:
```
parse_task_type
        ↓
load_routing_config   ←── Config.Store (ETS)
        ↓
match_rules           ←── Priority-ordered rule evaluation
        ↓
┌─[switch]────────────────────────────────────┐
│ {:ok, route} → resolve_agent                │
│ {:error, _}  → apply_fallback               │
└─────────────────────────────────────────────┘
        ↓
build_resolved_route  ←── ResolvedRoute.t()
```

### Memory Pipeline (Hindsight)
Handles semantic memory operations with resilience:
```
┌─[switch: operation]────────────────────────────┐
│ :retain  → validate_content → buffer_write     │
│ :recall  → check_cache → query_or_fetch        │
│ :reflect → check_cache → insight_query         │
└────────────────────────────────────────────────┘
        ↓
circuit_breaker_gate  ←── Allow/Block based on health
        ↓
┌─[composer: retry_state]────────────────────────┐
│ attempt_operation                              │
│   on_failure → exponential_backoff → retry     │
│   on_success → update_cache → return           │
└────────────────────────────────────────────────┘
        ↓
normalize_result
```

## Provider System
### Behaviour Contract
All providers implement the `LlmCore.LLM.Provider` behaviour:

```elixir
@callback send(prompt(), opts()) :: {:ok, Response.t()} | {:error, Error.t()}
@callback stream(prompt(), opts()) :: {:ok, Enumerable.t()} | {:error, Error.t()}
@callback available?() :: boolean()
@callback capabilities() :: capabilities()
@callback provider_type() :: :local | :api | :cli
```

### Supported Providers
| Provider | Type | Module |
|---|---|---|
| Anthropic | `:api` | `LlmCore.LLM.Anthropic` |
| OpenAI | `:api` | `LlmCore.LLM.OpenAI` |
| Ollama | `:local` | `LlmCore.LLM.Ollama` |
| Claude Code | `:cli` | `LlmCore.LLM.ClaudeCode` |
| Gemini CLI | `:cli` | `LlmCore.LLM.GeminiCli` |
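To illustrate the contract, a hypothetical in-memory provider might implement the behaviour like this (the module name and the shape of `Response.t()` are assumptions for the sketch, not llm_core code):

```elixir
defmodule MyApp.EchoProvider do
  @moduledoc "Hypothetical provider illustrating the behaviour contract."
  @behaviour LlmCore.LLM.Provider

  # Defining send/2 requires unimporting the auto-imported Kernel.send/2
  import Kernel, except: [send: 2]

  @impl true
  def send(prompt, _opts) do
    # Response shape is assumed for illustration
    {:ok, %{content: "echo: " <> prompt, model: "echo-1", usage: %{}}}
  end

  @impl true
  def stream(prompt, _opts) do
    {:ok, Stream.map(String.graphemes(prompt), & &1)}
  end

  @impl true
  def available?, do: true

  @impl true
  def capabilities, do: %{streaming: true, structured_output: false}

  @impl true
  def provider_type, do: :local
end
```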
### Provider Registry
Providers are registered via TOML configuration. There are two kinds:
**Module providers** (`provider_kind: :module`)

- `module` - The Elixir module implementing the `Provider` behaviour
- `aliases` - Names used by routing rules
- `auth.api_key_env` - Environment variable for the API key
- `cost_tier` - Used for error suggestions and routing decisions
**CLI providers** (`provider_kind: :cli`)

- `type = "cli"` - No module required
- `[providers.<id>.cli]` - CLI-specific configuration (binary, flags, transports)
- Config-driven: new CLI providers are added via TOML; no Elixir code needed
- Built-in providers (claude_code, droid, pi_cli, kimi_cli, codex_cli, gemini_cli) work without config; TOML entries with the same ID override them
- Availability is determined by binary presence in PATH
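Putting the fields above together, a registry file might look like the following (a sketch: key names beyond those listed above are assumptions):

```toml
[providers.anthropic]
provider_kind = "module"
module = "LlmCore.LLM.Anthropic"
aliases = ["claude", "anthropic"]
cost_tier = "premium"

[providers.anthropic.auth]
api_key_env = "ANTHROPIC_API_KEY"

[providers.my_cli]
provider_kind = "cli"
type = "cli"

[providers.my_cli.cli]
binary = "my-llm"
flags = ["--json"]
```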
### CLI Provider Registry
`LlmCore.CLIProvider.Registry` provides a dedicated query surface for CLI providers. It merges built-in definitions with TOML-configured ones (TOML wins on conflict) and exposes:

- `list/0` — all known CLI providers with structured metadata
- `available/0` — only those with a binary in PATH
- `fetch/1` — look up by atom ID or string alias
- `resolve/1` — returns a ready-to-use `%CLIProvider{}` struct
- `capabilities/1` — introspect provider capabilities
This is the recommended API for downstream apps that need to discover or select CLI providers dynamically, replacing hard-coded provider lists.
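A downstream app might use the functions above roughly as follows (return shapes here are assumptions; check the module docs):

```elixir
alias LlmCore.CLIProvider.Registry

# Discover every CLI provider whose binary is on PATH
available = Registry.available()

# Look up one provider by string alias, resolve it, then inspect capabilities
with {:ok, _meta} <- Registry.fetch("claude"),
     %{} = _provider <- Registry.resolve(:claude_code) do
  Registry.capabilities(:claude_code)
end
```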
## Structured Output
`LlmCore.Structured` provides lightweight structured data extraction from LLM responses without heavy dependencies:

- JSON mode - For providers supporting `format: "json"`, extract and validate
- Schema validation - Validate decoded JSON against schemas
- Custom validators - Accept pluggable validation functions
- Retry-friendly - On validation failure, retry with error feedback
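The extract-validate-retry loop can be sketched in a few lines. The function names here are illustrative, not `LlmCore.Structured`'s actual API, and JSON decoding is delegated to the Jason library:

```elixir
defmodule StructuredSketch do
  @doc "Calls llm_fun, decodes JSON, validates; retries with error feedback."
  def extract(llm_fun, prompt, validator, retries \\ 2) do
    with {:ok, raw} <- llm_fun.(prompt),
         {:ok, data} <- Jason.decode(raw),
         :ok <- validator.(data) do
      {:ok, data}
    else
      {:error, reason} when retries > 0 ->
        # Feed the failure back into the prompt and try again
        extract(
          llm_fun,
          prompt <> "\nPrevious error: #{inspect(reason)}",
          validator,
          retries - 1
        )

      error ->
        error
    end
  end
end
```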
## Routing System
The router resolves task types to providers based on TOML configuration:
- `routing.default` - Default route entry
- `routing.tasks.<task>` - Task-specific overrides with mode and capability requirements
- Hot-reloads when configuration files change
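A routing table following these entries might look like this (a sketch: the value keys such as `provider` and `model` are assumptions):

```toml
[routing.default]
provider = "anthropic"
model = "claude-sonnet"

[routing.tasks.summarize]
provider = "ollama"
mode = "blocking"
capabilities = ["json"]
```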
## Hindsight Memory Integration
Hindsight is a semantic memory system accessed via MCP (Model Context Protocol). The integration includes:
- Caching - Stale-while-revalidate with configurable TTL
- Circuit breaker - Failure isolation to prevent cascade failures
- Retry with backoff - Exponential backoff for transient errors
- Write buffering - Async batched writes for performance
- Bank management - Support for multiple memory banks
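The retry-with-backoff behavior can be sketched as follows (a simplified stand-in, not llm_core's actual implementation, which layers this behind the circuit breaker):

```elixir
defmodule BackoffSketch do
  @doc "Retries fun with exponential backoff: 100ms, 200ms, 400ms, ..."
  def with_retry(fun, attempt \\ 0, max_attempts \\ 4) do
    case fun.() do
      {:ok, result} ->
        {:ok, result}

      {:error, _reason} when attempt + 1 < max_attempts ->
        # Double the delay on each failed attempt
        Process.sleep(100 * Integer.pow(2, attempt))
        with_retry(fun, attempt + 1, max_attempts)

      error ->
        # Out of attempts: surface the last error
        error
    end
  end
end
```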
### Configuration Precedence
1. UI runtime override (ETS, session-only)
2. Project config
3. Global config
4. Environment variable (`HINDSIGHT_URL`)
5. Auto-discovered endpoint
## Configuration System

### Multi-Level Precedence
```
1. Runtime overrides (ETS)               ← Highest priority
2. Environment variables
3. Project config (<project>/.llm_core/)
4. Global config (~/.llm_core/)
5. Compiled defaults                     ← Lowest priority
```

### Hot Reload
Configuration is stored in TOML format. The `LlmCore.Config.Watcher` monitors config directories for changes and triggers reload with debouncing (100ms window). The normalized snapshot lives in `LlmCore.Config.Store` (ETS) so the router, provider registry, and memory pipelines react immediately.
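The precedence walk can be sketched as a first-match lookup across layers (layer names mirror the list above; the data shapes are illustrative, not the Store's actual API):

```elixir
defmodule PrecedenceSketch do
  # Highest-priority layer first
  @layers [:runtime, :env, :project, :global, :defaults]

  @doc "Returns the value for key from the highest layer that defines it."
  def get(configs, key) do
    Enum.find_value(@layers, fn layer ->
      configs |> Map.get(layer, %{}) |> Map.get(key)
    end)
  end
end

configs = %{
  defaults: %{timeout: 30_000, model: "default"},
  project: %{model: "claude-sonnet"}
}

PrecedenceSketch.get(configs, :model)
# => "claude-sonnet" (project layer outranks compiled defaults)
```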
## Telemetry Events

```elixir
# Provider events
[:llm_core, :provider, :send, :start]
[:llm_core, :provider, :send, :stop]
[:llm_core, :provider, :send, :exception]
[:llm_core, :provider, :stream, :start]
[:llm_core, :provider, :stream, :chunk]
[:llm_core, :provider, :stream, :stop]

# Router events
[:llm_core, :router, :resolve, :start]
[:llm_core, :router, :resolve, :stop]
[:llm_core, :router, :fallback]

# Hindsight events
[:llm_core, :hindsight, :retain]
[:llm_core, :hindsight, :recall, :start]
[:llm_core, :hindsight, :recall, :stop]
[:llm_core, :hindsight, :circuit_breaker, :state_change]

# Config events
[:llm_core, :config, :reload]
[:llm_core, :config, :watcher, :change]
```

## Testing Approach
- Unit tests - Each provider in isolation (mocked HTTP), router rule matching, config loading/merging, structured output extraction
- Integration tests - Provider → Router → Response flow, Hindsight retain/recall, config hot-reload, streaming end-to-end
- Property-based tests - Routing rule precedence, config merge behavior, response normalization across providers
- Behaviour compliance - All providers implement the behaviour correctly
Test infrastructure includes Mox for mock providers and StreamData for
property-based testing.
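A behaviour-compliance test with Mox might be sketched like this (the mock name and response shape are illustrative):

```elixir
ExUnit.start()

# Define a mock module that implements the provider behaviour
Mox.defmock(ProviderMock, for: LlmCore.LLM.Provider)

defmodule ProviderContractTest do
  use ExUnit.Case, async: true
  import Mox

  # Ensure all expectations were met when each test exits
  setup :verify_on_exit!

  test "send/2 returns a normalized response" do
    expect(ProviderMock, :send, fn prompt, _opts ->
      {:ok, %{content: "stub: " <> prompt}}
    end)

    assert {:ok, %{content: "stub: hi"}} = ProviderMock.send("hi", [])
  end
end
```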