Changelog
All notable changes to this project will be documented in this file.
[0.16.0] - 2026-05-10
A significant Gemini-on-Vertex upgrade. Most of the new surface lands as
Nous.Messages.Gemini helpers + small build_request_params/3 wiring on
both Nous.Providers.VertexAI and Nous.Providers.Gemini, so anything new
works against either entry point.
Added
- Thinking config (request-side). New :thinking_config setting maps to generationConfig.thinkingConfig, letting callers set thinking_budget and include_thoughts on Gemini 2.5/3.x. Both Elixir shape (%{thinking_budget: 1024, include_thoughts: true}) and native Vertex shape (%{"thinkingBudget" => 1024, "includeThoughts" => true}) are accepted.
- thoughtSignature round-trip on tool calls. Nous.Messages.Gemini now preserves Vertex's thoughtSignature on parsed tool calls (under tool_call["metadata"]["thought_signature"]) and echoes it back when serializing assistant turns. Without this, multi-turn thinking + tool loops on Gemini 2.5/3.x degrade or fail because the next turn lacks the required signature. The streaming normalizer also propagates the signature on {:tool_call_delta, ...} events.
- Structured output (JSON schema). New :json_response and :json_schema settings wire to responseMimeType / responseSchema in generationConfig. The cross-provider :response_format shape (%{type: :json_schema, schema: ...} and %{type: :json_object}) maps through too.
- Safety settings. :safety_settings flows to top-level safetySettings, with atom-keyed entries auto-stringified.
- Tool config / tool choice. :tool_config (raw map) and :tool_choice (friendly form) both flow to top-level toolConfig. Friendly forms: :auto, :any / :required, :none, and {:any, ["fn_a", ...]} for allowedFunctionNames.
- Function calling on Vertex/Gemini actually works. Function declarations are now serialized in Vertex's tools[].functionDeclarations format via Nous.ToolSchema.to_gemini/1 (which strips OpenAI's strict field and unsupported additionalProperties from the parameters schema). Previously the high-level Nous.LLM path silently dropped tools for these providers.
- Native Vertex tools. New :native_tools setting accepts :google_search, :url_context, :code_execution atoms (or {tool, config} tuples / raw maps) and adds them as additional entries in the Vertex tools array, alongside any function declarations.
- Context caching. :cached_content setting maps to top-level cachedContent. Pass-through only; create caches via the Vertex REST API for now.
- Streaming + tools. Nous.LLM.stream_text/3 now honors :tools. Tool-call deltas are aggregated per turn (preserving any thoughtSignature), tools execute between turns, and the conversation continues until the model stops calling tools or hits @max_tool_iterations. Text deltas are still yielded to the caller as they are produced.
- More generationConfig fields: topK ← :top_k, seed ← :seed, candidateCount ← :candidate_count, presencePenalty ← :presence_penalty, frequencyPenalty ← :frequency_penalty, responseModalities ← :response_modalities.
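The setting-to-field mappings listed in this release can be pictured with a small standalone sketch. GenerationConfigSketch and its build/1 function are hypothetical names used only for illustration; this is not the library's build_request_params/3, and it covers only the generationConfig fields and the two accepted thinking_config shapes.

```elixir
defmodule GenerationConfigSketch do
  # Elixir-side setting => Vertex generationConfig key, as listed in the changelog.
  @field_map %{
    top_k: "topK",
    seed: "seed",
    candidate_count: "candidateCount",
    presence_penalty: "presencePenalty",
    frequency_penalty: "frequencyPenalty",
    response_modalities: "responseModalities"
  }

  # Hypothetical helper: builds a generationConfig map from a settings map.
  def build(settings) when is_map(settings) do
    base =
      for {key, vertex_key} <- @field_map,
          Map.has_key?(settings, key),
          into: %{},
          do: {vertex_key, Map.fetch!(settings, key)}

    case Map.get(settings, :thinking_config) do
      nil -> base
      config -> Map.put(base, "thinkingConfig", thinking_config(config))
    end
  end

  # Accepts both the Elixir shape (atom keys) and the native Vertex shape (string keys).
  defp thinking_config(config) do
    Map.new(config, fn
      {:thinking_budget, value} -> {"thinkingBudget", value}
      {:include_thoughts, value} -> {"includeThoughts", value}
      {key, value} when is_binary(key) -> {key, value}
    end)
  end
end
```

Settings that are absent are simply left out of the resulting map, so an empty settings map produces an empty generationConfig in this sketch.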
Changed
- Single timeout source of truth. Removed the separate @streaming_timeout constants from Nous.Providers.VertexAI (300s) and Nous.Providers.Gemini (120s). Streaming and non-streaming now share the same provider default; the actual timeout used at request time is always model.receive_timeout, which flows through build_provider_opts/1 as :timeout. Override via Model.parse(..., receive_timeout: ms).
[0.15.8] - 2026-05-06
Fixed
- Vertex AI / Gemini whitespace text parts no longer crash the request pipeline. Gemini occasionally returns text parts whose content is only newlines (e.g. "\n\n\n"), typically between tool calls or as filler when the model is blocked. Ecto's default :empty_values for cast/3 treats whitespace-only strings as empty, so Nous.Message.ContentPart's changeset dropped the content field entirely and then raised %Ecto.InvalidChangesetError{errors: [content: {"content is required", []}]} from ContentPart.new!/1, taking down the whole Nous.LLM.run_with_tools/6 call. ContentPart now overrides :empty_values to [""] so legitimate whitespace content is preserved, and Nous.Messages.Gemini.parse_content/1 defensively skips whitespace-only text parts to avoid creating useless ContentParts. The streaming normalizer (Nous.StreamNormalizer.Gemini) already had this guard; the non-streaming path is now consistent.
- Nous.Messages.Gemini.parse_content/1 no longer silently drops function calls without args. Nullary tool calls (%{"functionCall" => %{"name" => "get_time"}}) were falling into the catch-all clause and disappearing. The pattern now requires only name and falls back to %{} for args, matching the behavior of the sibling parse_parts/1 helper.
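The two fixes above (skipping whitespace-only text parts and defaulting missing args to an empty map) can be sketched in isolation. GeminiPartsSketch and the tagged tuples it returns are hypothetical and much simpler than the real ContentPart structs; the sketch only illustrates the pattern-matching idea.

```elixir
defmodule GeminiPartsSketch do
  # Converts raw Gemini "parts" into simplified tagged tuples.
  def parse(parts) when is_list(parts) do
    Enum.flat_map(parts, &parse_part/1)
  end

  # Only "name" is required; a missing "args" key falls back to an empty map.
  defp parse_part(%{"functionCall" => %{"name" => name} = call}) do
    [{:tool_call, name, Map.get(call, "args", %{})}]
  end

  defp parse_part(%{"text" => text}) when is_binary(text) do
    # Skip parts that contain only whitespace, such as "\n\n\n".
    if String.trim(text) == "", do: [], else: [{:text, text}]
  end

  # Unknown part types are ignored in this sketch.
  defp parse_part(_other), do: []
end
```

Matching on name alone keeps the nullary call out of the catch-all clause, which is the behavior the entry describes.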
Added
- Nous.Errors.RetryInfo parses server-suggested retry hints from provider error responses. Checks error.details[] for google.rpc.RetryInfo (Vertex AI / Gemini) first, then the Retry-After HTTP header. Returns delay in milliseconds, or nil when no hint is available. nil is itself meaningful for Google APIs, since long-term/daily quota exhaustion deliberately omits RetryInfo to discourage retry loops.
- Nous.Errors.ProviderError gains :retry_after_ms alongside the existing :status_code. Nous.Provider.request/3 and request_stream/3 now populate both fields automatically when the underlying HTTP layer returns an error tuple, so callers can branch on rate-limit hints without parsing provider-specific bodies:

      case Nous.LLM.run_with_tools(...) do
        {:error, %Nous.Errors.ProviderError{retry_after_ms: ms}} when is_integer(ms) ->
          {:snooze, ms}                      # use server-suggested delay
        {:error, %Nous.Errors.ProviderError{status_code: 429}} ->
          {:snooze, exp_backoff(attempt)}    # rate-limited, no hint
        ...
      end

- Gemini/Vertex finishReason and promptFeedback are surfaced. Nous.Messages.Gemini.from_response/1 now stores both in message.metadata (when present) and emits a Logger.warning when the candidate produced empty content for a non-STOP reason (SAFETY, RECITATION, MAX_TOKENS, etc.) or when the prompt was blocked. Previously these signals were discarded, so blocked generations manifested as silent empty messages with no diagnostic.
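The lookup order described for retry hints (RetryInfo in the error body first, then the Retry-After header) might look roughly like the sketch below. RetryHintSketch is a hypothetical module, not Nous.Errors.RetryInfo. It assumes the decoded JSON body carries google.rpc.RetryInfo with a protobuf-style duration string such as "30s", that headers arrive as lowercased {name, value} string tuples, and it only handles the delta-seconds form of Retry-After.

```elixir
defmodule RetryHintSketch do
  @retry_info_type "type.googleapis.com/google.rpc.RetryInfo"

  # Returns the suggested delay in milliseconds, or nil when there is no hint.
  def retry_after_ms(body, headers) do
    from_retry_info(body) || from_header(headers)
  end

  defp from_retry_info(%{"error" => %{"details" => details}}) when is_list(details) do
    Enum.find_value(details, fn
      %{"@type" => @retry_info_type, "retryDelay" => delay} -> parse_duration(delay)
      _ -> nil
    end)
  end

  defp from_retry_info(_body), do: nil

  # Protobuf JSON durations look like "30s" or "1.5s".
  defp parse_duration(delay) when is_binary(delay) do
    case Float.parse(delay) do
      {seconds, "s"} -> round(seconds * 1000)
      _ -> nil
    end
  end

  defp parse_duration(_other), do: nil

  # Handles only the delta-seconds form of Retry-After, not HTTP dates.
  defp from_header(headers) do
    with {_name, value} <- List.keyfind(headers, "retry-after", 0),
         {seconds, ""} <- Integer.parse(value) do
      seconds * 1000
    else
      _ -> nil
    end
  end
end
```

Returning nil rather than a default delay keeps the "no hint" case distinguishable, which matters for the quota-exhaustion behavior noted above.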
Changed
- HTTP error tuples now carry response headers. Nous.HTTP.Backend.Req, Nous.HTTP.Backend.Hackney, and Nous.HTTP.StreamBackend.Req previously returned {:error, %{status, body}} and dropped headers entirely, which made it impossible to read Retry-After. They now return {:error, %{status, body, headers}} with headers as a list of {name, value} tuples (lowercased per HTTP spec, both strings). Existing pattern matches on %{status: _, body: _} continue to work since map matching is non-exhaustive.
- Gemini tool-call ID generation unified. Nous.Messages.Gemini.parse_content/1 previously used "gemini_#{:rand.uniform(10_000)}" (~50% birthday-paradox collision at ~118 calls) while parse_parts/1 used "call_#{:rand.uniform(1_000_000)}": two formats, two ranges. Both now share a generate_tool_call_id/0 helper using 64 bits of :crypto.strong_rand_bytes/1, base64url-encoded with the gemini_ prefix preserved.
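A minimal sketch of an ID helper along the lines described, plus the standard birthday-bound approximation behind the "~118 calls" figure. ToolCallIdSketch is a hypothetical module; whether the real helper pads the base64url output is not stated, so padding: false is an assumption here.

```elixir
defmodule ToolCallIdSketch do
  # 8 random bytes = 64 bits, base64url-encoded behind the "gemini_" prefix.
  # Assumption: padding is disabled; the changelog does not say either way.
  def generate_tool_call_id do
    "gemini_" <> Base.url_encode64(:crypto.strong_rand_bytes(8), padding: false)
  end

  # Approximate number of draws for a 50% collision chance among n values:
  # sqrt(2 * n * ln 2).
  def birthday_bound(n) when is_integer(n) and n > 0 do
    :math.sqrt(2 * n * :math.log(2))
  end
end
```

For n = 10_000 the approximation gives about 118 draws, while a 64-bit space pushes the same bound into the billions.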
[0.15.7] - 2026-05-05
Changed
- hackney is now an optional dependency. Req (default for both one-shot and streaming) is the primary HTTP backend; hackney is only used when a consumer opts into Nous.HTTP.Backend.Hackney / Nous.HTTP.StreamBackend.Hackney via NOUS_HTTP_BACKEND=hackney (or the streaming variant) or app config. Forcing hackney ~> 4.0 as a hard dep (added in 0.15.x) broke downstream apps with any transitive constraint of hackney ~> 1.20 (e.g. aws ~> 1.0's optional dep), since the resolver activated the optional constraint once hackney 4 entered the graph. Apps that use the hackney backend now declare {:hackney, "~> 4.0"} in their own mix.exs.
[0.15.6] - 2026-05-05
Fixed
- Gemini / Vertex AI multi-part responses no longer crash Message.new!/1. When a Gemini candidate contained more than one text (or thought) part (common on long gemini-2.5-pro outputs such as multi-thousand-token translations), from_response/1 passed the raw list of ContentPart structs to Nous.Message, whose :content field is :string. Ecto then raised %Ecto.InvalidChangesetError{errors: [content: {"is invalid", [type: :string, validation: :cast]}]}. consolidate_content_parts/1 now joins homogeneous lists of :text or :thinking parts into a single string. Vertex AI is fixed implicitly via the existing :vertex_ai → from_gemini_response/1 delegation in Nous.Messages.from_provider_response/2.
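The consolidation step can be illustrated with plain maps standing in for ContentPart structs. ConsolidateSketch is a hypothetical module, and joining with no separator is an assumption; the entry above only says the parts are joined into a single string.

```elixir
defmodule ConsolidateSketch do
  # Joins a list of same-typed :text or :thinking parts into one string.
  # Mixed lists and any other input are returned unchanged.
  def consolidate([%{type: type} | _rest] = parts) when type in [:text, :thinking] do
    homogeneous? =
      Enum.all?(parts, fn part ->
        match?(%{type: ^type, content: content} when is_binary(content), part)
      end)

    if homogeneous? do
      # Assumption: parts are concatenated without a separator.
      Enum.map_join(parts, fn part -> part.content end)
    else
      parts
    end
  end

  def consolidate(other), do: other
end
```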
[0.15.5] - 2026-05-01
Fixed
- Both Req-based HTTP backends (Nous.HTTP.Backend.Req and Nous.HTTP.StreamBackend.Req) now actually use the configured Nous.Finch pool. Previously they ignored the :finch_name opt built by Nous.Provider and let Req spin up its own default Finch instance, leaving the supervised Nous.Finch pool (started by Nous.Application with size: 10, count: 1) idle. Both backends now read :finch_name from per-call opts, falling back to Application.get_env(:nous, :finch, Nous.Finch). Net effect: Nous.Finch becomes the live default for both streaming and non-streaming on Req, so pool tuning via app config actually takes effect. (Note: Req disallows passing :finch together with :connect_options; connect timeouts are now pool-level, so configure them on the Nous.Finch pool itself if a non-default is needed.)
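The fallback chain for the Finch pool name is short enough to show directly. FinchNameSketch is a hypothetical wrapper around the two lookups named in the entry; it does not start or talk to a Finch pool.

```elixir
defmodule FinchNameSketch do
  # Per-call :finch_name wins; otherwise app config, then the Nous.Finch default.
  def resolve(opts) when is_list(opts) do
    Keyword.get(opts, :finch_name) || Application.get_env(:nous, :finch, Nous.Finch)
  end
end
```

With no per-call option and no :finch key configured for the :nous application, the sketch returns Nous.Finch.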
Changed
- Default timeouts increased to 3 minutes (180_000 ms) across the board. The previous 60s default routinely tripped on reasoning models and longer completions. Affected:
  - Nous.Model receive_timeout default → 180_000
  - Nous.Model.default_receive_timeout/1 per-provider: cloud/custom → 180_000, llamacpp → 300_000 (up from 120_000)
  - Provider @default_timeout (OpenAI, Anthropic, Mistral, VertexAI, OpenAICompatible) → 180_000
  - Provider @streaming_timeout (Anthropic, Mistral, VertexAI, OpenAICompatible) → 300_000 (up from 120_000)
  - HTTP backend defaults (Req + Hackney, both streaming and non-streaming) → 180_000

  Per-call :timeout / :receive_timeout opts continue to override.
[0.15.4] - 2026-05-01
Pluggable streaming HTTP backends + hackney 4 pull-mode bug fix.
Fixed
- Hackney 4 streaming was silently in push mode, not pull mode. lib/nous/providers/http.ex:463-470 (in 0.15.0–0.15.3) passed [:async, :once, ...] as separate atoms to :hackney.request/5. Erlang's proplists resolves bare atom :async as {:async, true}, which puts hackney into push mode; the bare :once atom is silently ignored. The architectural intent of M-12 (strict pull-based backpressure so a slow consumer cannot grow its mailbox) was forfeited: :hackney.stream_next/1 is a no-op in push mode, so the receive loop appeared to work in many cases (chunks arrive in the same shape) but the pacing came from the producer, not the consumer. The fix is the tuple form [{:async, :once}, ...] per deps/hackney/NEWS.md:269-272. Empirical confirmation: with the broken form a benign Bypass server delivers 97 messages to the caller's mailbox in 2 s without any stream_next/1 call; with the tuple form the mailbox holds only 2 messages (status + headers) and body chunks gate on stream_next/1. Reported as part of the same bug that caused observable timeouts against cold/slow SSE backends.
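The proplists behavior behind this bug can be reproduced without hackney. ProplistSketch is a hypothetical module that only calls Erlang's :proplists.get_value/2 on the two option shapes; it says nothing about how hackney reads its options internally beyond what the entry states.

```elixir
defmodule ProplistSketch do
  # Looks up :async the way a proplist-based option reader would.
  def async_mode(opts) when is_list(opts) do
    :proplists.get_value(:async, opts)
  end
end

# Bare atoms are shorthand for {atom, true}, so :once is a separate, unrelated flag.
broken = [:async, :once]
# The tuple form carries :once as the value of :async.
fixed = [{:async, :once}]

IO.inspect(ProplistSketch.async_mode(broken), label: "broken")
IO.inspect(ProplistSketch.async_mode(fixed), label: "fixed")
```

The broken form yields true for :async, while the tuple form yields :once, which is the value pull mode needs.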
Added
- Nous.HTTP.StreamBackend behaviour: pluggable streaming HTTP layer mirroring the non-streaming Nous.HTTP.Backend introduced in 0.15.1. Two impls ship:
  - Nous.HTTP.StreamBackend.Req: the new default. Drives Req.post/1 with the :into callback. Simpler stack (Req/Finch/Mint), marginally faster TTFB than hackney in benchmarks against LMStudio (~130 ms vs ~133 ms mean).
  - Nous.HTTP.StreamBackend.Hackney: opt-in. Strict pull-based backpressure via :hackney's [{:async, :once}] mode (the bug above is fixed here). Pick this when downstream consumers can block per chunk (LiveView fan-out under load, persistence-on-every-chunk, slow IO).
- :stream_backend per-call opt on Nous.Providers.HTTP.stream/4.
- NOUS_HTTP_STREAM_BACKEND env var (req | hackney | My.Custom.Backend). Resolution mirrors NOUS_HTTP_BACKEND: per-call → env → app config → default.
- config :nous, :http_stream_backend, MyBackend application config knob.
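The resolution order (per-call → env → app config → default) combined with the String.to_existing_atom/1 rule can be sketched as a pure function. BackendResolverSketch, its placeholder backend atoms, and the choice to ignore unknown env names are all assumptions made for illustration; the real resolver reads the env var and app config itself.

```elixir
defmodule BackendResolverSketch do
  # Placeholder backend names; the real ones live under Nous.HTTP.StreamBackend.
  @default :req_backend
  @aliases %{"req" => :req_backend, "hackney" => :hackney_backend}

  # Precedence: per-call opt, then env value, then app config, then default.
  def resolve(opts, env_value, app_config) do
    Keyword.get(opts, :stream_backend) || from_env(env_value) || app_config || @default
  end

  defp from_env(nil), do: nil

  defp from_env(name) when is_binary(name) do
    case Map.fetch(@aliases, name) do
      {:ok, backend} -> backend
      :error -> existing_module(name)
    end
  end

  # Never create new atoms from env input. In this sketch an unknown module
  # name is ignored, so resolution falls through to the next source.
  defp existing_module(name) do
    String.to_existing_atom("Elixir." <> name)
  rescue
    ArgumentError -> nil
  end
end
```

Because String.to_existing_atom/1 raises for names that were never compiled into the system, arbitrary env input cannot grow the atom table.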
Changed
- Nous.Providers.HTTP.stream/4 now dispatches to the configured Nous.HTTP.StreamBackend instead of inlining hackney plumbing. The public API surface (return shape, event types, error tuples) is unchanged. Provider stream normalizers (Nous.StreamNormalizer.*) consume normalized events and need no changes.
- The non-streaming pluggable Nous.HTTP.Backend resolver is refactored to share its String.to_existing_atom/1 safety logic with the streaming resolver, giving the same C-2 protection on both paths.
Documentation
- Nous.Providers.HTTP moduledoc rewritten around the dual pluggable-backend model and the streaming backpressure trade-off.
- Nous.HTTP.StreamBackend and the two impl modules carry full moduledocs explaining when to pick each.
Migration
No code changes required for callers — the default behavior is restored to "streaming works against any healthy SSE backend." Apps that depend on strict pull-based backpressure should set:
    config :nous, :http_stream_backend, Nous.HTTP.StreamBackend.Hackney

or pass stream_backend: Nous.HTTP.StreamBackend.Hackney per call.
[0.15.3] - 2026-05-01
Streaming + tool execution. The Nous.Agent.run/3 loop now has a
stream: true opt that combines per-token deltas with the regular
tool-call loop. Behavior is identical to non-streaming run/3 except
for the additional streaming events: same final result, same callbacks,
same fallback chain, same hook/plugin pipeline.
Added
- :stream option on Nous.Agent.run/3: runs the iteration loop with the LLM call streamed. Per-iteration assembly produces a %Nous.Message{} structurally identical to what the non-streaming path returns, so :on_llm_new_message, process_response, handle_tool_calls, and the loop continuation are all unchanged. Per-token :on_llm_new_delta fires for text and the new :on_llm_new_thinking_delta fires for reasoning. Works across all providers (OpenAI-compatible, Anthropic, Gemini, Vertex AI, Mistral) and is compatible with output_type for streaming structured output.
- :on_llm_new_thinking_delta callback: cleanly-separated reasoning deltas. Pre-existing Nous.Agent.run_stream/3 keeps emitting [thinking] … on :on_llm_new_delta for backward compatibility; the split is opt-in via stream: true.
- Nous.StreamNormalizer.ToolCallAccumulator: polymorphic across the three provider chunk shapes (OpenAI list with split JSON args, Anthropic _phase-tagged fragments, Gemini already-complete functionCall). Reassembles them into the unified %{"id", "name", "arguments" => decoded_map} shape that Nous.Messages.extract_tool_calls/1 already understands.
- {:usage, %Nous.Usage{}} stream event: emitted by Nous.StreamNormalizer.OpenAI when chunks carry a usage field (auto-enabled by injecting stream_options.include_usage: true on the OpenAI-compatible streaming request), by Nous.StreamNormalizer.Anthropic from message_start and message_delta chunks, and by Nous.StreamNormalizer.Gemini from usageMetadata. The Nous.Types.stream_event typespec is updated.
- Mid-stream cancellation: ctx.cancellation_check is invoked between every streamed chunk; a thrown {:cancelled, reason} halts the run with Errors.ExecutionCancelled and discards partial state. No tool execution happens on cancellation.
- Nous.Messages.OpenAI.decode_arguments/1 and parse_usage/1 promoted to public helpers (formerly private) so the streaming path and the ToolCallAccumulator reuse the same JSON-decode-with-fallback and usage-parsing logic as the non-streaming path. Anthropic and Gemini's parse_usage/1 are similarly public for the same reason.
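The cancellation contract (a check that runs around each chunk and signals by throwing {:cancelled, reason}) can be sketched with a plain list standing in for the stream. CancelSketch and its return tuples are hypothetical; the real loop raises or returns Errors.ExecutionCancelled rather than the error tuple used here.

```elixir
defmodule CancelSketch do
  # Collects chunks, calling the zero-arity check before each one.
  # A thrown {:cancelled, reason} discards everything collected so far.
  def consume(chunks, cancellation_check) when is_function(cancellation_check, 0) do
    collected =
      Enum.reduce(chunks, [], fn chunk, acc ->
        cancellation_check.()
        [chunk | acc]
      end)

    {:ok, Enum.reverse(collected)}
  catch
    {:cancelled, reason} -> {:error, {:cancelled, reason}}
  end
end
```

Because the throw unwinds the whole reduce, no partially assembled result escapes, which mirrors the "discards partial state" wording.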
Changed
- Pre-existing Nous.Agent.run_stream/3 semantics are unchanged. The [thinking] … prefix on :on_llm_new_delta is preserved for that legacy path so existing consumers don't break.
- lib/nous/provider.ex build_request_params allowlist now includes stream_options (no-op for non-OpenAI providers, where it is silently ignored).
Documentation
- New "Streaming with Tool Execution" section in README.md.
- New "Streaming with Tool Execution (Recommended)" section in docs/guides/liveview-integration.md with a complete LiveView example wiring :agent_delta, :agent_thinking, :tool_call, :tool_result, :agent_message, and :agent_complete.
- New "Streaming Structured Output" section in docs/guides/structured_output.md.
- 0.15.2 → 0.15.3 entry in docs/guides/migration_guide.md.
- AGENTS.md Quick Start example updated.
[0.15.2] - 2026-04-27
Documentation-only release. No code changes.
Added
- AGENTS.md: quick-reference for AI coding agents (Claude, Cursor, Copilot, Codex, etc.) consuming the library. Covers the minimal API, provider quick-pick, key opts, custom tools, HTTP backend, security rules, common workflows, and what's public vs internal. Conforms to https://agents.md.
Changed
- README "Supported Providers" table now lists vllm: and sglang: as first-class named providers (previously only lmstudio: was mentioned; vLLM and SGLang were buried in the custom: section).
- README "Local Servers" section now recommends the dedicated lmstudio: / vllm: / sglang: / ollama: prefixes over custom:. They default to the right port, validate *_BASE_URL env vars through UrlGuard, and pick up the OpenAI stream normalizer for free.
- New "HTTP Backend" section in README covering the pluggable Nous.HTTP.Backend behaviour, env-var selection, and shared hackney pool config.
- Cleaned up mix docs warnings: replaced backticks around hidden module references in CHANGELOG so ExDoc no longer tries to auto-link them.
[0.15.1] - 2026-04-26
Follow-up to 0.15.0. No behavioral changes for existing users — the default HTTP backend stays Req. Two themes: making the HTTP backend pluggable, and bringing the local-server providers (LM Studio, vLLM, SGLang) up to date with the post-0.15.0 hackney streaming rewrite.
Added
- Pluggable HTTP backend for non-streaming requests. New Nous.HTTP.Backend behaviour with Nous.HTTP.Backend.Req (default) and Nous.HTTP.Backend.Hackney implementations. Configure via:
  - per-call: HTTP.post(url, body, headers, backend: Nous.HTTP.Backend.Hackney)
  - env var: NOUS_HTTP_BACKEND=hackney (also accepts req or any fully-qualified custom backend module name)
  - app config: config :nous, :http_backend, Nous.HTTP.Backend.Hackney

  Precedence: per-call > env > app config > default. Custom backends are resolved via String.to_existing_atom/1 with rescue (per the project-wide C-2 rule from the 0.15.0 review: never String.to_atom/1 on env input). Benchmark script at bench/http_backend.exs; results in docs/benchmarks/http_backend.md.
- Hackney :default pool is now configurable from app config: config :nous, :hackney_pool, max_connections: 200, timeout: 1_500. Applied at app boot. Used by both the Hackney HTTP backend and the streaming pipeline. (Hackney 4 caps the idle keepalive timeout at 2_000 ms; values above that silently cap.)
- Per-call :connect_timeout and :pool opts added to both HTTP backends and Nous.Providers.HTTP.stream/4. Default 30_000 ms / :default pool. Lets a single app run different timeouts per provider without mutating shared state.
- Test coverage for lmstudio:, vllm:, sglang: providers (12 new tests) plus 14 backend contract tests run twice (once per backend) and 9 backend-resolution tests.
Fixed
- Removed dead finch_name arg from lmstudio.ex / vllm.ex / sglang.ex chat_stream/2 calls. It was a leftover from the pre-hackney streaming code; HTTP.stream/4 has been ignoring it since 0.15.0.
- lmstudio: / vllm: / sglang: base_url is now validated through Nous.Tools.UrlGuard with allow_private_hosts: true. Rejects malformed schemes (file://, gopher://, etc.) from *_BASE_URL env vars while keeping localhost defaults.
[0.15.0] - 2026-04-26
Comprehensive security & correctness pass driven by a multi-agent code review of every subsystem. 57 fixes across 10 Critical, 19 High, 16 Medium, and 12 Low severity findings, plus a streaming pipeline rewrite. The full review report is at docs/reviews/2026-04-26-comprehensive-review.md.
Minor version bump (not patch) because of the 9 behavioral changes called out below — most are security defaults moving from open to deny, which existing callers may need to opt back into.
⚠ Behavioral / breaking changes
Read these before upgrading.
- Sub-agent deps no longer auto-forward to children. The compute_sub_deps/1 helper in Nous.Plugins.SubAgent now defaults to []. The previous default forwarded every parent dep (minus a 6-key denylist), so secrets, repo handles, and signed URLs all leaked into LLM-controlled sub-agent contexts. To restore the old behaviour, set :sub_agent_shared_deps, :all explicitly. Recommended: list specific keys with :sub_agent_shared_deps, [:key1, :key2].
- Tools with requires_approval: true are now rejected when no :approval_handler is wired (was silently approved). If you use Nous.Tools.Bash, FileWrite, or FileEdit, configure an approval_handler on RunContext or those tools will refuse to run.
- File tools (FileRead/Write/Edit/Glob/Grep) now enforce a workspace root. Defaults to cwd; override per-agent via deps: %{workspace_root: "/path"}. Paths that escape the root (absolute paths outside, .. traversal, symlink-escape) are rejected with a clear error to the LLM.
- PromptTemplate.from_template/2 rejects template bodies containing <% ... %> blocks other than the simple <%= @ident %> substitution form. Previously bodies were passed through EEx.eval_string/2, which executes arbitrary Elixir and is an RCE vector for any caller piping LLM output into a template. Conditionals must now be expressed by composing multiple smaller templates.
- Workflow :fallback error strategy now actually executes the fallback node (was a silent no-op that returned {:fallback, id} as if the primary had succeeded). Workflows that relied on the broken behaviour will now see real fallback execution.
- Workflow max_iterations exhaustion returns {:error, {:max_iterations_exceeded, node_id, max}} instead of silently {:ok, state}. Quality-gate loops that saturate now surface as failures rather than passing-looking results.
- Workflow :pre_node hook returning :deny aborts the workflow with {:error, {:hook_denied, hook_name, node_id}}. Previously it was silently mapped to {:pause, _}, so safety hooks suspended a checkpoint forever.
- Permissions :strict mode is deny-by-default at the filter layer. New :allow_names / :allow_prefixes opts on Nous.Permissions.build_policy/1. Previously strict_policy() with empty deny lists silently exposed every tool.
- PromEx plugin event names corrected ([:nous, :model, ...] → [:nous, :provider, ...]). Anyone using Nous.PromEx.Plugin saw zero data on the model/stream metric panels until now. Metric paths still emit as nous_model_* for dashboard backward compatibility.
- Nous.Tool.Validator now actually runs. tool.validate_args defaulted to true for months but ToolExecutor never called the validator. Tools whose params declared "required": [...] will now reject calls with missing fields up-front (returning a structured ToolError to the LLM with the field name) instead of crashing inside the tool body and reporting a generic FunctionClauseError. If you have tools that relied on the lack of validation, set validate_args: false on the tool struct.
- Nous.Teams.RateLimiter.acquire/3 returns {:ok, reservation_ref} instead of :ok. Existing call sites doing assert :ok = RateLimiter.acquire(...) need assert {:ok, _ref} = .... This is the contract change that makes concurrent acquires near the cap race-safe (M-9). Pair with record_usage(reservation: ref, ...) for atomic reconciliation, or release/2 to cancel. Bare record_usage/3 (no :reservation) still works for legacy post-hoc callers.
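The PromptTemplate restriction in the list above amounts to an allowlist check on EEx tags. The sketch below is a hypothetical approximation, not the library's validator: TemplateGuardSketch and its regexes are assumptions, and it does not detect an unterminated "<%" that has no closing "%>".

```elixir
defmodule TemplateGuardSketch do
  # Any EEx-style tag, matched lazily across newlines.
  defp any_tag, do: ~r/<%.*?%>/s

  # The only allowed form: <%= @ident %> with optional inner whitespace.
  defp simple_substitution, do: ~r/\A<%=\s*@[a-z_][a-zA-Z0-9_]*\s*%>\z/

  # Returns :ok when every tag is a plain substitution, else the offending tags.
  def validate(body) when is_binary(body) do
    tags = any_tag() |> Regex.scan(body) |> List.flatten()

    case Enum.reject(tags, &Regex.match?(simple_substitution(), &1)) do
      [] -> :ok
      rejected -> {:error, {:unsupported_tags, rejected}}
    end
  end
end
```

Rejecting everything except the substitution form means the template body never needs to be evaluated as Elixir code.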
Added
- Nous.Tools.PathGuard: workspace-root sandbox for file tools. Rejects path traversal, NUL-byte injection, and symlink escapes. Used by all five built-in file tools.
- Nous.Tools.UrlGuard: SSRF protection for outbound HTTP. Rejects schemes other than http / https, blocks RFC1918 / loopback / link-local / CGNAT / IPv6 ULA / cloud-metadata IPs (169.254.169.254). Used by WebFetch (with redirect re-validation) and the Custom provider's base_url. :allow_private_hosts opt-in for local dev.
- Streaming pipeline rewritten on :hackney 4 :async, :once (pull-based), replacing the prior spawn + Finch.stream + mailbox plumbing. The Stream.resource consumer now drives :hackney.stream_next/1 directly, so backpressure is structural and no consumer mailbox can grow unboundedly. Same path picks up hackney 4's HTTP/3 + Alt-Svc auto-upgrade for free. New :bypass-driven integration tests exercise the streaming path end-to-end.
- link_counts_by_source/1 optional Store callback for KB backends. ETS implementation provided. Reduces kb_health_check from O(E·L) to O(L); health checks on a 1k-entry / 5k-link KB drop from millions of comparisons to thousands.
- Workflow fallback validation in Nous.Workflow.Compiler: fallback target nodes are reachable for the purposes of :unreachable_nodes validation but excluded from the topo order so they don't double-execute.
- AgentServer task generation refs: every spawned agent task carries a monotonic ref; stale :agent_response_ready / :agent_task_completed messages from cancelled tasks are discarded. Fixes silent message loss when the user types fast or calls clear_history mid-stream.
- Seven new test files: test/nous/json_test.exs, test/nous/prompt_template_test.exs, test/nous/tools/path_guard_test.exs, test/nous/tools/url_guard_test.exs, plus expanded coverage in test/nous/workflow/phase2_test.exs, test/nous/workflow/phase3_test.exs, test/nous/transcript_test.exs. Test suite: 1539 → 1543 passing (mix test), plus 0 dialyzer errors and 0 credo issues at --strict.
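A workspace-root check of the kind PathGuard performs can be approximated with Path.expand/2 and a prefix test. PathGuardSketch is a hypothetical module that assumes Unix-style paths; it covers traversal, absolute-path escapes, and NUL bytes, and deliberately leaves out symlink resolution, which the real guard is described as handling.

```elixir
defmodule PathGuardSketch do
  # Resolves `path` against `root` and rejects anything outside the root.
  # Symlink resolution is intentionally left out of this sketch.
  def resolve(root, path) when is_binary(root) and is_binary(path) do
    if String.contains?(path, <<0>>) do
      {:error, :nul_byte}
    else
      root = Path.expand(root)
      # Path.expand/2 collapses ".." segments and keeps absolute paths as-is.
      expanded = Path.expand(path, root)

      if expanded == root or String.starts_with?(expanded, root <> "/") do
        {:ok, expanded}
      else
        {:error, :outside_workspace}
      end
    end
  end
end
```

Comparing against root plus a trailing slash avoids accepting a sibling directory such as /srv/ws-other when the root is /srv/ws.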
Fixed (security)
- Atom-table DoS via String.to_atom/1 on untrusted input across 7 modules (Critical). Adopted a project-wide rule: never String.to_atom/1 on data that didn't originate from a literal in this repo. Audited and fixed: Agent.Context.safe_to_atom, skill loader frontmatter parser, LlamaCpp provider message-key conversion, PromptTemplate.extract_variables, Eval.TestCase YAML key conversion, and the --tags / --exclude parsers in mix nous.eval / mix nous.optimize.
- EEx code-execution from template bodies (Critical, see breaking changes above): PromptTemplate now rejects non-<%= @var %> markers.
- Nous.Hook :command type now requires a [program | args] list, not a raw string. Previous string handler was passed to NetRunner.run(["sh", "-c", str], ...), an RCE class if handler ever came from config or user input.
- Bash and FileGrep tools scrub the env before shelling out: whitelists PATH / HOME / LANG / LC_ALL / TZ / USER / SHELL / TERM, drops *_API_KEY, *_TOKEN, *_SECRET, LD_PRELOAD, etc. FileGrep now resolves rg via System.find_executable/1 (no which PATH-shadowing). Bash uses absolute /bin/sh.
- HumanInTheLoop plugin matches tool names case-insensitively. It was raw equality; a tool registered as "Send_Email" bypassed approval if config said "send_email".
- Nous.Plugins.Memory wraps auto-injected memories in <retrieved_memory> tags with provenance metadata and an explicit "USER-SUPPLIED DATA, not instructions" framing, as defense-in-depth against stored prompt injection through the LLM-callable remember tool.
- extra_body blocked-keys list: drops messages, model, stream, system, tools, tool_choice with a logged warning. Prevents extra_body from being a back-door for rewriting the conversation, model, or safe-tool whitelist.
- BraveSearch migrated from raw :httpc (no TLS verify by default) to Req with explicit verify: :verify_peer. Previous code path leaked the API key to any MITM on the wire.
- Custom provider validates base_url through UrlGuard at startup, as SSRF prevention for the user-supplied endpoint URL.
- Skill loader caps file count (1000) and individual file size (5MB), and skips symlinks, which prevents loading /etc/passwd via a symlink in a skills directory.
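The env-scrubbing idea can be sketched as an allowlist over the environment map. EnvScrubSketch is a hypothetical module and not the tools' actual implementation. One detail worth making explicit: System.cmd/3 merges its :env option into the inherited environment, so variables to be dropped have to be passed with a nil value rather than merely omitted.

```elixir
defmodule EnvScrubSketch do
  # Allowlist taken from the changelog entry above.
  @allowed ~w(PATH HOME LANG LC_ALL TZ USER SHELL TERM)

  # Keeps only allowlisted variables; everything else is dropped.
  def scrub(env \\ System.get_env()) when is_map(env) do
    Map.take(env, @allowed)
  end

  # Builds an :env list for System.cmd/3: allowlisted variables keep their
  # value, all others are set to nil so the child process does not inherit them.
  def cmd_env(env \\ System.get_env()) when is_map(env) do
    Enum.map(env, fn {name, value} ->
      if name in @allowed, do: {name, value}, else: {name, nil}
    end)
  end
end
```

An allowlist fails closed: a secret stored under an unexpected variable name is still dropped, which a denylist of *_API_KEY-style patterns alone would not guarantee.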
Fixed (correctness)
- Streaming normalizers (OpenAI / LlamaCpp) no longer drop tool_calls or finish_reason when both arrive in the same chunk. Previously the cond returned a single event and silently dropped the others; tool-calling agents misclassified termination and the OpenAI complete-response path lost tool calls entirely.
- Anthropic streaming input_json_delta fragments are now tagged with content-block _index and _phase (:start | :partial | :stop) so a stateful consumer can reassemble the full tool call. The non-streaming convert_complete_response/1 path was already correct.
- Transcript compaction preserves tool_call / tool_result pairs across the compaction boundary. Previously the naive Enum.split could orphan a :tool message from its assistant prelude; Anthropic and OpenAI both reject that shape with a 400.
- AgentServer task generation refs (C-5/H-16/L-7) prevent silent message loss in three races: stale :agent_response_ready overwriting a cancelled context, clear_history un-clearing itself, and the wildcard :DOWN handler clearing the wrong task.
- Workflow scratch ETS leak: maybe_cleanup_scratch/1 now runs on every non-suspended terminal path (was only the :ok arm). Failed workflows under retry no longer accumulate orphan ETS tables.
- Memory backends (Hybrid/Muninn/Zvec) use unnamed ETS tables. Named tables are global per BEAM, so a second concurrent agent crashed init/1 with "table already exists".
- Memory backends roll back on NIF errors: :ok = NIF.call(...) pattern-matches replaced with with chains; ETS insert/delete only happens after the index op succeeds, leaving consistent (entry-absent) state on failure.
- SQLite memory store wraps multi-statement ops in BEGIN ... COMMIT. A crash mid-write would have left a row in memories without its memories_fts row, silently invisible to recall but visible to list.
- SQLite/DuckDB metadata atomize_keys survives unknown keys. It was raising ArgumentError on a single new key in user-supplied metadata, breaking recall / list for the entire process.
- parallel_map handler {:error, _} returns are collected as failures. safely_run_handler/3 previously wrapped any return value in :ok, so user error returns silently landed in successful_results.
- AgentRunner no longer mutates agent.model mid-run when fallback fires. Active model is tracked on ctx.deps[:active_model] and surfaced in stop telemetry as :active_model_provider / :active_model_name / :fallback_used. Sticky-fallback is preserved across iterations. New [:nous, :agent, :fallback, :used] event when the chain advances.
- Persistence.ETS table is owned by a dedicated TableOwner GenServer under the application supervisor. It was dying with whichever transient process happened to call save / load first. save/2 now returns {:error, _} on insert failure (was unconditional :ok).
- Decisions.supersede/5 docstring corrected: flagged as best-effort, not atomic. The Store behaviour has no transaction primitive yet.
- Coordinator Process.demonitor/2 on agent removal. It was leaking monitor refs and could fire spurious {:agent_crashed, name, _} for healthy agents after rapid stop+respawn.
- Workflow :workflow_end hook payload now reflects failure-time state, not initial state, so post-mortems see the actual state at failure.
- AgentServer load_context runs in a Task.Supervisor.start_child task with GenServer.reply/2, so slow persistence backends no longer block concurrent get_context / cancel_execution calls.
- AgentDynamicSupervisor + Application supervisor restart limits tuned to max_restarts: 100, max_seconds: 10 (was the default 3-in-5) so one bad user's crash loop doesn't take down every other tenant.
- Nous.Teams.RateLimiter is now race-safe under concurrent acquires (M-9 final). acquire/3 now returns {:ok, reservation_ref} | {:error, _} and atomically reserves the estimated tokens + 1 request slot. record_usage/3 accepts :reservation to reconcile actual vs estimated; missing reconciliations are auto-refunded after :reservation_ttl_ms (default 5 min) with a Logger.warning/1. release/2 cancels a reservation when the call errored before completing. Legacy record_usage/3 without :reservation still works for callers that don't go through acquire. Added :open_reservations to get_status/1.
- Nous.Memory.Embedding.Bumblebee uses a Registry + DynamicSupervisor (M-7 final). Each model_name is owned by exactly one ServingHolder GenServer registered by name. Replaces the :persistent_term cache (which forced a node-wide GC pause per new model). The application supervisor conditionally adds the Registry + ServingSupervisor children when Bumblebee is loaded.
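The named-versus-unnamed ETS distinction behind the memory backend fix is easy to reproduce in isolation. EtsTableSketch and its table names are hypothetical; the sketch only shows that creating a named table twice fails while unnamed tables can share a label.

```elixir
defmodule EtsTableSketch do
  # Without :named_table the atom is only a label, so each call gets its own table.
  def new_unnamed do
    :ets.new(:memory_index, [:set, :public])
  end

  # With :named_table the name is global to the node and a second create fails.
  def new_named do
    {:ok, :ets.new(:memory_index_named, [:set, :public, :named_table])}
  rescue
    ArgumentError -> {:error, :already_exists}
  end
end
```

With unnamed tables, each agent process holds the table reference returned by :ets.new/2 in its own state, so concurrent agents no longer collide on a shared name.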
Fixed (UX / minor)
- clean_tool_name/1 tolerates nil and non-binary input (some providers emit malformed function-call responses).
- OpenAI reasoning_model?/1 matches the full o[1-9] family via regex (catches new o4, o3-pro, etc.); also strips presence_penalty and frequency_penalty for reasoning models.
- Tool.from_function/2 no longer fakes a hardcoded query parameter schema when no @doc is found; it falls back to the empty additional-properties schema with a debug log.
- KB Entry.slugify/1 NFD-normalises and strips combining marks so "Café" → "cafe" instead of being entirely stripped.
- kb_health_check coherence_score weighted by issue severity (:high 0.2, :medium 0.1, :low 0.05), clamped to [0.0, 1.0].
- ParallelExecutor sorts branch results by branch_id before merging, so the merge is deterministic instead of completion-order-dependent.
- Transcript summarize/1 redacts :tool message content (replaced with a structural marker) so secrets / PII pulled from MCP don't bake into the permanent summary.
- All compile warnings cleared (unused aliases, unused vars, dialyzer "clause never matches" on test stubs, "incompatible types" on intentional assert_raise constructions).
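
The slugify fix above can be illustrated with a minimal sketch. SlugSketch and its exact replacement rules are hypothetical and are not the library's Entry.slugify/1; only the NFD-normalise-then-strip-combining-marks step comes from the entry above.

```elixir
defmodule SlugSketch do
  # Hypothetical helper for illustration only; not Nous' Entry.slugify/1.
  def slugify(title) when is_binary(title) do
    title
    # NFD splits "é" into "e" followed by the combining acute accent U+0301.
    |> String.normalize(:nfd)
    # Remove non-spacing combining marks (Unicode category Mn).
    |> String.replace(~r/\p{Mn}/u, "")
    |> String.downcase()
    # Assumed rule: any run of other characters becomes a single hyphen.
    |> String.replace(~r/[^a-z0-9]+/, "-")
    |> String.trim("-")
  end
end
```

With this sketch, SlugSketch.slugify("Café") returns "cafe". Filtering to ASCII without normalising first would drop the precomposed accented letter rather than keep its base letter.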
Known limitations (documented in code, not silently glossed)
- 9 modules carry @dialyzer :no_opaque for MapSet capture-syntax false positives. This is the Elixir community standard, and each suppression has a one-line justification at the top of its module. Specs were tried first and verified not to help; this isn't a code bug, it's a known dialyzer/Elixir interaction with opaque types and capture syntax (&MapSet.member?(set, &1) inside Enum.*).
Dependencies
- Added {:hackney, "~> 4.0"} (production) for pull-based streaming, replacing Finch.stream/5 for the streaming path. Finch/Req are still used for non-streaming requests.
- Added {:bypass, "~> 2.1", only: :test} for in-test HTTP server fixtures driving the new streaming integration tests.
[0.14.3] - 2026-04-25
Added
- :extra_body setting for arbitrary request body params: pass vendor-specific top-level JSON keys (e.g. top_k, chat_template_kwargs, repetition_penalty, min_p, best_of, ignore_eos) to OpenAI-compatible providers (vllm:, sglang:, custom:, lmstudio:, ollama:). Mirrors the OpenAI Python SDK's extra_body= argument. Works in default_settings, Nous.LLM calls, and agent model_settings. Atom keys are stringified at request build time; nested values pass through verbatim. extra_body wins on collision with whitelisted keys (escape-hatch semantics). Also forwarded by Gemini and Vertex AI overrides.

  Example: disable Qwen3 thinking and tune sampling on a vLLM endpoint:

      Nous.new("custom:qwen3-vl",
        base_url: "http://localhost:8000/v1",
        default_settings: %{
          extra_body: %{
            top_k: 20,
            chat_template_kwargs: %{enable_thinking: false}
          }
        }
      )

  Example: interleaved thinking (preserve thinking blocks across turns):

      Nous.new("custom:qwen3-vl",
        base_url: "http://localhost:8000/v1",
        default_settings: %{
          extra_body: %{
            chat_template_kwargs: %{preserve_thinking: true}
          }
        }
      )
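
The merge semantics described for :extra_body (top-level atom keys stringified, nested values untouched, extra_body winning on collision) can be sketched as follows. ExtraBodySketch and build_body/2 are hypothetical names, and the real request builder may differ in detail.

```elixir
defmodule ExtraBodySketch do
  # Hypothetical illustration of the described merge; not the library's code.
  def build_body(base, extra_body) when is_map(base) and is_map(extra_body) do
    # Stringify only the top-level keys; nested values pass through verbatim.
    extras = Map.new(extra_body, fn {key, value} -> {to_string(key), value} end)

    # Map.merge/2 keeps the second map's value on collision, so extras win.
    Map.merge(base, extras)
  end
end
```

Under these assumptions, a base body containing "top_k" => 5 merged with extra_body: %{top_k: 20} ends up with "top_k" => 20.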
[0.14.2] - 2026-04-13
Fixed
- SubAgent deps propagation: parent deps now flow to sub-agents by default (excluding plugin-internal keys like templates, PubSub, concurrency config). Use sub_agent_shared_deps: [:key1, :key2] in deps to restrict which keys are shared.
[0.14.0] - 2026-04-11
Added
- Nous.KnowledgeBase: LLM-compiled personal knowledge base system inspired by Karpathy's vision. Raw documents are ingested and compiled by an LLM into a structured markdown wiki with summaries, backlinks, cross-references, and semantic search.
- Core data types:
  - Nous.KnowledgeBase.Document: raw ingested source material (markdown, text, URL, PDF, HTML) with status tracking and checksums
  - Nous.KnowledgeBase.Entry: compiled wiki entries with titles, slugs, [[wiki-links]], summaries, concepts, tags, confidence scores, and optional embeddings
  - Nous.KnowledgeBase.Link: typed directional links between entries (related, subtopic, prerequisite, contradicts, extends, references)
  - Nous.KnowledgeBase.HealthReport: audit results with statistics, coverage/freshness/coherence scores, and categorized issues
- Storage:
  - Nous.KnowledgeBase.Store: behaviour with 15 callbacks for document, entry, and link CRUD plus search and graph traversal
  - Nous.KnowledgeBase.Store.ETS: zero-dependency in-memory backend with Jaro-distance text search and optional embedding vector search
- 9 agent tools via Nous.KnowledgeBase.Tools: kb_search, kb_read, kb_list, kb_ingest, kb_add_entry, kb_link, kb_backlinks, kb_health_check, kb_generate
- Nous.Plugins.KnowledgeBase: plugin that auto-injects KB tools and system prompt guidance. Composes with Nous.Plugins.Memory. Configurable via deps[:kb_config] with optional embedding support for semantic search.
- Nous.Agents.KnowledgeBaseAgent: specialized agent behaviour for KB curation. Adds 4 reasoning tools on top of standard KB tools: kb_plan_compilation, kb_verify_entry, kb_suggest_links, kb_summarize_topic. Tracks KB operations for reporting.
- Nous.KnowledgeBase.Workflows: pre-built DAG pipelines using the workflow engine:
  - Ingest pipeline: raw documents → concept extraction → entry compilation → link generation → embedding → persistence
  - Incremental update: detect changes via checksums and recompile affected entries
  - Health check: audit for stale, orphan, inconsistent, and duplicate entries
  - Output generation: produce reports, summaries, or slides from KB content
- Nous.KnowledgeBase.Prompts: LLM prompt templates for extraction, compilation, linking, auditing, and output generation
- 1,159 lines of test coverage across 6 test files (document, entry, link, ETS store, tools, plugin)
[0.13.1] - 2026-04-03
Added
- Nous.Transcript: lightweight conversation compaction without LLM calls.
  - compact/2: keep last N messages, summarize older ones into a system message
  - maybe_compact/2: auto-compact based on message count (:every), token budget (:token_budget), or percentage threshold (:threshold)
  - compact_async/2 and compact_async/3: background compaction via Nous.TaskSupervisor
  - maybe_compact_async/3: background auto-compact with {:compacted, msgs} / {:unchanged, msgs} callbacks
  - estimate_tokens/1 and estimate_messages_tokens/1: word-count-based token estimation
- Built-in Coding Tools: 6 tools implementing Nous.Tool.Behaviour for coding agents:
  - Nous.Tools.Bash: shell execution via NetRunner with timeout and output limits
  - Nous.Tools.FileRead: file reading with line numbers, offset, and limit
  - Nous.Tools.FileWrite: file writing with auto parent directory creation
  - Nous.Tools.FileEdit: string replacement with uniqueness check and replace_all
  - Nous.Tools.FileGlob: file pattern matching sorted by modification time
  - Nous.Tools.FileGrep: content search with ripgrep fallback to pure Elixir regex
- Nous.Permissions: tool-level permission policy engine complementing InputGuard:
  - Three presets: default_policy/0, permissive_policy/0, strict_policy/0
  - build_policy/1: custom policies with :deny, :deny_prefixes, :approval_required
  - blocked?/2, requires_approval?/2: case-insensitive tool name checking
  - filter_tools/2, partition_tools/2: filter tool lists through policies
- Nous.Session.Config and Nous.Session.Guardrails: session-level turn limits and token budgets:
  - Config struct with max_turns, max_budget_tokens, compact_after_turns
  - Guardrails.check_limits/4: returns :ok or {:error, :max_turns_reached | :max_budget_reached}
  - Guardrails.remaining/4, Guardrails.summary/4: budget tracking and reporting
Fixed
- Empty stream silent failure: run_stream now emits {:error, :empty_stream} + warning when a provider returns zero events (e.g. minimax), instead of silently yielding {:complete, %{output: ""}}.
- Memory.Search crash on vector search error: {:ok, results} = store_mod.search_vector(...) pattern match replaced with case; logs warning and returns empty list on error.
- Atom table exhaustion in skill loader: String.to_atom/1 replaced with String.to_existing_atom/1 + rescue fallback with debug logging.
- Context deserialization crash on unknown roles: String.to_existing_atom/1 replaced with explicit role whitelist (:system, :user, :assistant, :tool), defaults to :user with warning.
- Unbounded inspect in stream normalizer: inspect(chunk, limit: :infinity) capped to limit: 500, printable_limit: 1000.
- SQLite embedding decode crash: JSON.decode!/1 wrapped in rescue, returns nil with warning on malformed data.
- Muninn bare rescue: rescue _ -> replaced with specific exception types (MatchError, File.Error, ErlangError, RuntimeError).
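
The role-whitelist fix above can be sketched as a lookup in a fixed map, which never creates atoms from untrusted input and cannot raise on unknown strings. RoleSketch and parse_role/1 are hypothetical names and not the library's deserialization code.

```elixir
defmodule RoleSketch do
  require Logger

  # Explicit whitelist: the only role atoms this sketch will ever return.
  @roles %{
    "system" => :system,
    "user" => :user,
    "assistant" => :assistant,
    "tool" => :tool
  }

  # Hypothetical helper; unknown roles fall back to :user with a warning.
  def parse_role(role) when is_binary(role) do
    case Map.fetch(@roles, role) do
      {:ok, atom} ->
        atom

      :error ->
        Logger.warning("unknown role #{inspect(role)}, defaulting to :user")
        :user
    end
  end
end
```

Compared with String.to_existing_atom/1, the map lookup also rejects strings that happen to match unrelated atoms already present in the VM.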
Documentation
- Memory System Guide (docs/guides/memory.md): 630+ line walkthrough covering all 6 store backends, search/scoring, BM25, agent integration, and cross-agent memory sharing.
- Context & Dependencies Guide (docs/guides/context.md): RunContext, ContextUpdate operations, stateful agent walkthrough, multi-user patterns.
- Skills Guide enhanced: added 400+ lines covering module-based and file-based skill walkthroughs, skill groups, activation modes, plugin configuration.
- LiveView examples: chat interface (liveview_chat.exs) and multi-agent dashboard (liveview_multi_agent.exs) reference implementations.
- PostgreSQL memory example (postgresql_full.exs): end-to-end Store implementation with tsvector + pgvector, BM25 search, hybrid RRF search.
- Coding agent example (19_coding_agent.exs): permissions, tools, guardrails, and transcript compaction.
- Tool permissions example (tool_permissions.exs): policy presets, custom deny lists, tool filtering.
[0.13.0] - 2026-03-28
Added
- Nous.Workflow: DAG/graph-based workflow engine for orchestrating agents, tools, and control flow as executable directed graphs. Complements Decisions (reasoning tracking) and Teams (persistent agent groups).
  - Builder API: Ecto.Multi-style pipes: Workflow.new/1 |> add_node/4 |> connect/3 |> chain/2 |> run/2
  - 8 node types: :agent_step, :tool_step, :transform, :branch, :parallel, :parallel_map, :human_checkpoint, :subworkflow
  - Hand-rolled graph: dual adjacency maps, Kahn's algorithm for topological sort + cycle detection + parallel execution levels in one O(V+E) pass
  - Static parallel: named branches fan-out concurrently via Task.Supervisor
  - Dynamic parallel_map: runtime fan-out over data lists with max_concurrency throttling (the scatter-gather pattern)
  - Cycle support: edge-following execution with per-node max-iteration guards for retry/quality-gate loops
  - Workflow hooks: :pre_node, :post_node, :workflow_start, :workflow_end; integrates with existing Nous.Hook struct
  - Pause/resume: via hook ({:pause, reason}), :atomics external signal, or :human_checkpoint auto-suspend
  - Error strategies: :fail_fast, :skip, {:retry, max, delay}, {:fallback, node_id} per node
  - Telemetry: [:nous, :workflow, :run|:node, :start|:stop|:exception] events
  - Execution tracing: opt-in per-node timing and status recording (trace: true)
  - Checkpointing: Checkpoint struct + Store behaviour + ETS backend
  - Subworkflows: nested workflow invocation with input_mapper / output_mapper for data isolation
  - Runtime graph mutation: on_node_complete callback, Graph.insert_after/6, Graph.remove_node/2
  - Mermaid visualization: Workflow.to_mermaid/1 generates flowchart diagrams with type-specific node shapes
  - Scratch ETS: optional per-workflow ETS table for large/binary data exchange between steps
  - 113 new tests covering all workflow features
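
The Kahn's-algorithm item in the workflow list (topological order, cycle detection, and parallel execution levels in one pass) can be illustrated with a small standalone sketch. KahnSketch, its input shape (a node list plus {from, to} edge tuples), and its return values are assumptions made for illustration; the engine's own graph module uses dual adjacency maps and may differ.

```elixir
defmodule KahnSketch do
  # Hypothetical illustration; not Nous.Workflow's graph implementation.
  # Returns {:ok, levels}, where each level holds nodes that can run in
  # parallel, or {:error, :cycle} when some nodes can never become ready.
  def levels(nodes, edges) do
    succ =
      Enum.reduce(edges, Map.new(nodes, fn n -> {n, []} end), fn {from, to}, acc ->
        Map.update!(acc, from, fn list -> [to | list] end)
      end)

    indegree =
      Enum.reduce(edges, Map.new(nodes, fn n -> {n, 0} end), fn {_from, to}, acc ->
        Map.update!(acc, to, fn d -> d + 1 end)
      end)

    ready = for {node, 0} <- indegree, do: node
    walk(Enum.sort(ready), succ, indegree, [], length(nodes))
  end

  # No ready nodes left: success only if every node was emitted.
  defp walk([], _succ, _indegree, acc, 0), do: {:ok, Enum.reverse(acc)}
  defp walk([], _succ, _indegree, _acc, _remaining), do: {:error, :cycle}

  defp walk(level, succ, indegree, acc, remaining) do
    # Remove the current level's outgoing edges; nodes reaching
    # in-degree 0 form the next level.
    {indegree, next} =
      Enum.reduce(level, {indegree, []}, fn node, state ->
        Enum.reduce(Map.fetch!(succ, node), state, fn child, {deg, next} ->
          deg = Map.update!(deg, child, fn d -> d - 1 end)
          if deg[child] == 0, do: {deg, [child | next]}, else: {deg, next}
        end)
      end)

    walk(Enum.sort(next), succ, indegree, [level | acc], remaining - length(level))
  end
end
```

For a diamond graph with edges a→b, a→c, b→d, c→d, the sketch returns {:ok, [[:a], [:b, :c], [:d]]}: b and c share a level and could be dispatched concurrently.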
[0.12.17] - 2026-03-28
Removed
- Dead module Nous.Decisions.Tools: 4 tool functions never used by any plugin or code path.
- Dead module Nous.StreamNormalizer.Mistral: Mistral provider uses the default OpenAI-compatible normalizer.
- Dead function emit_fallback_exhausted/3 in Fallback module: Defined but never called.
- Dead config enable_telemetry: Set in config files but never read; telemetry is always on.
- Dead config log_level: Set in dev/test configs but never read by Nous.
- Unused test fixtures: NousTest.Fixtures.LLMResponses and its generator script (generated Oct 2025, never imported).
Fixed
- Compiler warning in output_schema.ex: Removed always-truthy conditional around to_json_schema/1 return value.
Changed
- All JSON encoding/decoding uses built-in JSON module instead of Jason. Jason removed from direct dependencies.
- Added pretty_encode!/1 helper to internal JSON module for pretty-printed JSON output (used in LLM prompts and eval reports).
- Updated README with Elixir 1.18+ / OTP 27+ requirements.
[0.12.16] - 2026-03-28
Fixed
- Anthropic multimodal messages silently lost image data: message_to_anthropic/1 matched on content being a list, but Message.user/2 stores content parts in metadata.content_parts as a string. Multimodal messages were sent as plain text, losing all image data. Now reads from metadata like the OpenAI formatter.
- Gemini multimodal messages had the same issue: Same pattern match bug caused all image content to be dropped.
- Anthropic image format incorrect: The data field contained the full data URL prefix (data:image/jpeg;base64,...) instead of raw base64; media_type was hardcoded to "image/jpeg" regardless of actual format; HTTP URLs were incorrectly wrapped as base64 source instead of "type": "url".
- Gemini had no image support: All non-text content parts fell through to a [Image: ...] text representation. Now uses inlineData for base64 images and fileData for HTTP URLs.
- Anthropic duplicate thinking block: Assistant messages with reasoning content emitted the thinking block twice.
Added
- ContentPart.parse_data_url/1: extract MIME type and raw base64 data from a data URL string.
- ContentPart.data_url?/1 and ContentPart.http_url?/1: URL type predicates.
- OpenAI formatter: :image content type support (converts to data URL) and detail option passthrough for image_url parts.
- Comprehensive vision test pipeline (test/nous/vision_pipeline_test.exs) with 19 unit tests covering format conversion across all providers and 4 LLM integration tests.
- Test fixture images: test_square.png (100x100 red), test_tiny.webp (minimal WebP).
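
Splitting a data URL into its MIME type and raw base64 payload, as the parse_data_url/1 entry above describes, might look roughly like this. DataUrlSketch and its return shape are hypothetical; the actual ContentPart.parse_data_url/1 return value is not specified in this changelog.

```elixir
defmodule DataUrlSketch do
  # Hypothetical illustration; the {:ok, map} return shape is an assumption.
  def parse_data_url("data:" <> rest) do
    with [meta, data] <- String.split(rest, ",", parts: 2),
         [mime, "base64"] <- String.split(meta, ";", parts: 2) do
      # data is the raw base64 payload, without the "data:...;base64," prefix.
      {:ok, %{media_type: mime, data: data}}
    else
      _ -> {:error, :invalid_data_url}
    end
  end

  def parse_data_url(_other), do: {:error, :invalid_data_url}
end
```

Returning the media type alongside the payload is what lets a formatter avoid hardcoding "image/jpeg" and send only raw base64 in the data field.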
[0.12.15] - 2026-03-26
Fixed
- receive_timeout silently dropped in Nous.LLM: generate_text/3 and stream_text/3 with a string model only passed [:base_url, :api_key, :llamacpp_model] to Model.parse, so receive_timeout was silently ignored. Now correctly forwarded.
Removed
- Dead timeout config: Removed unused default_timeout and stream_timeout from config/config.exs. Timeouts are determined by per-provider defaults in Model.default_receive_timeout/1 and each provider module's @default_timeout / @streaming_timeout constants.
Documentation
- Added "Timeouts" section to README documenting receive_timeout option and default timeouts per provider.
[0.13.0] - 2026-03-21
Added
- Hooks system: Granular lifecycle interceptors for tool execution and request/response flow.
  - 6 lifecycle events: pre_tool_use, post_tool_use, pre_request, post_response, session_start, session_end
  - 3 handler types: :function (inline), :module (behaviour), :command (shell via NetRunner)
  - Matcher-based dispatch: string (exact tool name), regex, or predicate function
  - Blocking semantics for pre_tool_use and pre_request: hooks can deny or modify tool calls
  - Priority-based execution ordering (lower = earlier)
  - Telemetry events: [:nous, :hook, :execute, :start | :stop], [:nous, :hook, :denied]
  - Nous.Hook, Nous.Hook.Registry, Nous.Hook.Runner
  - New option on Nous.Agent.new/2: :hooks
  - New example: examples/16_hooks.exs
- Skills system: Reusable instruction/capability packages for agents.
  - Module-based skills with use Nous.Skill macro and behaviour callbacks
  - File-based skills: markdown files with YAML frontmatter, loaded from directories
  - 5 activation modes: :manual, :auto, {:on_match, fn}, {:on_tag, tags}, {:on_glob, patterns}
  - Skill groups: :coding, :review, :testing, :debug, :git, :docs, :planning
  - Registry with load/unload, activate/deactivate, group operations, and input matching
  - Nous.Plugins.Skills: auto-included plugin bridging skills into the agent lifecycle
  - Directory scanning: skill_dirs: option and Nous.Skill.Registry.register_directory/2
  - Telemetry events: [:nous, :skill, :activate | :deactivate | :load | :match]
  - New options on Nous.Agent.new/2: :skills, :skill_dirs
  - New example: examples/17_skills.exs
  - New guides: docs/guides/skills.md, docs/guides/hooks.md
- 21 built-in skills:
  - Language-agnostic (10): CodeReview, TestGen, Debug, Refactor, ExplainCode, CommitMessage, DocGen, SecurityScan, Architect, TaskBreakdown
  - Elixir-specific (5): PhoenixLiveView, EctoPatterns, OtpPatterns, ElixirTesting, ElixirIdioms
  - Python-specific (6): PythonFastAPI, PythonTesting, PythonTyping, PythonDataScience, PythonSecurity, PythonUv
- NetRunner dependency (~> 1.0.4): Zero-zombie-process OS command execution for command hooks with SIGTERM→SIGKILL timeout escalation.
- 76 new tests for hooks and skills systems.
[0.12.11] - 2026-03-19
Added
- Per-run structured output override: Pass output_type: and structured_output: as options to Nous.Agent.run/3 and Nous.Agent.run_stream/3 to override the agent's defaults per call. The same agent can return raw text or structured data depending on the request.
- Multi-schema selection ({:one_of, [SchemaA, SchemaB]}): New output_type variant where the LLM dynamically chooses which schema to use per response. Each schema becomes a synthetic tool, and the LLM's tool choice acts as schema selection. Includes automatic retry and validation against the selected schema.
  - OutputSchema.schema_name/1: public helper to get snake_case name for a schema module
  - OutputSchema.tool_name_for_schema/1: build synthetic tool name from schema module
  - OutputSchema.find_schema_for_tool_name/2: reverse-map tool name to schema module
  - OutputSchema.synthetic_tool_name?/1: predicate for synthetic tool call detection
  - OutputSchema.extract_response_for_one_of/2: extract text and identify matched schema from tool call
  - New example: Example 6 (per-run override) and Example 7 (multi-schema) in examples/14_structured_output.exs
  - New sections in docs/guides/structured_output.md
Fixed
- Synthetic tool call handling: Structured output tool calls (__structured_output__) in :tool_call mode are now correctly filtered from the tool execution loop. Previously, these synthetic calls would produce "Tool not found" errors and cause an unnecessary extra LLM round-trip. Now they terminate the loop immediately and the structured output is extracted directly.
[0.12.10] - 2026-03-19
Added
- Fallback model/provider support: Automatic failover to alternative models when the primary model fails with a ProviderError or ModelError (rate limit, server error, timeout, auth issue).
  - Nous.Fallback: core fallback logic (eligibility checks, recursive model chain traversal, model string/struct parsing)
  - :fallback option on Nous.Agent.new/2: ordered list of fallback model strings or Model structs
  - :fallback option on Nous.generate_text/3 and Nous.stream_text/3
  - Tool schemas are automatically re-converted when falling back across providers (e.g., OpenAI → Anthropic)
  - Structured output settings are re-injected for the target provider on cross-provider fallback
  - Agent model is swapped on successful fallback so remaining iterations use the working model
  - Streaming fallback retries stream initialization only, not mid-stream failures
  - New telemetry events: [:nous, :fallback, :activated] and [:nous, :fallback, :exhausted]
  - Only ProviderError and ModelError trigger fallback; application-level errors (ValidationError, MaxIterationsExceeded, ExecutionCancelled, ToolError) are returned immediately
  - 52 new tests across test/nous/fallback_test.exs and test/nous/agent_fallback_test.exs
Changed
- Nous.Agent struct gains fallback: [Model.t()] field (default: [])
- Nous.LLM now uses injectable dispatcher (get_dispatcher/0) for testability, consistent with AgentRunner
[0.12.9] - 2026-03-12
Added
- InputGuard plugin: Modular malicious input classifier with pluggable strategy pattern. Detects prompt injection, jailbreak attempts, and other malicious inputs before they reach the LLM.
  - Nous.Plugins.InputGuard: main plugin with configurable aggregation (:any / :majority / :all), short-circuit mode, and violation callbacks
  - Nous.Plugins.InputGuard.Strategy: behaviour for custom detection strategies
  - Nous.Plugins.InputGuard.Strategies.Pattern: built-in regex patterns for instruction override, role reassignment, DAN jailbreaks, prompt extraction, and encoding evasion. Supports :extra_patterns (additive) and :patterns (full override)
  - Nous.Plugins.InputGuard.Strategies.LLMJudge: secondary LLM classification with fail-open/fail-closed modes
  - Nous.Plugins.InputGuard.Strategies.Semantic: embedding cosine similarity against pre-computed attack vectors
  - Nous.Plugins.InputGuard.Policy: severity-to-action resolution (:block, :warn, :log, :callback, custom fun/2)
  - Tracks checked message index to prevent re-triggering on tool-call loop iterations
  - New example: examples/15_input_guard.exs
Fixed
- AgentRunner: before_request plugin hook now short-circuits the LLM call when a plugin sets needs_response: false (e.g., InputGuard blocking). Previously the current iteration would still call the LLM before the block took effect on the next iteration.
[0.12.8] - 2026-03-12
Fixed
- Vertex AI v1/v1beta1 bug: Model.parse("vertex_ai:gemini-2.5-pro-preview-06-05") with GOOGLE_CLOUD_PROJECT set was storing a hardcoded v1 URL in model.base_url, causing the provider's v1beta1 selection logic to be bypassed. Preview models now correctly use v1beta1 at request time.
Added
- Vertex AI input validation: Project ID and region from environment variables are now validated with helpful error messages instead of producing opaque DNS/HTTP errors.
- GOOGLE_CLOUD_LOCATION support: Added as a fallback for GOOGLE_CLOUD_REGION, consistent with other Google Cloud libraries and tooling.
- Multi-region example script: examples/providers/vertex_ai_multi_region.exs
[0.12.7] - 2026-03-10
Fixed
- Vertex AI model routing: Fixed build_request_params/3 not including the "model" key in the params map, causing chat/2 and chat_stream/2 to always fall back to "gemini-2.0-flash" regardless of the requested model.
- Vertex AI 404 on preview models: Use v1beta1 API version for preview and experimental models (e.g., gemini-3.1-pro-preview). The v1 endpoint returns 404 for these models.
Added
- Nous.Providers.VertexAI.api_version_for_model/1: returns "v1beta1" for preview/experimental models, "v1" for stable models.
- Nous.Providers.VertexAI.endpoint/3 now accepts an optional model name to select the correct API version.
- Debug logging for Vertex AI request URLs.
[0.12.6] - 2026-03-07
Added
- Auto-update memory: Nous.Plugins.Memory can now automatically reflect on conversations and update memories after each run, with no explicit tool calls needed. Enable with auto_update_memory: true in memory_config. Configurable reflection model, frequency, and context limits.
  - New after_run/3 callback in Nous.Plugin behaviour: runs once after the entire agent run completes. Wired into both AgentRunner.run/3 and run_with_context/3.
  - Nous.Plugin.run_after_run/4 helper for executing the hook across all plugins
  - New config options: :auto_update_memory, :auto_update_every, :reflection_model, :reflection_max_tokens, :reflection_max_messages, :reflection_max_memories
  - New example: examples/memory/auto_update.exs
[0.12.5] - 2026-03-06
Added
- Vertex AI provider: Nous.Providers.VertexAI for accessing Gemini models through Google Cloud Vertex AI. Supports enterprise features (VPC-SC, CMEK, regional endpoints, IAM).
  - Three auth modes: app config Goth (config :nous, :vertex_ai, goth: MyApp.Goth), per-model Goth (default_settings: %{goth: MyApp.Goth}), or direct access token (api_key / VERTEX_AI_ACCESS_TOKEN)
  - Bearer token auth via api_key option, VERTEX_AI_ACCESS_TOKEN env var, or Goth integration
  - Goth integration ({:goth, "~> 1.4", optional: true}) for automatic service account token management; reuse existing Goth processes from PubSub, etc.
  - URL auto-construction from GOOGLE_CLOUD_PROJECT and GOOGLE_CLOUD_REGION env vars
  - Nous.Providers.VertexAI.endpoint/2 helper to build endpoint URLs
  - Reuses existing Gemini message format, response parsing, and stream normalization
  - Model string: "vertex_ai:gemini-2.0-flash"
[0.12.2] - 2026-03-04
Fixed
- Gemini streaming: Fixed streaming responses returning 0 events. The Gemini streamGenerateContent endpoint returns a JSON array (application/json) by default, not Server-Sent Events. Instead of forcing SSE via alt=sse query parameter, added a pluggable stream parser to Nous.Providers.HTTP.
Added
- Nous.Providers.HTTP.JSONArrayParser: stream buffer parser for JSON array responses. Extracts complete JSON objects from a streaming [{...},{...},...] response by tracking {} nesting depth while respecting string literals and escape sequences.
- :stream_parser option on HTTP.stream/4: accepts any module implementing parse_buffer/1 with the same {events, remaining_buffer} contract as SSE parsing. Defaults to the existing SSE parser. Enables any provider with a non-SSE streaming format to plug in a custom parser.
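
The depth-tracking approach described for the JSON array parser can be sketched as a byte scanner that honours the {events, remaining_buffer} contract. JsonArraySketch is a hypothetical stand-in and not the library's JSONArrayParser; as a simplification it returns raw JSON object strings instead of decoded events.

```elixir
defmodule JsonArraySketch do
  # Hypothetical illustration of brace-depth scanning over a stream buffer.
  def parse_buffer(buffer) when is_binary(buffer) do
    {objects, consumed} = scan(buffer, 0, 0, 0, :code, [], 0)
    # Everything after the last complete object is kept for the next chunk.
    rest = binary_part(buffer, consumed, byte_size(buffer) - consumed)
    {Enum.reverse(objects), rest}
  end

  # Arguments: buffer, position, brace depth, start offset of the current
  # top-level object, mode (:code or :string), found objects, bytes consumed.
  defp scan(buf, pos, _depth, _start, _mode, acc, consumed) when pos >= byte_size(buf),
    do: {acc, consumed}

  defp scan(buf, pos, depth, start, mode, acc, consumed) do
    case {mode, :binary.at(buf, pos)} do
      # Inside a string: skip the escaped byte, close on an unescaped quote.
      {:string, ?\\} -> scan(buf, pos + 2, depth, start, :string, acc, consumed)
      {:string, ?"} -> scan(buf, pos + 1, depth, start, :code, acc, consumed)
      {:string, _} -> scan(buf, pos + 1, depth, start, :string, acc, consumed)
      {:code, ?"} when depth > 0 -> scan(buf, pos + 1, depth, start, :string, acc, consumed)
      # A brace at depth 0 opens a new top-level object.
      {:code, ?{} when depth == 0 -> scan(buf, pos + 1, 1, pos, :code, acc, consumed)
      {:code, ?{} -> scan(buf, pos + 1, depth + 1, start, :code, acc, consumed)
      # Closing the top-level brace completes one object.
      {:code, ?}} when depth == 1 ->
        object = binary_part(buf, start, pos - start + 1)
        scan(buf, pos + 1, 0, start, :code, [object | acc], pos + 1)
      {:code, ?}} when depth > 1 -> scan(buf, pos + 1, depth - 1, start, :code, acc, consumed)
      # Array brackets, commas, whitespace, and scalar bytes are skipped.
      _ -> scan(buf, pos + 1, depth, start, mode, acc, consumed)
    end
  end
end
```

A caller would prepend the returned remainder to the next network chunk and call parse_buffer/1 again, so an object split across chunks is emitted once its closing brace arrives.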
[0.12.0] - 2026-02-28
Added
- Memory System: Persistent memory for agents with hybrid text + vector search, temporal decay, importance weighting, and flexible scoping.
  - Nous.Memory.Entry: memory entry struct with type (semantic/episodic/procedural), importance, evergreen flag, and scoping fields (agent_id, session_id, user_id, namespace)
  - Nous.Memory.Store: storage behaviour with 8 callbacks (init, store, fetch, delete, update, search_text, search_vector, list)
  - Nous.Memory.Store.ETS: zero-dep in-memory backend with Jaro-distance text search
  - Nous.Memory.Store.SQLite: SQLite + FTS5 backend (requires exqlite)
  - Nous.Memory.Store.DuckDB: DuckDB + FTS + vector backend (requires duckdbex)
  - Nous.Memory.Store.Muninn: Tantivy BM25 text search backend (requires muninn)
  - Nous.Memory.Store.Zvec: HNSW vector search backend (requires zvec)
  - Nous.Memory.Store.Hybrid: combines Muninn + Zvec for maximum retrieval quality
  - Nous.Memory.Scoring: pure functions for Reciprocal Rank Fusion, temporal decay, composite scoring
  - Nous.Memory.Search: hybrid search orchestrator (text + vector → RRF merge → decay → composite score)
  - Nous.Memory.Embedding: embedding provider behaviour with pluggable implementations
  - Nous.Memory.Embedding.Bumblebee: local on-device embeddings via Bumblebee + EXLA (Qwen 0.6B default)
  - Nous.Memory.Embedding.OpenAI: OpenAI text-embedding-3-small provider
  - Nous.Memory.Embedding.Local: generic local endpoint (Ollama, vLLM, LMStudio)
  - Nous.Memory.Tools: agent tools remember, recall, forget
  - Nous.Plugins.Memory: plugin with auto-injection of relevant memories, configurable search scope and injection strategy
  - 6 example scripts in examples/memory/ (basic ETS, Bumblebee, SQLite, DuckDB, Hybrid, cross-agent)
  - 62 new tests across 6 test files
- Graceful degradation: No embedding provider = keyword-only search. No optional deps = Store.ETS with Jaro matching. The core memory system has zero additional dependencies.
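
The RRF merge step in the hybrid search pipeline can be illustrated with a minimal Reciprocal Rank Fusion sketch. RrfSketch is hypothetical, and the constant k = 60 is the conventional default from the RRF literature rather than a value taken from Nous.Memory.Scoring.

```elixir
defmodule RrfSketch do
  # Assumed default; Nous may use a different constant.
  @k 60

  # rankings: a list of ranked id lists, best match first, e.g. one from
  # text search and one from vector search. Returns {id, score} pairs,
  # highest fused score first.
  def fuse(rankings, k \\ @k) do
    rankings
    |> Enum.flat_map(fn ids ->
      ids
      |> Enum.with_index(1)
      |> Enum.map(fn {id, rank} -> {id, 1.0 / (k + rank)} end)
    end)
    # Sum each id's contributions across all rankings.
    |> Enum.reduce(%{}, fn {id, score}, acc ->
      Map.update(acc, id, score, fn total -> total + score end)
    end)
    # Sort by descending score; ties fall back to id order for determinism.
    |> Enum.sort_by(fn {id, score} -> {-score, id} end)
  end
end
```

Because RRF uses only rank positions, text and vector results can be merged without normalising their raw scores against each other. An id that appears in both lists outranks ids that appear in only one.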
[0.11.3] - 2026-02-26
Fixed
- Anthropic and Gemini streaming: Added missing Nous.StreamNormalizer.Anthropic and Nous.StreamNormalizer.Gemini modules. These were referenced in Provider.default_stream_normalizer/0 but never created, causing runtime crashes when streaming with Anthropic or Gemini providers.
Added
- Nous.StreamNormalizer.Anthropic: normalizes Anthropic SSE events (content_block_delta, message_delta, content_block_start for tool use, thinking deltas, error events)
- Nous.StreamNormalizer.Gemini: normalizes Gemini SSE events (candidates array with text parts, functionCall, finishReason mapping)
- 42 tests for both new stream normalizers
[0.11.0] - 2026-02-20
Added
- Structured Output Mode: Agents return validated, typed data instead of raw strings. Inspired by instructor_ex.
  - Nous.OutputSchema core module: JSON schema generation, provider settings dispatch, parsing and validation
  - use Nous.OutputSchema macro with @llm_doc attribute for schema-level LLM documentation
  - validate_changeset/1 optional callback for custom Ecto validation rules
  - Validation retry loop: failed outputs are sent back to the LLM with error details (max_retries option)
  - System prompt augmentation with schema instructions
- Output Type Variants:
  - Ecto schema modules: full JSON schema + changeset validation
  - Schemaless Ecto types (%{name: :string, age: :integer}): lightweight, no module needed
  - Raw JSON schema maps (string keys): passed through as-is
  - {:regex, pattern}: regex-constrained output (vLLM/SGLang)
  - {:grammar, ebnf}: EBNF grammar-constrained output (vLLM)
  - {:choice, choices}: choice-constrained output (vLLM/SGLang)
- Provider Modes: Controls how structured output is enforced per-provider
  - :auto (default): picks best mode for the provider
  - :json_schema: response_format with strict JSON schema (OpenAI, vLLM, SGLang, Gemini)
  - :tool_call: synthetic tool with tool_choice (Anthropic default)
  - :json: response_format: json_object (OpenAI-compatible)
  - :md_json: prompt-only enforcement with markdown fence + stop token (all providers)
- Provider Passthrough: response_format, guided_json, guided_regex, guided_grammar, guided_choice, json_schema, regex, generationConfig now passed through in build_request_params
- New Files:
  - lib/nous/output_schema.ex: core module
  - lib/nous/output_schema/validator.ex: behaviour definition
  - lib/nous/output_schema/use_macro.ex: use Nous.OutputSchema macro
  - docs/guides/structured_output.md: comprehensive guide
  - examples/14_structured_output.exs: example script with 5 patterns
  - test/nous/output_schema_test.exs: 42 unit tests
  - test/nous/structured_output_integration_test.exs: 16 integration tests
  - test/eval/agents/structured_output_test.exs: 3 LLM integration tests
Changed
- Nous.Agent struct gains structured_output keyword list field (mode, max_retries)
- Nous.Types.output_type expanded with schemaless, raw JSON schema, and guided mode tuples
- Nous.AgentRunner injects structured output settings, augments system prompt, handles validation retries
- Nous.Agents.BasicAgent.extract_output/2 routes through OutputSchema.parse_and_validate/2
- Nous.Agents.ReActAgent.extract_output/2 validates final_answer against output_type
- Provider build_request_params/3 passes through structured output parameters
[0.10.1] - 2026-02-14
Changed
- Sub-Agent plugin unified: Merged ParallelSubAgent into Nous.Plugins.SubAgent
  - Single plugin now provides both delegate_task (single) and spawn_agents (parallel) tools
  - system_prompt/2 callback injects orchestration guidance including available templates
  - Templates accept %Nous.Agent{} structs (recommended) or config maps (legacy)
  - Parallel execution via Task.Supervisor.async_stream_nolink
  - Configurable concurrency (parallel_max_concurrency, default: 5) and timeout (parallel_timeout, default: 120s)
  - Graceful partial failure: crashed/timed-out sub-agents don't block others
- New Example: examples/13_sub_agents.exs
  - Template-based sub-agents using Nous.Agent.new/2 structs
  - Parallel execution with inline model config
  - Direct programmatic invocation bypassing the LLM
[0.10.0] - 2026-02-14
Added
Plugin System: Composable agent extensions via
Nous.Pluginbehaviour- Callbacks:
init/2,tools/2,system_prompt/2,before_request/3,after_response/3 - Add
plugins: [MyPlugin]to any agent for cross-cutting concerns - AgentRunner iterates plugins at each stage of the execution loop
- Callbacks:
Human-in-the-Loop (HITL): Approval workflows for sensitive tool calls
requires_approval: trueonNous.Toolstructapproval_handleronNous.Agent.Contextfor approve/edit/reject decisionsNous.Plugins.HumanInTheLoopfor per-tool configuration via deps
Sub-Agent System: Enable agents to delegate tasks to specialized child agents
Nous.Plugins.SubAgentprovidesdelegate_tasktool- Pre-configured agent templates via
deps[:sub_agent_templates] - Isolated context per sub-agent with shared deps support
Conversation Summarization: Automatic context window management
Nous.Plugins.Summarizationmonitors token usage against configurable threshold- LLM-powered summarization with safe split points (never separates tool_call/tool_result pairs)
- Error-resilient: keeps all messages if summarization fails
State Persistence: Save and restore agent conversation state
Nous.Agent.Context.serialize/1anddeserialize/1for JSON-safe round-tripsNous.Persistencebehaviour withsave/load/delete/listcallbacksNous.Persistence.ETSreference implementation- Auto-save hooks on
Nous.AgentServer
Enhanced Supervision: Production lifecycle management for agents
Nous.AgentRegistryfor session-based process lookup via RegistryNous.AgentDynamicSupervisorfor on-demand agent creation/destruction- Configurable inactivity timeout on
AgentServer(default: 5 minutes) - Added to application supervision tree
Dangling Tool Call Recovery: Resilient session resumption
Nous.Agent.Context.patch_dangling_tool_calls/1injects synthetic results for interrupted tool calls- Called automatically when continuing from an existing context
PubSub Abstraction Layer: Unified Nous.PubSub module for all PubSub usage
- Nous.PubSub wraps Phoenix.PubSub with graceful no-op fallback when unavailable
- Application-level configuration via config :nous, pubsub: MyApp.PubSub
- Topic builders: agent_topic/1, research_topic/1, approval_topic/1
- Nous.Agent.Context gains pubsub and pubsub_topic fields (runtime-only, never serialized)
- Nous.Agent.Callbacks.execute/3 now broadcasts via PubSub as a third channel alongside callbacks and notify_pid
- AgentServer refactored to use Nous.PubSub, removing ad-hoc setup_pubsub_functions/0 and subscribe_fn/broadcast_fn from state
- Research Coordinator broadcasts progress via PubSub when :session_id is provided
- SubAgent plugin propagates parent's PubSub context to child agents
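
A "graceful no-op fallback" around an optional dependency usually comes down to a runtime check before dispatching. The wrapper below is a hedged sketch of that pattern; the module and function names are hypothetical and it is not the Nous.PubSub source. Phoenix.PubSub.broadcast/3 is the real upstream function it would call when the library is present.

```elixir
defmodule PubSubSketch do
  # Reads the server name set via `config :nous, pubsub: MyApp.PubSub`.
  def configured, do: Application.get_env(:nous, :pubsub)

  # Nothing configured: silently succeed.
  def broadcast(nil, _topic, _message), do: :ok

  def broadcast(pubsub, topic, message) do
    if Code.ensure_loaded?(Phoenix.PubSub) do
      # apply/3 avoids a compile-time reference to the optional dependency.
      apply(Phoenix.PubSub, :broadcast, [pubsub, topic, message])
    else
      # phoenix_pubsub is not installed: degrade to a no-op.
      :ok
    end
  end
end
```

Callers can then broadcast unconditionally, for example `PubSubSketch.broadcast(PubSubSketch.configured(), "some:topic", {:event, :started})`, without checking whether PubSub is set up.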
Async HITL Approval via PubSub: Nous.PubSub.Approval module
- handler/1 builds an approval handler compatible with Nous.Plugins.HumanInTheLoop
- Broadcasts {:approval_required, info} and blocks via receive for response
- respond/4 sends approval decisions from external processes (e.g., LiveView)
- Configurable timeout with :reject as default on expiry
- Enables async approval workflows without synchronous I/O
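
The broadcast-then-block flow can be sketched with plain process messaging. This version sends directly to a pid instead of going through PubSub, and the response message shape, the :ref and :reply_to keys, and the three-argument respond function are all assumptions; the real respond/4 signature is not documented here.

```elixir
defmodule ApprovalSketch do
  # Asks `notify_pid` for a decision and blocks until one arrives or the
  # timeout expires, in which case the call is rejected by default.
  def request_approval(notify_pid, info, timeout \\ 5_000) do
    ref = make_ref()
    send(notify_pid, {:approval_required, Map.merge(info, %{ref: ref, reply_to: self()})})

    receive do
      {:approval_response, ^ref, decision} -> decision
    after
      timeout -> :reject
    end
  end

  # Called from the external process (for example a LiveView) to answer.
  def respond(reply_to, ref, decision) do
    send(reply_to, {:approval_response, ref, decision})
    :ok
  end
end
```

Matching on the unique reference keeps a late reply to an earlier request from being mistaken for the answer to a newer one.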
Deep Research Agent: Autonomous multi-step research with citations
- Nous.Research.run/2 public API with HITL checkpoints between iterations
- Five-phase loop: plan → search → synthesize → evaluate → report
- Nous.Research.Planner decomposes queries into searchable sub-questions
- Nous.Research.Searcher runs parallel search agents per sub-question
- Nous.Research.Synthesizer for deduplication, contradiction detection, gap analysis
- Nous.Research.Reporter generates markdown reports with inline citations
- Progress broadcasting via callbacks, notify_pid, and PubSub
New Research Tools:
- Nous.Tools.WebFetch: URL content extraction with Floki HTML parsing
- Nous.Tools.Summarize: LLM-powered text summarization focused on research queries
- Nous.Tools.SearchScrape: Parallel fetch + summarize for multiple URLs
- Nous.Tools.TavilySearch: Tavily AI search API integration
- Nous.Tools.ResearchNotes: Structured finding/gap/contradiction tracking via ContextUpdate
New Dependencies:
- floki ~> 0.36 (optional, for HTML content extraction)
- phoenix_pubsub ~> 2.1 (test-only, for PubSub integration tests)
Changed
- Nous.Agent struct now accepts plugins: [module()] option
- Nous.Tool struct now accepts requires_approval: boolean() option
- Nous.Agent.Context now includes approval_handler, pubsub, and pubsub_topic fields
- Nous.AgentServer supports optional :name registration, :persistence backend, and uses Nous.PubSub (removed ad-hoc setup_pubsub_functions/0)
- Nous.AgentServer :pubsub option now defaults to Nous.PubSub.configured_pubsub() instead of MyApp.PubSub
- Nous.AgentRunner accepts :pubsub and :pubsub_topic options when building context
- Application supervision tree includes AgentRegistry and AgentDynamicSupervisor
[0.9.0] - 2026-01-04
Added
Evaluation Framework: Production-grade testing and benchmarking for AI agents
- Nous.Eval module for defining and running test suites
- Nous.Eval.Suite for test suite management with YAML support
- Nous.Eval.TestCase for individual test case definitions
- Nous.Eval.Runner for sequential and parallel test execution
- Nous.Eval.Metrics for collecting latency, token usage, and cost metrics
- Nous.Eval.Reporter for console and JSON result reporting
- A/B testing support with Nous.Eval.run_ab/2
Six Built-in Evaluators:
- :exact_match - Strict string equality matching
- :fuzzy_match - Jaro-Winkler similarity with configurable thresholds
- :contains - Substring and regex pattern matching
- :tool_usage - Tool call verification with argument validation
- :schema - Ecto schema validation for structured outputs
- :llm_judge - LLM-based quality assessment with custom rubrics
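
A threshold-based fuzzy evaluator can be approximated with Elixir's standard library. Note the substitution: String.jaro_distance/2 computes plain Jaro similarity, not Jaro-Winkler, so this sketch only illustrates the thresholding idea and is not the :fuzzy_match evaluator itself. The default threshold of 0.85 is an arbitrary choice for the example.

```elixir
defmodule FuzzySketch do
  # Passes when the normalized strings are at least `threshold` similar.
  def fuzzy_match?(actual, expected, threshold \\ 0.85) do
    String.jaro_distance(normalize(actual), normalize(expected)) >= threshold
  end

  # Ignore surrounding whitespace and letter case before comparing.
  defp normalize(text), do: text |> String.trim() |> String.downcase()
end
```

Raising the threshold toward 1.0 makes the check behave more like exact matching, while lowering it tolerates larger wording differences.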
Optimization Engine: Automated parameter tuning for agents
- Nous.Eval.Optimizer with three strategies: grid search, random search, Bayesian optimization
- Support for float, integer, choice, and boolean parameter types
- Early stopping on threshold achievement
- Detailed trial history and best configuration reporting
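
Grid search with early stopping is simple enough to sketch in a few lines. The module below is a generic illustration under assumed inputs (a keyword list of candidate values and a scoring function supplied by the caller); it does not show the Nous.Eval.Optimizer API.

```elixir
defmodule GridSearchSketch do
  # Tries every combination in order and returns {best_params, best_score}.
  # Stops as soon as a score reaches `threshold`.
  def run(space, score_fun, threshold) do
    space
    |> combinations()
    |> Enum.reduce_while(nil, fn params, best ->
      score = score_fun.(params)
      best = if best == nil or score > elem(best, 1), do: {params, score}, else: best
      if score >= threshold, do: {:halt, best}, else: {:cont, best}
    end)
  end

  # Cartesian product of all parameter values, as a list of maps.
  defp combinations(space) do
    Enum.reduce(space, [%{}], fn {key, values}, acc ->
      for combo <- acc, value <- values, do: Map.put(combo, key, value)
    end)
  end
end
```

In a real evaluation the scoring function would run a test suite with the given parameters, which is why stopping early on a good-enough score can save a meaningful number of LLM calls.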
New Mix Tasks:
- mix nous.eval - Run evaluation suites with filtering, parallelism, and multiple output formats
- mix nous.optimize - Parameter optimization with configurable strategies and metrics
New Dependency:
- yaml_elixir ~> 2.9 for YAML test suite parsing
Documentation
- New comprehensive evaluation framework guide (docs/guides/evaluation.md)
- Five new example scripts in examples/eval/:
  - 01_basic_evaluation.exs - Simple test execution
  - 02_yaml_suite.exs - Loading and running YAML suites
  - 03_optimization.exs - Parameter optimization workflows
  - 04_custom_evaluator.exs - Implementing custom evaluators
  - 05_ab_testing.exs - A/B testing configurations
[0.8.1] - 2025-12-31
Fixed
- Fixed Usage struct not implementing Access behaviour for telemetry metrics
- Fixed Task.shutdown/2 nil return case in AgentServer cancellation
- Fixed tool call field access for OpenAI-compatible APIs (string vs atom keys)
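
For context on the first fix: structs do not support bracket access (usage[:key]) unless their module implements the Access behaviour. The sketch below shows the general shape of such an implementation on a hypothetical struct; the field names are assumptions and this is not the actual Usage module.

```elixir
defmodule UsageSketch do
  @behaviour Access

  defstruct input_tokens: 0, output_tokens: 0, total_tokens: 0

  # Enables usage[:input_tokens] and get_in/2.
  @impl Access
  def fetch(usage, key), do: Map.fetch(usage, key)

  # Enables update_in/3 and friends.
  @impl Access
  def get_and_update(usage, key, fun), do: Map.get_and_update(usage, key, fun)

  # Struct keys cannot be removed, so "popping" resets the counter to zero.
  @impl Access
  def pop(usage, key), do: {Map.get(usage, key), Map.put(usage, key, 0)}
end
```

Without these callbacks, code that reads metrics with bracket syntax raises an error saying the struct does not implement the Access behaviour.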
Added
- Vision/multimodal test suite with image fixtures (test/nous/vision_test.exs)
- ContentPart test suite for image conversion utilities (test/nous/content_part_test.exs)
- Multimodal message examples in conversation demo (examples/04_conversation.exs)
Changed
- Updated docs to link examples to GitHub source files
- Improved sidebar grouping in hexdocs
[0.8.0] - 2025-12-31
Added
Context Management: New Nous.Agent.Context struct for immutable conversation state, message history, and dependency injection. Supports context continuation between runs:

    {:ok, result1} = Nous.run(agent, "My name is Alice")
    {:ok, result2} = Nous.run(agent, "What's my name?", context: result1.context)

Agent Behaviour: New Nous.Agent.Behaviour for implementing custom agents with lifecycle callbacks (init_context/2, build_messages/2, process_response/3, extract_output/2).
Dual Callback System: New Nous.Agent.Callbacks supporting both map-based callbacks and process messages:

    # Map callbacks
    Nous.run(agent, "Hello", callbacks: %{
      on_llm_new_delta: fn _event, delta -> IO.write(delta) end
    })

    # Process messages (for LiveView)
    Nous.run(agent, "Hello", notify_pid: self())

Module-Based Tools: New Nous.Tool.Behaviour for defining tools as modules with metadata/0 and execute/2 callbacks. Use Nous.Tool.from_module/2 to create tools from modules.
Tool Context Updates: New Nous.Tool.ContextUpdate struct allowing tools to modify context state:

    def my_tool(ctx, args) do
      {:ok, result, ContextUpdate.new() |> ContextUpdate.set(:key, value)}
    end

Tool Testing Helpers: New Nous.Tool.Testing module with mock_tool/2, spy_tool/1, and test_context/1 for testing tool interactions.
Tool Validation: New Nous.Tool.Validator for JSON Schema validation of tool arguments.
Prompt Templates: New Nous.PromptTemplate for EEx-based prompt templates with variable substitution.
Built-in Agent Implementations: Nous.Agents.BasicAgent (default) and Nous.Agents.ReActAgent (reasoning with planning tools).
Structured Errors: New Nous.Errors module with MaxIterationsReached, ToolExecutionError, and ExecutionCancelled error types.
Enhanced Telemetry: New events for iterations (:iteration), tool timeouts (:tool_timeout), and context updates (:context_update).
Changed
Result Structure: Nous.run/3 now returns %{output: _, context: _, usage: _} instead of just output string.
Tool Function Signature: Tools now receive (ctx, args) instead of (args). The context provides access to ctx.deps for dependency injection.
Examples Modernized: Reduced from ~95 files to 21 files. Flattened directory structure from 4 levels to 2 levels. All examples updated to v0.8.0 API.
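
To illustrate the new (ctx, args) tool signature, the sketch below defines a tool that reads an injected dependency from ctx.deps. The tool name, the string-keyed args map, the :accounts dependency, and the {:ok, _} / {:error, _} return values are assumptions for the example; a plain map stands in for the real context struct.

```elixir
defmodule ToolSketch do
  # A tool in the two-argument style: context first, LLM-supplied args second.
  def get_balance(ctx, args) do
    # Dependencies are injected by the caller rather than looked up globally.
    accounts = ctx.deps.accounts
    user = Map.fetch!(args, "user")

    case Map.fetch(accounts, user) do
      {:ok, balance} -> {:ok, balance}
      :error -> {:error, "unknown user: #{user}"}
    end
  end
end
```

Because the dependency arrives through the context, a test can call the tool directly with an in-memory map, for example `ToolSketch.get_balance(%{deps: %{accounts: %{"alice" => 42}}}, %{"user" => "alice"})`.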
Removed
Removed deprecated provider modules: Nous.Providers.Gemini, Nous.Providers.Mistral, Nous.Providers.VLLM, Nous.Providers.SGLang.
Removed built-in tools: Nous.Tools.BraveSearch, Nous.Tools.DateTimeTools, Nous.Tools.StringTools, Nous.Tools.TodoTools. These can be implemented as custom tools.
Removed Nous.RunContext (replaced by Nous.Agent.Context).
Removed Nous.PromEx.Plugin (users can implement custom Prometheus metrics using telemetry events).
[0.7.2] - 2025-12-29
Fixed
Stream completion events: The [DONE] SSE event now properly emits a {:finish, "stop"} event instead of being silently discarded. This ensures stream consumers always receive a completion signal.
Documentation links: Fixed broken links in hexdocs documentation. Relative links to .exs example files now use absolute GitHub URLs so they work correctly on hexdocs.pm.
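
The stream completion fix can be pictured as one extra clause in an SSE line parser. The following sketch is an assumption-laden illustration: only the [DONE] sentinel and the {:finish, "stop"} event come from the entry above, while the module name and the {:data, payload} and :skip return values are invented for the example.

```elixir
defmodule SseSketch do
  # The terminal sentinel becomes an explicit completion event
  # instead of being dropped.
  def parse_line("data: [DONE]"), do: {:finish, "stop"}

  # Any other data line is passed through for JSON decoding elsewhere.
  def parse_line("data: " <> payload), do: {:data, payload}

  # Comments, blank keep-alive lines, and other fields are ignored.
  def parse_line(_other), do: :skip
end
```

Clause order matters here: the [DONE] clause must come before the generic "data: " clause, otherwise the sentinel would be treated as an ordinary payload.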
[0.7.1] - 2025-12-29
Changed
Make all provider dependencies optional: openai_ex, anthropix, and gemini_ex are now truly optional dependencies. Users only need to install the dependencies for the providers they use.
Runtime dependency checks: Provider modules now check for dependency availability at runtime instead of compile-time, allowing the library to compile without any provider-specific dependencies.
OpenAI message format: Messages are now returned as plain maps with string keys (%{"role" => "user", "content" => "Hi"}) instead of OpenaiEx.ChatMessage structs. This removes the compile-time dependency on openai_ex for message formatting.
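
A runtime dependency check of this kind typically relies on Code.ensure_loaded?/1. The helper below is a minimal sketch of the pattern, with a hypothetical module name and error message; it is not the check used inside the provider modules.

```elixir
defmodule DepCheckSketch do
  # Returns :ok when the optional dependency's module can be loaded,
  # otherwise an error the caller can surface to the user.
  def ensure_available(module, package) do
    if Code.ensure_loaded?(module) do
      :ok
    else
      {:error, "#{package} dependency not available; add it to your mix.exs deps"}
    end
  end
end
```

A provider could call something like `DepCheckSketch.ensure_available(OpenaiEx, "openai_ex")` at the start of a request, so the library still compiles when the package is absent and fails only when that provider is actually used.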
Fixed
Fixed "anthropix dependency not available" errors that occurred when using the library in applications without anthropix installed.
Fixed compile-time errors that occurred when openai_ex was not present in the consuming application.
[0.7.0] - 2025-12-27
Initial public release with multi-provider LLM support:
- OpenAI-compatible providers (OpenAI, Groq, OpenRouter, Ollama, LM Studio, vLLM)
- Native Anthropic Claude support with extended thinking
- Google Gemini support
- Mistral AI support
- Tool/function calling
- Streaming support
- ReAct agent implementation