CouncilEx.Tools.InMemoryDocs (CouncilEx v0.1.0)


Build a CouncilEx.Tool that retrieves passages from an in-memory document corpus via BM25 ranking.

Pure Elixir with zero dependencies. Designed for examples, tests, and small shared knowledge bases. Production users with real corpora should write their own tool that wraps a vector database or search service; this module is the reference implementation, not the production retrieval path.

Usage

defmodule MyApp.Tools.SearchDocs do
  use CouncilEx.Tools.InMemoryDocs,
    name: "search_docs",
    description: "Search project docs for relevant passages.",
    docs: [
      %{text: "BM25 scores documents by term frequency and inverse document frequency."},
      %{text: "Tools self-describe via name, description, and a parameters_schema."},
      %{text: "Council members can call tools mid-completion to retrieve evidence."}
    ],
    top_k: 4
end

Each entry in :docs is either a string (auto-wrapped to %{text: ...}) or a map with a :text key plus optional :meta (a free-form map). The corpus and BM25 index are computed at compile time and embedded in the generated module — no runtime registration needed.
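The normalization rule for :docs entries can be pictured with the minimal sketch below. NormalizeSketch and normalize/1 are hypothetical names, not part of CouncilEx. The empty-map default for a missing :meta, and the dropping of extra map keys, are assumptions chosen to produce the documented doc() shape.

```elixir
defmodule NormalizeSketch do
  # Hypothetical helper (not part of CouncilEx): turns :docs entries
  # into the doc() shape %{text: String.t(), meta: map()}.
  def normalize(entries) when is_list(entries), do: Enum.map(entries, &normalize_entry/1)

  # Bare strings are wrapped; the empty default meta is an assumption.
  defp normalize_entry(text) when is_binary(text), do: %{text: text, meta: %{}}

  # Maps keep :text and :meta; other keys are dropped in this sketch.
  defp normalize_entry(%{text: text} = entry) when is_binary(text),
    do: %{text: text, meta: Map.get(entry, :meta, %{})}

  # Any other shape raises FunctionClauseError in this sketch.
end
```

For example, `NormalizeSketch.normalize(["plain string", %{text: "with meta", meta: %{source: "faq"}}])` yields two maps that both carry a :meta key.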

Attach the resulting tool to a council like any other tool:

# Council-wide:
DynamicCouncil.add_council_tool(council, MyApp.Tools.SearchDocs)
# Or per member (other member fields elided):
DynamicCouncil.add_member(council, %{id: "alice", tools: [MyApp.Tools.SearchDocs], ...})

Tool parameters

  • query (string, required): the search query.
  • top_k (integer, optional): the number of passages to return. Defaults to the :top_k option given to use (4 if omitted). Capped at the corpus size.
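How the effective top_k for a single call follows from these rules can be sketched as below. TopKSketch and effective_top_k/3 are hypothetical. The assumptions are that the decoded tool arguments arrive as a map with string keys, and that a missing or non-positive top_k falls back to the default. The module's actual handling of invalid values is not documented here.

```elixir
defmodule TopKSketch do
  # Hypothetical: resolves the effective top_k for one tool call.
  # args        - decoded tool-call arguments (string keys assumed)
  # default     - the :top_k given to `use` (4 if omitted)
  # corpus_size - number of documents in the corpus
  def effective_top_k(args, default, corpus_size) do
    requested =
      case Map.get(args, "top_k") do
        k when is_integer(k) and k > 0 -> k
        # Assumption: missing or invalid values fall back to the default.
        _ -> default
      end

    # Documented cap: never more passages than documents.
    min(requested, corpus_size)
  end
end
```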

Tool result

A list of maps %{text: String.t(), meta: map(), score: float()}, ranked by descending score. Documents that score 0 are omitted. The provider serializes this list into the tool message that the model reads on its next turn.

BM25 details

Standard Robertson/Spärck Jones BM25 with k1 = 1.2 and b = 0.75. Tokenization is deliberately simple: lowercase, split on \W+, and drop tokens shorter than two characters. Words on a small English stop-word list are removed from both the index and queries. These parameters are intentionally fixed; if you need to tune them, write a real retrieval tool.
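For reference, here is the standard BM25 scoring function written with the field names of the index() type (tf, df, dl, avgdl, n), using k1 = 1.2 and b = 0.75. This page does not say which IDF variant the module uses. The non-negative form shown is an assumption, though it is a common choice that keeps every matching document's score above 0.

```latex
\mathrm{score}(D, Q) = \sum_{t \in Q} \mathrm{IDF}(t) \cdot
  \frac{\mathit{tf}_{t,D}\,(k_1 + 1)}
       {\mathit{tf}_{t,D} + k_1\left(1 - b + b\,\frac{\mathit{dl}_D}{\mathit{avgdl}}\right)},
\qquad
\mathrm{IDF}(t) = \ln\!\left(1 + \frac{n - \mathit{df}_t + 0.5}{\mathit{df}_t + 0.5}\right)
```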

Types

doc()

@type doc() :: %{text: String.t(), meta: map()}

A normalized document carried in the index.

index()

@type index() :: %{
  docs: [doc()],
  tf: [%{required(String.t()) => non_neg_integer()}],
  df: %{required(String.t()) => non_neg_integer()},
  dl: [non_neg_integer()],
  avgdl: float(),
  n: non_neg_integer()
}

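Pre-computed BM25 index: the normalized docs, per-document term frequencies (tf), document frequencies (df), document lengths (dl), average document length (avgdl), and document count (n).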

Functions

build_index(docs)

@spec build_index([doc()]) :: index()

Build a BM25 index over a list of normalized docs. Public so callers can build an index themselves (e.g., in tests or at runtime).
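Below is a minimal sketch of how a map with the index() shape could be assembled from normalized docs. It is not the module's source. BuildIndexSketch is a hypothetical name, and its simplified tokenizer omits the stop-word list to stay short.

```elixir
defmodule BuildIndexSketch do
  # Returns a map with the same keys as the documented index() type.
  def build_index(docs) do
    token_lists = Enum.map(docs, &tokens(&1.text))

    # Per-document term frequencies, aligned by position with `docs`.
    tf = Enum.map(token_lists, &Enum.frequencies/1)

    # Document frequency: how many documents contain each term at least once.
    df =
      Enum.reduce(tf, %{}, fn doc_tf, acc ->
        Enum.reduce(Map.keys(doc_tf), acc, fn term, acc2 ->
          Map.update(acc2, term, 1, &(&1 + 1))
        end)
      end)

    # Document lengths in tokens and their average (0.0 for an empty corpus).
    dl = Enum.map(token_lists, &length/1)
    n = length(docs)
    avgdl = if n == 0, do: 0.0, else: Enum.sum(dl) / n

    %{docs: docs, tf: tf, df: df, dl: dl, avgdl: avgdl, n: n}
  end

  # Simplified tokenizer for this sketch only: lowercase, split on \W+,
  # keep tokens of two or more characters (no stop-word filtering).
  defp tokens(text) do
    text
    |> String.downcase()
    |> String.split(~r/\W+/u, trim: true)
    |> Enum.filter(&(String.length(&1) >= 2))
  end
end
```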

search(index, query, k)

@spec search(index(), String.t(), pos_integer()) :: [map()]

BM25 search. Returns the top k documents ranked by descending score as [%{text, meta, score}]. Documents with score 0 are excluded.
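The search contract can be illustrated with a minimal sketch. It assumes an index map with the index() shape and the non-negative IDF variant ln(1 + (n - df + 0.5)/(df + 0.5)). SearchSketch is a hypothetical name, and its query tokenizer skips stop words for brevity. It shows the technique, not this module's exact code.

```elixir
defmodule SearchSketch do
  @k1 1.2
  @b 0.75

  # index: %{docs, tf, df, dl, avgdl, n}, as in the index() type.
  def search(index, query, k) do
    terms = tokens(query)

    index.docs
    |> Enum.zip(Enum.zip(index.tf, index.dl))
    |> Enum.map(fn {doc, {doc_tf, dl}} ->
      score = Enum.reduce(terms, 0.0, fn t, acc -> acc + term_score(index, doc_tf, dl, t) end)
      %{text: doc.text, meta: doc.meta, score: score}
    end)
    # Documented behaviour: zero-score documents are excluded.
    |> Enum.reject(&(&1.score == 0.0))
    # Stable sort, so ties keep corpus order.
    |> Enum.sort_by(& &1.score, :desc)
    |> Enum.take(k)
  end

  defp term_score(index, doc_tf, dl, term) do
    f = Map.get(doc_tf, term, 0)
    df = Map.get(index.df, term, 0)

    if f == 0 do
      0.0
    else
      # Assumed non-negative IDF variant.
      idf = :math.log(1 + (index.n - df + 0.5) / (df + 0.5))
      norm = @k1 * (1 - @b + @b * dl / index.avgdl)
      idf * f * (@k1 + 1) / (f + norm)
    end
  end

  # Simplified query tokenizer for this sketch (no stop-word list).
  defp tokens(text) do
    text
    |> String.downcase()
    |> String.split(~r/\W+/u, trim: true)
    |> Enum.filter(&(String.length(&1) >= 2))
  end
end
```

Because every IDF value here is positive, a document scores above 0 exactly when it shares at least one query term. The zero-score filter therefore drops only non-matching documents.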

tokenize(text)

@spec tokenize(String.t()) :: [String.t()]

Tokenize text for indexing and queries: lowercase, split on non-word characters, then drop tokens shorter than two characters and English stop words.
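Here is a minimal sketch of the documented tokenization pipeline. TokenizeSketch is a hypothetical name. The stop-word list is a stand-in because the module's actual list is not shown on this page. The Unicode-aware `u` regex flag is also an assumption, since the docs only say \W+.

```elixir
defmodule TokenizeSketch do
  # Stand-in stop-word list; the module's real list is not documented here.
  @stop_words MapSet.new(~w(a an and are as at be by for from in is it of on or that the this to with))

  @spec tokenize(String.t()) :: [String.t()]
  def tokenize(text) do
    text
    |> String.downcase()
    # Split on runs of non-word characters; trim: true drops empty edge pieces.
    |> String.split(~r/\W+/u, trim: true)
    # Drop tokens shorter than two characters.
    |> Enum.filter(&(String.length(&1) >= 2))
    # Drop stop words.
    |> Enum.reject(&MapSet.member?(@stop_words, &1))
  end
end
```

For instance, `TokenizeSketch.tokenize("The BM25 score, in a nutshell!")` keeps only "bm25", "score", and "nutshell".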