Postgres is Exograph's built-in backend. Exograph uses Ecto migrations, schemas,
Repo operations, and transactions for source files, fragments, comments,
definitions, references, package metadata, and call graph facts.
ParadeDB's pg_search extension is optional and accelerates BM25 text/code-fact
retrieval when available.
Indexing with Postgres
{:ok, index} =
Exograph.index("lib",
repo: MyApp.Repo,
migrate?: true,
bm25?: true
)backend: :postgres is accepted explicitly, but Postgres is the only built-in backend.
Tables
The canonical migration creates normalized Ecto-backed tables under the configured
prefix (default: exograph):
exograph_packagesexograph_package_versionsexograph_filesexograph_fragmentsexograph_commentsexograph_definitionsexograph_referencesexograph_graph_nodesexograph_call_edgesexograph_tree_nodes
Source text is stored once in exograph_files. Fragments carry package, version,
and file foreign keys.
Indexes
Structural search — (kind, name, arity) btree
A btree index on (kind, name, arity) on the fragments table lets structural
queries that know the kind or name skip the GIN term index entirely and go
straight to a btree range scan. This is the hot path for patterns like
def handle_call(_, _, _) do ... end where kind=def and name=handle_call
are extracted at query planning time.
Term index — GIN
The inverted index on fragment terms (exograph_terms) uses a GIN index. Terms
are extracted by ExAST at indexing time and stored as normalized strings.
Candidate retrieval scans the GIN index; ExAST verification follows.
Text search — pg_trgm GIN
pg_trgm GIN indexes on files.source and files.comments_text enable fast
ILIKE and ~* regex without a full table scan:
CREATE EXTENSION IF NOT EXISTS pg_trgm;
CREATE INDEX CONCURRENTLY ON exograph_files USING gin (source gin_trgm_ops);
CREATE INDEX CONCURRENTLY ON exograph_files USING gin (comments_text gin_trgm_ops);Migrations create these automatically. Without pg_trgm, text/regex search
falls back to sequential scans.
ParadeDB / pg_search
When ParadeDB's pg_search extension is installed, migrate?: true creates BM25
indexes over source files, comments, definitions, and references. Source files use
ParadeDB's pdb.source_code tokenizer; symbol names use pdb.ngram for
prefix/partial matching.
Raw SQL is limited to areas Ecto cannot express directly:
CREATE EXTENSION- ParadeDB
USING bm25 - ParadeDB tokenizer casts such as
source::pdb.source_code - ParadeDB operators (
|||,&&&) - ParadeDB scoring (
pdb.score(...))
Recommended Postgres settings
Default Postgres settings are conservative. On a 13.8M fragment index, these settings reduce structural search time from ~600ms to ~78ms and enable parallel BM25 scans.
-- Parallel workers for ParadeDB BM25 scans
ALTER SYSTEM SET max_parallel_workers_per_gather = 4;
ALTER SYSTEM SET max_parallel_workers = 16;
ALTER SYSTEM SET max_worker_processes = 24;
-- Memory — adjust to available RAM
ALTER SYSTEM SET shared_buffers = '4GB'; -- ~25% of RAM
ALTER SYSTEM SET effective_cache_size = '12GB'; -- ~75% of RAM
ALTER SYSTEM SET work_mem = '256MB'; -- per-operation sort/hash
ALTER SYSTEM SET maintenance_work_mem = '1GB'; -- for CREATE INDEX / VACUUM
SELECT pg_reload_conf();
-- shared_buffers requires a Postgres restartAfter restarting, prewarm BM25 indexes into the buffer cache:
CREATE EXTENSION IF NOT EXISTS pg_prewarm;
SELECT pg_prewarm('hex_files_bm25_idx');
SELECT pg_prewarm('hex_fragments_bm25_idx');
SELECT pg_prewarm('hex_definitions_bm25_idx');For write-heavy indexing runs, increase parallelism for index creation:
SET max_parallel_maintenance_workers = 8;
SET maintenance_work_mem = '2GB';Fallback behavior
Without ParadeDB, text/regex search uses Postgres ILIKE (accelerated by
pg_trgm GIN indexes) and ~*. Without pg_trgm, text search falls back to a
sequential scan — usable for small indexes but slow at scale.
Testing
EXOGRAPH_DATABASE_URL=postgres://postgres:postgres@localhost:5432/exograph_test \
mix test