These are local Exograph backend benchmarks for the Hex.pm top-package workload (--mode top). They are intended to compare Exograph's current backend implementations on this machine, not to make a universal claim about PostgreSQL or DuckDB.

Method

All runs used the same package cache and full Exograph persistence: files, fragments, ASTs, hashes, symbols, references, terms, and queryable facts were retained.

Common settings:

--mode top
--runs 3
--concurrency 4
--index-concurrency 4
--duckdb-threads 1
--postgres-defer-indexes
--postgres-synchronous-commit off
--postgres-maintenance-work-mem 512MB
--postgres-max-parallel-maintenance-workers 2

DuckDB sharded runs used:

--duckdb-shards 4 --duckdb-recovery-mode no_wal_writes

The Postgres settings form a challenge mode suited to a rebuildable, local index: non-unique query indexes are deferred, synchronous_commit is off, and maintenance memory is raised. They are not durable production defaults.

Indexing medians

| Workload | Postgres plain | DuckDB plain | DuckDB sharded plain | Result |
|---|---|---|---|---|
| top --limit 100 | 38.42s | 39.22s | 41.17s | tuned Postgres slightly faster |
| top --limit 500 | 181.22s | 109.27s | 91.05s | DuckDB 1.66× faster; sharded DuckDB 1.99× faster |

For limit 100, the systems are close and tuned Postgres wins indexing. For limit 500, DuckDB wins indexing, and sharding improves throughput further.
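
The speedup figures in the Result column are ratios of median indexing times, with Postgres as the baseline. A minimal Python sketch of that arithmetic, using the medians from the table above, might look like this. The `speedup` helper and the dictionary layout are illustrative assumptions, not part of Exograph; the backend labels are borrowed from the `--only` values used later in this document.

```python
# Median indexing times in seconds, copied from the table above.
INDEXING_MEDIANS = {
    100: {"postgres_plain": 38.42, "duckdb_plain": 39.22, "duckdb_sharded_plain": 41.17},
    500: {"postgres_plain": 181.22, "duckdb_plain": 109.27, "duckdb_sharded_plain": 91.05},
}


def speedup(baseline_seconds: float, candidate_seconds: float) -> float:
    """Return how many times faster the candidate is than the baseline.

    A value below 1.0 means the baseline (here Postgres) is faster.
    """
    return baseline_seconds / candidate_seconds


for limit, medians in INDEXING_MEDIANS.items():
    base = medians["postgres_plain"]
    for backend in ("duckdb_plain", "duckdb_sharded_plain"):
        ratio = speedup(base, medians[backend])
        print(f"limit {limit}: {backend} is {ratio:.2f}x Postgres")
```

Ratios below 1.0 at limit 100 correspond to the "tuned Postgres slightly faster" result. The limit 500 ratios correspond to the 1.66× and 1.99× figures.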

Query medians

top --limit 100

| Query | Postgres plain | DuckDB plain | DuckDB sharded plain |
|---|---|---|---|
| api_text_defmodule | 71.1ms | 27.7ms | 45.4ms |
| references_enum | 24.1ms | 2.3ms | 3.3ms |
| files_defmodule | 31.0ms | 7.3ms | 2.4ms |
| api_comments_todo | 134.3ms | 129.8ms | 65.6ms |

top --limit 500

| Query | Postgres plain | DuckDB plain | DuckDB sharded plain |
|---|---|---|---|
| api_text_defmodule | 134.2ms | 49.0ms | 37.0ms |
| references_enum | 56.7ms | 8.9ms | 9.9ms |
| files_defmodule | 98.3ms | 23.6ms | 7.1ms |
| api_comments_todo | 143.3ms | 159.1ms | 199.9ms |

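Search/query paths usually favor DuckDB materially: api_text_defmodule, references_enum, and files_defmodule are faster on both DuckDB variants at both limits. The exception is api_comments_todo, where DuckDB leads at limit 100 but Postgres is the fastest of the three at limit 500.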
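
To check per-query claims against the tables above, the same ratio arithmetic can be applied to the query medians. The following Python sketch is only an illustration: the `duckdb_speedups` helper and the data layout are assumptions made for this example, and the numbers are copied from the two query tables.

```python
# Median query latencies in milliseconds, copied from the tables above.
# Tuple order: Postgres plain, DuckDB plain, DuckDB sharded plain.
QUERY_MEDIANS_MS = {
    100: {
        "api_text_defmodule": (71.1, 27.7, 45.4),
        "references_enum": (24.1, 2.3, 3.3),
        "files_defmodule": (31.0, 7.3, 2.4),
        "api_comments_todo": (134.3, 129.8, 65.6),
    },
    500: {
        "api_text_defmodule": (134.2, 49.0, 37.0),
        "references_enum": (56.7, 8.9, 9.9),
        "files_defmodule": (98.3, 23.6, 7.1),
        "api_comments_todo": (143.3, 159.1, 199.9),
    },
}


def duckdb_speedups(postgres_ms, duckdb_ms, sharded_ms):
    """Return Postgres latency divided by each DuckDB latency.

    Values above 1.0 favor DuckDB; values below 1.0 favor Postgres.
    """
    return postgres_ms / duckdb_ms, postgres_ms / sharded_ms


for limit, queries in QUERY_MEDIANS_MS.items():
    for name, row in queries.items():
        plain, sharded = duckdb_speedups(*row)
        print(f"limit {limit} {name}: plain {plain:.1f}x, sharded {sharded:.1f}x")
```

Printing every ratio, rather than only the best ones, makes the exceptions visible alongside the wins.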

Artifacts

Machine-readable benchmark artifacts are generated locally and intentionally not committed to git. Use --output-json for the JSON report and --explain-dir for Postgres plans, for example:

mix exograph.bench.backends \
  --mode top --limit 500 --runs 3 \
  --only postgres_plain,duckdb_plain,duckdb_sharded_plain \
  --output-json bench-results/backend-limit500-runs3-current.json \
  --explain-dir bench-results/explain-limit500-runs3-current

bench-results/ is gitignored to avoid polluting the repository with local benchmark outputs. The tables above record the latest checked benchmark summary; regenerate artifacts locally when you need machine-readable evidence or plans.
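
Once a JSON report has been generated locally, it can be inspected with a few lines of Python. The report's schema is not described in this document, so the sketch below deliberately assumes nothing about it and only reports the top-level shape. The `summarize_report` helper is hypothetical, and the path is the one used in the example command above.

```python
import json
from pathlib import Path


def summarize_report(path):
    """Load a JSON report and describe its top-level shape without assuming a schema."""
    data = json.loads(Path(path).read_text(encoding="utf-8"))
    if isinstance(data, dict):
        return {"type": "object", "keys": sorted(data)}
    if isinstance(data, list):
        return {"type": "array", "length": len(data)}
    return {"type": type(data).__name__}


if __name__ == "__main__":
    # Path taken from the --output-json argument in the example command.
    report = Path("bench-results/backend-limit500-runs3-current.json")
    if report.exists():
        print(summarize_report(report))
    else:
        print(f"{report} not found; generate it with the benchmark command first")
```

Starting from the top-level keys is a cautious way to discover where the medians live before writing anything that depends on specific field names.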

Current fair wording

A defensible summary is:

On Exograph's Hex.pm top-package workload, tuned Postgres is slightly faster at indexing 100 packages. At 500 packages, DuckDB indexes about 1.66× faster single-node and about 1.99× faster with 4 shards, while several important query paths are roughly 3×–14× faster on DuckDB. These numbers describe Exograph's current backends and local benchmark setup, not PostgreSQL or DuckDB universally.