Downloads and indexes Hex.pm packages into a DuckDB/QuackDB-backed Exograph index by default.
mix exograph.index.hex
mix exograph.index.hex --mode top --limit 5000
mix exograph.index.hex --mode latest --concurrency 8
mix exograph.index.hex --mode latest --web --port 4200

Packages are downloaded as tarballs, extracted to a temp directory, indexed,
then cleaned up. Peak disk usage is proportional to --concurrency, not the
total number of packages.
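
The download, extract, index, clean-up cycle can be pictured with a small shell sketch. It does not call Exograph at all: the package names, the `scratch` directory, and the placeholder steps are hypothetical, and `xargs -P` merely stands in for the worker pool. It only illustrates why temp usage follows the worker count, since each worker removes its directory before picking up the next package.

```shell
# Illustrative only: a bounded pool of workers, each using one temp dir at a time.
concurrency=2
scratch=$(mktemp -d)          # parent for all per-package temp dirs (hypothetical)
export scratch

printf '%s\n' pkg_a pkg_b pkg_c pkg_d pkg_e |
  xargs -P "$concurrency" -n 1 sh -c '
    work=$(mktemp -d "$scratch/pkg.XXXXXX")   # extraction target for one package
    echo "contents of $1" > "$work/source"    # stands in for download + extract
    : "index $work here"                      # stands in for the indexing step
    rm -rf "$work"                            # clean up before taking the next package
  ' _

# At most $concurrency directories existed at any moment; none remain afterwards.
leftover=$(ls -A "$scratch" | wc -l | tr -d ' ')
echo "leftover temp dirs: $leftover"
rmdir "$scratch"
```

With five packages and two workers, no more than two temp directories exist at once, regardless of how long the package list is.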
Already-indexed packages (by name+version) are skipped by default.
Use --force to re-index everything.
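
As a rough illustration of the skip rule, the sketch below keys the decision on the exact name+version pair. The `indexed` file, the `should_index` and `decide` helpers, and the package names are all hypothetical; the real task presumably checks its own index tables rather than a text file.

```shell
# Hypothetical record of what is already indexed, one "name version" pair per line.
indexed=$(mktemp)
printf '%s\n' 'example_pkg 1.0.0' 'other_pkg 2.3.1' > "$indexed"

force=0   # set to 1 to mimic --force

should_index() {
  # Index when forcing, or when this exact name+version pair is not recorded yet.
  [ "$force" -eq 1 ] || ! grep -Fxq "$1 $2" "$indexed"
}

decide() {
  if should_index "$1" "$2"; then echo index; else echo skip; fi
}

same_version=$(decide example_pkg 1.0.0)   # pair already recorded
new_version=$(decide example_pkg 1.0.1)    # same name, different version
force=1
forced=$(decide example_pkg 1.0.0)         # recorded, but forced

echo "same version: $same_version, new version: $new_version, forced: $forced"
rm -f "$indexed"
```

A new version of a known package is treated as new work, because the match is on the pair and not on the name alone.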
Options
--mode - latest (default), top, or all
--limit - max packages to index
--entries-file - JSON report or NDJSON file with name and version entries to index
--entries-output-path - write the resolved entry list as NDJSON for reproducible reruns
--prefix - table prefix (default: hex)
--concurrency - global download+index worker target (default: 4)
--package-batch-size - packages to extract and flush together per worker (default: 1)
--shard-concurrency - workers per DuckDB shard (default: ceil(concurrency / duckdb_shards))
--shard-pool-size - DB connections per DuckDB shard (default: shard concurrency)
--pipeline - task (default) or broadway
--duckdb-shards - opt-in shard count for DuckDB corpus indexing. Sharding can improve large-corpus ingestion, but it has shard-local global search/count semantics; see guides/sharded-duckdb.md.
--duckdb-threads - DuckDB execution threads per shard/server
--duckdb-memory-limit - DuckDB memory limit per shard/server, e.g. 2GB
--duckdb-queue-target - DBConnection queue target in milliseconds for DuckDB shard repos (default: 60000)
--duckdb-queue-interval - DBConnection queue interval in milliseconds for DuckDB shard repos (default: 120000)
--duckdb-recovery-mode - DuckDB managed-server recovery mode (no_wal_writes for rebuildable indexes)
--duckdb-build-mode - DuckDB corpus build strategy: online (default) or experimental offline metadata flag
--duckdb-fragment-append - DuckDB fragment insert strategy: merge (default) or ecto
--duckdb-insert-buffer-size - buffered DuckDB fact rows per table before flushing (default: 50000)
--manifest-path - write a sharded DuckDB manifest to this path for mix exograph.web --manifest-path ...
--report-path - write indexing totals and failures as JSON
--timings-path - write stage timing totals as JSON
--missing-tarballs-report-path - write missing local tarballs as JSON when --tarball-dir is set
--retry-count - retry transient per-package failures this many times (default: 3)
--retry-sleep - base retry sleep in milliseconds (default: 1000)
--shard-dir - directory for managed DuckDB shard files
--min-mass - minimum fragment AST mass (default: 8)
--generated-min-mass - minimum fragment AST mass for generated files (default: 16 for DuckDB, 8 for Postgres)
--reach - include Reach call graph extraction
--force - re-index already-indexed packages
--no-bm25 - skip ParadeDB BM25 index creation
--mirror - tarball mirror URL (repeatable)
--registry-url - Hex registry URL for versions, latest, and all modes
--api-url - Hex package API URL for top mode
--cache-tarballs - directory to cache downloaded tarballs
--tarball-dir - local directory of Hex tarballs; bypasses HTTP tarball downloads
--backend - duckdb (default) or postgres
--database-url - Postgres URL (or set EXOGRAPH_DATABASE_URL)
--postgres-maintenance-work-mem - session-local maintenance_work_mem during Postgres index builds
--postgres-max-parallel-maintenance-workers - session-local max_parallel_maintenance_workers during Postgres index builds
--postgres-unlogged - use UNLOGGED Postgres tables for rebuildable local indexes
--postgres-defer-indexes - build non-unique Postgres query indexes after corpus loading
--postgres-copy - use Postgres COPY for supported high-volume append tables
--quackdb-uri - QuackDB URI for DuckDB backend (or set QUACKDB_URI/QUACKDB_TEST_URI)
--quackdb-token - QuackDB token for DuckDB backend (or set QUACKDB_TOKEN/QUACKDB_TEST_TOKEN)
--duckdb-database - managed DuckDB database path when --quackdb-uri is omitted
--repo - Ecto repo module (uses built-in if omitted)
--timeout - per-package timeout in seconds (default: 300)
--duckdb-fragment-payload-metrics - record approximate per-column fragment append payload metrics
--web - start web UI with live progress dashboard
--port - web UI port (default: 4200, requires --web)
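
The documented default for --shard-concurrency, ceil(concurrency / duckdb_shards), can be worked through with integer arithmetic. This is only a sketch of the formula as stated in the option list; the values 8 and 3 are arbitrary examples, and the variable names are not taken from the tool's source.

```shell
# Example inputs (arbitrary): --concurrency 8 --duckdb-shards 3
concurrency=8
duckdb_shards=3

# Integer ceiling division: ceil(a / b) == (a + b - 1) / b for positive integers.
shard_concurrency=$(( (concurrency + duckdb_shards - 1) / duckdb_shards ))

# --shard-pool-size defaults to the shard concurrency.
shard_pool_size=$shard_concurrency

echo "shard concurrency: $shard_concurrency, shard pool size: $shard_pool_size"
```

Here 8 / 3 rounds up to 3 workers per shard, so the per-shard default never rounds down to zero when there are more shards than workers.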