mix exograph.index.hex (exograph v0.8.1)

Downloads Hex.pm packages and indexes them into an Exograph index. The index is backed by DuckDB/QuackDB by default; pass --backend postgres to use Postgres instead.

mix exograph.index.hex
mix exograph.index.hex --mode top --limit 5000
mix exograph.index.hex --mode latest --concurrency 8
mix exograph.index.hex --mode latest --web --port 4200

Packages are downloaded as tarballs, extracted to a temp directory, indexed, then cleaned up. Peak disk usage is proportional to --concurrency, not the total number of packages.
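
The lifecycle described above can be sketched in shell for a single, already-downloaded tarball. This is only an illustration, not the task's actual implementation: process_tarball and index_dir are hypothetical names, and index_dir merely counts files where the real task would index them. The sketch relies on the Hex tarball layout, an outer tar archive that holds contents.tar.gz alongside metadata files.

```shell
# Hypothetical stand-in for the real indexing step: counts extracted files.
index_dir() {
  find "$1" -type f | wc -l | tr -d ' '
}

# Extract one Hex tarball into a temp directory, "index" it, then clean up.
process_tarball() {
  tarball="$1"
  workdir="$(mktemp -d)" || return 1
  # The outer archive is a plain tar; the package sources live in contents.tar.gz.
  if tar -xf "$tarball" -C "$workdir" contents.tar.gz &&
     mkdir "$workdir/src" &&
     tar -xzf "$workdir/contents.tar.gz" -C "$workdir/src"; then
    index_dir "$workdir/src"
    status=$?
  else
    status=1
  fi
  # Release the temp space before the next package is processed.
  rm -rf "$workdir"
  return "$status"
}
```

If each worker follows this pattern, the number of extracted packages on disk at any moment is bounded by the number of workers (and, presumably, by --package-batch-size per worker), which is why peak disk usage tracks --concurrency rather than corpus size.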

Already-indexed packages (by name+version) are skipped by default. Use --force to re-index everything.
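
The skip rule amounts to a set difference on name+version keys. A minimal sketch, assuming the keys are stored one per line as "name version" in plain text files; that file format and the pending_entries helper are hypothetical, since the real task consults its index rather than text files.

```shell
# Print candidate entries that are not already indexed.
# $1: file of indexed "name version" keys, one per line (assumed format)
# $2: file of candidate "name version" keys, one per line
pending_entries() {
  # -F fixed strings, -x whole-line match, -v invert match, -f patterns from file
  grep -vxFf "$1" "$2"
}
```

Because the key includes the version, a new release of an already-indexed package still counts as pending. Passing --force corresponds to skipping this filter entirely.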

Options

  • --mode - latest (default), top, or all
  • --limit - max packages to index
  • --entries-file - JSON report or NDJSON file with name and version entries to index
  • --entries-output-path - write the resolved entry list as NDJSON for reproducible reruns
  • --prefix - table prefix (default: hex)
  • --concurrency - global download+index worker target (default: 4)
  • --package-batch-size - packages to extract and flush together per worker (default: 1)
  • --shard-concurrency - workers per DuckDB shard (default: ceil(concurrency / duckdb_shards))
  • --shard-pool-size - DB connections per DuckDB shard (default: shard concurrency)
  • --pipeline - task (default) or broadway
  • --duckdb-shards - opt-in shard count for DuckDB corpus indexing. Sharding can improve large-corpus ingestion, but global search and count operations then have shard-local semantics; see guides/sharded-duckdb.md.
  • --duckdb-threads - DuckDB execution threads per shard/server
  • --duckdb-memory-limit - DuckDB memory limit per shard/server, e.g. 2GB
  • --duckdb-queue-target - DBConnection queue target in milliseconds for DuckDB shard repos (default: 60000)
  • --duckdb-queue-interval - DBConnection queue interval in milliseconds for DuckDB shard repos (default: 120000)
  • --duckdb-recovery-mode - DuckDB managed-server recovery mode (no_wal_writes for rebuildable indexes)
  • --duckdb-build-mode - DuckDB corpus build strategy: online (default), or offline (an experimental metadata flag)
  • --duckdb-fragment-append - DuckDB fragment insert strategy: merge (default) or ecto
  • --duckdb-insert-buffer-size - buffered DuckDB fact rows per table before flushing (default: 50000)
  • --manifest-path - write a sharded DuckDB manifest to this path for mix exograph.web --manifest-path ...
  • --report-path - write indexing totals and failures as JSON
  • --timings-path - write stage timing totals as JSON
  • --missing-tarballs-report-path - write missing local tarballs as JSON when --tarball-dir is set
  • --retry-count - retry transient per-package failures this many times (default: 3)
  • --retry-sleep - base retry sleep in milliseconds (default: 1000)
  • --shard-dir - directory for managed DuckDB shard files
  • --min-mass - minimum fragment AST mass (default: 8)
  • --generated-min-mass - minimum fragment AST mass for generated files (default: 16 for DuckDB, 8 for Postgres)
  • --reach - include Reach call graph extraction
  • --force - re-index already-indexed packages
  • --no-bm25 - skip ParadeDB BM25 index creation
  • --mirror - tarball mirror URL (repeatable)
  • --registry-url - Hex registry URL for versions, latest, and all modes
  • --api-url - Hex package API URL for top mode
  • --cache-tarballs - directory to cache downloaded tarballs
  • --tarball-dir - local directory of Hex tarballs; bypasses HTTP tarball downloads
  • --backend - duckdb (default) or postgres
  • --database-url - Postgres URL (or set EXOGRAPH_DATABASE_URL)
  • --postgres-maintenance-work-mem - session-local maintenance_work_mem during Postgres index builds
  • --postgres-max-parallel-maintenance-workers - session-local max_parallel_maintenance_workers during Postgres index builds
  • --postgres-unlogged - use UNLOGGED Postgres tables for rebuildable local indexes
  • --postgres-defer-indexes - build non-unique Postgres query indexes after corpus loading
  • --postgres-copy - use Postgres COPY for supported high-volume append tables
  • --quackdb-uri - QuackDB URI for DuckDB backend (or set QUACKDB_URI / QUACKDB_TEST_URI)
  • --quackdb-token - QuackDB token for DuckDB backend (or set QUACKDB_TOKEN / QUACKDB_TEST_TOKEN)
  • --duckdb-database - managed DuckDB database path when --quackdb-uri is omitted
  • --repo - Ecto repo module (uses built-in if omitted)
  • --timeout - per-package timeout in seconds (default: 300)
  • --duckdb-fragment-payload-metrics - record approximate per-column fragment append payload metrics
  • --web - start web UI with live progress dashboard
  • --port - web UI port (default: 4200, requires --web)
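
The documented default for --shard-concurrency, ceil(concurrency / duckdb_shards), can be worked through with integer arithmetic. The helper name default_shard_concurrency below is hypothetical and only illustrates the formula; it is not part of the task.

```shell
# Ceiling division: workers per shard when --shard-concurrency is omitted.
# $1: value of --concurrency, $2: value of --duckdb-shards
default_shard_concurrency() {
  concurrency="$1"
  shards="$2"
  echo $(( (concurrency + shards - 1) / shards ))
}

default_shard_concurrency 8 3   # prints 3
```

With --concurrency 8 --duckdb-shards 3, each shard would therefore default to 3 workers, and since --shard-pool-size defaults to the shard concurrency, to 3 DB connections per shard as well.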