Credence.Corpus (credence v0.8.0)

Real-world corpus of popular hex packages and beefy application repos, used by the over-firing test layer (test/corpus/over_firing_test.exs) and the mix credence.corpus / mix credence.corpus.fetch maintainer tasks.

The premise: this is widely used, well-reviewed Elixir code, so Credence should find nothing to flag. Anything it does flag is a candidate over-fire (a false positive on idiomatic code), unless it is a reviewed, genuinely correct suggestion (see the allowlist in the test).

Two sources, both fetched into a gitignored corpus/ directory (source only, no deps, no compilation, since the Pattern phase is parse-only):

  • @packages — hex libraries, fetched with mix hex.package fetch. Versions are immutable, so the cache is reproducible.
  • @repos — large real-world application repos (Supabase's supavisor, Livebook, Plausible, Blockscout, the Elixir language itself, …), shallow-cloned at an exact commit SHA so line numbers — and thus the snapshot — stay reproducible.
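Pinning a repo to an exact commit without downloading its history can be sketched as follows. This is an illustration of the general git technique, not Credence's actual fetch code: the module ShallowPinSketch and its functions are hypothetical, and the approach assumes the remote allows fetching a commit by SHA (GitHub does).

```elixir
defmodule ShallowPinSketch do
  # Hypothetical helper, not part of Credence: the git argument lists that
  # check out exactly one commit without fetching any history.
  def git_steps(url, sha) do
    [
      ["init", "--quiet"],
      ["remote", "add", "origin", url],
      ["fetch", "--quiet", "--depth", "1", "origin", sha],
      ["checkout", "--quiet", "FETCH_HEAD"]
    ]
  end

  # Runs each step inside `dir`; the match raises on the first non-zero exit.
  def clone!(url, sha, dir) do
    File.mkdir_p!(dir)

    Enum.each(git_steps(url, sha), fn args ->
      {_output, 0} = System.cmd("git", args, cd: dir, stderr_to_stdout: true)
    end)

    :ok
  end
end
```

Because the checkout is tied to a SHA rather than a branch, the files on disk (and therefore their line numbers) are the same on every machine.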

Functions

dir(pkg)

@spec dir(atom()) :: String.t()

Local cache directory for a package's unpacked source.

ensure_fetched!()

@spec ensure_fetched!() :: :ok

Fetches every pinned hex package and git repo into corpus/ if missing or at the wrong version/SHA. Idempotent — a no-op once the cache is warm, so the test loop and the report task pay the network cost only once.
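The "no-op once the cache is warm" behaviour can be pictured with a small sketch. How Credence actually detects a missing or stale entry is not stated here; the marker file, the module WarmCacheSketch, and the injected fetch_fun are all assumptions made for illustration.

```elixir
defmodule WarmCacheSketch do
  # Hypothetical marker file recording which version/SHA is unpacked in `dir`.
  @marker ".corpus_label"

  # Returns :cached when `dir` already holds `label`, otherwise wipes the
  # directory, calls `fetch_fun.(dir)` to repopulate it, and returns :fetched.
  def ensure(dir, label, fetch_fun) do
    marker = Path.join(dir, @marker)

    case File.read(marker) do
      {:ok, ^label} ->
        :cached

      _missing_or_stale ->
        File.rm_rf!(dir)
        File.mkdir_p!(dir)
        fetch_fun.(dir)
        File.write!(marker, label)
        :fetched
    end
  end
end
```

Calling ensure/3 twice with the same label runs fetch_fun only once; changing the label (a new version or SHA) triggers a fresh fetch.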

entries()

@spec entries() :: [{atom(), String.t()}]

Every corpus entry as {name, label} — hex packages labelled by version, git repos by short SHA. The unit of iteration for the corpus tests and report.
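A minimal sketch of how such a list could be derived from the two pinned lists. The pins below are placeholders rather than Credence's real entries, and the 7-character short SHA is an assumption; the documentation only says "short SHA".

```elixir
defmodule EntriesSketch do
  # Placeholder pins for illustration only; not Credence's real lists.
  @packages [{:example_pkg, "1.2.3"}]
  @repos [
    {:example_app, "https://example.com/example_app.git",
     "0123456789abcdef0123456789abcdef01234567"}
  ]

  # Hex packages are labelled by version, git repos by a shortened SHA.
  def entries do
    hex = for {name, version} <- @packages, do: {name, version}
    git = for {name, _url, sha} <- @repos, do: {name, String.slice(sha, 0, 7)}
    hex ++ git
  end
end
```

Both kinds of entry come out as the same {atom, string} shape, so a test or report can iterate over them without caring where the source came from.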

fetched?(name)

@spec fetched?(atom()) :: boolean()

True if the entry's source is present in corpus/.

lib_files(name)

@spec lib_files(atom()) :: [String.t()]

Every Elixir source file of a fetched entry: any lib/**/*.ex at any depth — a hex package's top-level lib/, an umbrella's apps/*/lib/, or a multi-app repo's <sub>/lib/ (e.g. grpc's grpc/lib, grpc_core/lib). Non-production trees (test/, deps/, _build/, JS node_modules/) are excluded, since the over-fire premise is well-reviewed production code.
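The selection rule can be sketched as a glob followed by a path-segment filter. This is one possible reading of the description above, not the module's actual implementation; LibFilesSketch and its excluded-directory list are assumptions taken from the examples named in the text.

```elixir
defmodule LibFilesSketch do
  # Directory names treated as non-production, per the description above.
  @excluded ~w(test deps _build node_modules)

  # Every .ex file under `root` that sits below some lib/ directory and is
  # not inside an excluded tree, returned as sorted paths.
  def lib_files(root) do
    root
    |> Path.join("**/*.ex")
    |> Path.wildcard()
    |> Enum.filter(&production_lib?(Path.relative_to(&1, root)))
    |> Enum.sort()
  end

  defp production_lib?(relative_path) do
    # Drop the file name and inspect only the directory segments.
    dirs = relative_path |> Path.split() |> Enum.drop(-1)
    "lib" in dirs and not Enum.any?(dirs, &(&1 in @excluded))
  end
end
```

Filtering on path segments rather than on a fixed prefix is what lets a single rule cover a top-level lib/, an umbrella's apps/*/lib/, and a multi-app repo's <sub>/lib/.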

packages()

@spec packages() :: [{atom(), String.t()}]

The pinned {package, version} list (hex libraries only).

repos()

@spec repos() :: [{atom(), String.t(), String.t()}]

The pinned {name, git_url, sha} list (beefy application repos).

root()

@spec root() :: String.t()

Root directory the corpus is unpacked into (gitignored).