ProgramFacts generates Elixir projects with ground-truth static-analysis facts for testing analyzers, refactoring tools, compilers, and code intelligence systems.
In this roadmap, “program facts” means machine-checkable facts about source code: modules, functions, call edges, call paths, data-flow facts, effect facts, branch facts, architecture facts, project layout facts, and source locations where practical. These are oracle facts: analyzers should rediscover them from the generated source.
The package should not merely generate syntactically valid Elixir. It should generate source code plus ground truth.
Principles
- Generate from a semantic model, not random strings.
- Keep generated programs valid by construction.
- Make every generated program reproducible with a seed.
- Return both source files and expected oracle facts.
- Keep Reach-specific assertions outside this package.
- Start with a small Elixir subset and expand only when tests need it.
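The reproducibility principle can be illustrated with explicitly threaded random state. This is a minimal sketch, not ProgramFacts code: the module name SeedSketch, the function pick_names/2, and the name pool are all hypothetical. It only shows that passing a seed through :rand.seed_s/2 and :rand.uniform_s/2, instead of relying on process-global state, gives the same choices for the same seed.

```elixir
defmodule SeedSketch do
  # Hypothetical helper, not part of ProgramFacts: picks `count` function
  # names from a fixed pool using an explicitly threaded PRNG state.
  @pool ~w(parse validate normalize store notify)a

  def pick_names(seed, count) when is_integer(seed) and count >= 0 do
    state = :rand.seed_s(:exsss, seed)

    {names, _final_state} =
      Enum.map_reduce(1..count//1, state, fn _index, state ->
        # uniform_s/2 returns a value in 1..n plus the next PRNG state.
        {position, state} = :rand.uniform_s(length(@pool), state)
        {Enum.at(@pool, position - 1), state}
      end)

    names
  end
end
```

Calling SeedSketch.pick_names(123, 4) twice returns the same list, because no hidden state is involved.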
Current status
Package-side work from the original roadmap is complete, excluding Reach integration.
- Phase 0: complete.
- Phase 1: complete.
- Phase 2: complete for planned Elixir policies.
- Phase 3: complete for planned branch policies.
- Phase 4: complete for planned effect policies.
- Phase 5: complete for package-side architecture fixtures.
- Phase 6: complete for Elixir layouts; Erlang layouts remain future work.
- Phase 7: complete for the planned initial transform set.
- Phase 8: feedback-directed feature search implemented with scoring/interesting callbacks.
- Phase 9: corpus persistence, manifest loading, failure promotion, and replay helpers implemented.
- Phase 10: model-first generation implemented for built-in policies; policy modules build ProgramFacts.Model values and ProgramFacts.Model.to_program/1 derives ProgramFacts.Facts.
- Graph adapter: optional libgraph integration implemented through ProgramFacts.Graph for call graphs, module graphs, path validation, reachability, cycle checks, graph metrics, and subgraph extraction.
- Phase 11: option shrinker, transform-sequence minimization, and initial structural module/file minimization implemented.
- Phase 12: analyzer feedback callbacks and graph-backed scoring modes implemented.
- Phase 13: transform invariant comparison implemented.
- Phase 14: OTP/GenServer plus initial richer Elixir syntax fixtures implemented.
- Phase 15: differential analyzer callback comparison and adapter/result normalization implemented.
- Typed manifest boundary: %ProgramFacts.Manifest{}, %ProgramFacts.Manifest.Facts{}, %ProgramFacts.Manifest.File{}, and %ProgramFacts.Fact.*{} payloads implemented with JSON protocol encoding/decoding.
- Static quality checks: GitHub Actions and mix ci include compile warnings-as-errors, format, Credo strict, ExDNA, Dialyzer, ExSlop, and tests.
- Reach integration: implemented in Reach test/dev validation.
Public API
program =
  ProgramFacts.generate!(
    policy: :linear_call_chain,
    seed: 123,
    depth: 4
  )

program.files
program.facts.call_edges
program.facts.call_paths
ProgramFacts.model(program)
ProgramFacts.to_json!(program)

{:ok, dir, program} =
  ProgramFacts.Project.write_tmp!(
    policy: :straight_line_data_flow,
    seed: 42
  )

Data model
%ProgramFacts.Program{
  id: "pf_123_linear_call_chain",
  seed: 123,
  files: [%ProgramFacts.File{}],
  facts: %ProgramFacts.Facts{},
  metadata: %{}
}

Facts include:
- modules
- functions
- call_edges
- call_paths
- data_flows
- effects
- branches
- architecture
- locations
- features
Phase 0 — package bootstrap
Status: complete.
Implemented:
- Mix project
- CI alias
- Formatter
- README
- Roadmap
- Basic package metadata
- Deterministic generation API
- Hex build verification
- Credo strict
- ExDNA
- Dialyzer
- ExUnit tests
Phase 1 — call graph generator
Status: complete.
Implemented policies:
- :single_call
- :linear_call_chain
- :branching_call_graph
- :module_dependency_chain
- :module_cycle
Facts:
- modules
- functions
- call edges
- call paths
- cycle architecture fact for :module_cycle
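To make the shape of these facts concrete, the sketch below computes ground truth for a linear call chain of a given depth. It is illustrative only: ChainSketch, the Generated.M1..Mn module names, and the {module, function, arity} tuple format are assumptions, not the representation ProgramFacts actually uses.

```elixir
defmodule ChainSketch do
  # Illustrative only: names and tuple shapes are assumptions, not the
  # real ProgramFacts.Facts representation.
  def facts(depth) when is_integer(depth) and depth >= 2 do
    # One {module, function, arity} entry per link in the chain.
    functions =
      for index <- 1..depth do
        {Module.concat(Generated, "M#{index}"), :step, 1}
      end

    # Each function calls the next one, giving depth - 1 edges.
    call_edges = Enum.zip(functions, tl(functions))

    # The single entry-to-sink path visits every function in order.
    %{functions: functions, call_edges: call_edges, call_paths: [functions]}
  end
end
```

An analyzer under test would be expected to rediscover exactly these edges and this path from the rendered source.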
Reach integration target remains future work:
mix reach.inspect Generated.A.entry/1 --why Generated.C.sink/1 --format json
Phase 2 — data-flow generator
Status: complete for planned Elixir policies.
Implemented policies:
- :straight_line_data_flow
- :assignment_chain
- :branch_data_flow
- :helper_call_data_flow
- :pipeline_data_flow
- :return_data_flow
Facts:
- parameter-to-variable flow
- variable-to-call-argument flow
- helper argument-to-return flow
- branch data-flow descriptors
- return data-flow descriptors
- source/sink descriptors
Reach integration targets remain future work:
mix reach.trace --from input --to sink --format json
mix reach.map --data --format json
Phase 3 — branch/control-flow generator
Status: complete for planned branch policies.
Implemented policies:
- :if_else
- :case_clauses
- :cond_branches
- :with_chain
- :multi_clause_function
- :anonymous_fn_branch
- :nested_branches
Facts:
- branch kind
- clause count
- clause labels
- nested branch descriptors
- calls by clause
- call edges
- call paths
Phase 4 — effect generator
Status: complete for planned effect policies.
Implemented policies:
- :pure
- :io_effect
- :send_effect
- :raise_effect
- :read_effect
- :write_effect
- :mixed_effect_boundary
Targets remain future Reach integration work:
mix reach.map --effects --format json
mix reach.check --candidates --format json
Phase 5 — architecture/policy generator
Status: package-side fixtures implemented.
Implemented policies:
- :layered_valid
- :forbidden_dependency
- :layer_cycle
- :public_api_boundary_violation
- :internal_boundary_violation
- :allowed_effect_violation
Generated projects include .reach.exs fixtures and architecture facts. Reach validation remains future work.
Phase 6 — project layout generator
Status: complete for Elixir layouts.
Implemented layouts:
- lib/**/*.ex via :plain
- apps/*/lib/**/*.ex via :umbrella
- */lib/**/*.ex via :package_style
- generated deps/ excluded fixture files
- generated _build/ excluded fixture files
- layout-aware generated mix.exs with elixirc_paths
Future work:
- src/**/*.erl
- apps/*/src/**/*.erl
- */src/**/*.erl
Phase 7 — metamorphic transformations
Status: complete for planned initial transform set.
Implemented transforms:
- :rename_variables
- :add_dead_pure_statement
- :add_dead_branch
- :extract_helper
- :inline_helper
- :wrap_in_if_true
- :wrap_in_case_identity
- :reorder_independent_assignments
- :split_module_files
- :add_unrelated_module
- :add_alias_and_rewrite_remote_call
All source transforms are AST-based. No library code rewrites Elixir source with regex.
Phase 8 — feedback-directed generation
Status: initial implementation complete.
Implemented:
ProgramFacts.Search.run(iterations: 50, seed: 100)

The search keeps programs that add new feature coverage and reports feature/program counts.
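The keep-if-new-coverage idea can be sketched in a few lines. This is not the ProgramFacts.Search implementation: SearchSketch, its private candidate/1 stand-in generator, the four feature names, and the shape of the returned map are all hypothetical. The sketch only mirrors the :iterations and :seed options shown above.

```elixir
defmodule SearchSketch do
  # Hypothetical stand-in for a generator: derives a feature set from the
  # seed so the example stays self-contained and deterministic.
  defp candidate(seed) do
    features =
      [:call_edge, :branch, :effect, :data_flow]
      |> Enum.filter(fn feature -> rem(:erlang.phash2({seed, feature}), 2) == 0 end)
      |> MapSet.new()

    %{seed: seed, features: features}
  end

  def run(opts) do
    iterations = Keyword.fetch!(opts, :iterations)
    seed = Keyword.fetch!(opts, :seed)
    seeds = seed..(seed + iterations - 1)//1

    {kept, covered} =
      Enum.reduce(seeds, {[], MapSet.new()}, fn candidate_seed, {kept, covered} ->
        program = candidate(candidate_seed)

        if MapSet.subset?(program.features, covered) do
          # Nothing new: discard the candidate.
          {kept, covered}
        else
          # At least one unseen feature: keep it and extend coverage.
          {[program | kept], MapSet.union(covered, program.features)}
        end
      end)

    %{
      programs: Enum.reverse(kept),
      program_count: length(kept),
      feature_count: MapSet.size(covered)
    }
  end
end
```

Because every kept program contributes at least one unseen feature, the number of kept programs in this sketch never exceeds the number of covered features.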
Phase 9 — corpus management
Status: initial implementation complete.
Implemented:
ProgramFacts.Corpus.save!(program, root)
ProgramFacts.Corpus.manifests(root)
ProgramFacts.Corpus.load_manifest!(dir)
ProgramFacts.Corpus.load_manifests!(root)

Corpus entries include:
program_facts.json
mix.exs
lib/generated/...

program_facts.json includes schema_version, program_facts_version, policy, layout, files, metadata, and facts.
Fuzzing roadmap
The initial motivation was fuzz/property testing for Reach and other Elixir analyzers. Research into Csmith, YARPGen, QuickChick, FuzzChick, EMI/Orion, NAUTILUS, Gramatron, GRIMOIRE, GLADE, Athena, and Hermes led to one core decision: analyzer tests need generated programs with known facts, not arbitrary random strings.
ProgramFacts is therefore a structural-oracle generator first, and a fuzzing engine second. The next phases move it closer to mature fuzzing workflows while preserving source-plus-ground-truth-facts as the core value.
Phase 10 — model-first generation
Goal: move from policy templates that project into a model toward a semantic model as the source of truth.
Tasks:
- Add explicit model builders for modules, functions, calls, data flows, effects, branches, and architecture facts.
- Render source from the model.
- Derive facts from the model rather than maintaining source/facts by hand.
- Keep policy generators as model constructors.
- Support multiple renderers from the same model over time.
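The tasks above amount to one model feeding two outputs: rendered source and derived facts. The sketch below shows that split under stated assumptions. ModelSketch and its map-based model are hypothetical and much simpler than ProgramFacts.Model; every function is assumed to have arity 1 and to call only local functions.

```elixir
defmodule ModelSketch do
  # Hypothetical model: a module name plus functions, each listing the local
  # functions it calls. Not the real ProgramFacts.Model struct.
  def example do
    %{
      module: Generated.A,
      functions: [%{name: :entry, calls: [:helper]}, %{name: :helper, calls: []}]
    }
  end

  # Source is rendered from the model through quoted expressions.
  def render(%{module: module, functions: functions}) do
    defs =
      for %{name: name, calls: calls} <- functions do
        # Wrap the argument in one call per callee, e.g. helper(x).
        body =
          Enum.reduce(calls, quote(do: x), fn callee, acc ->
            quote(do: unquote(callee)(unquote(acc)))
          end)

        quote do
          def unquote(name)(x) do
            unquote(body)
          end
        end
      end

    ast =
      quote do
        defmodule unquote(module) do
          unquote_splicing(defs)
        end
      end

    Macro.to_string(ast)
  end

  # Facts are derived from the same model, never from the rendered text.
  def facts(%{module: module, functions: functions}) do
    %{
      modules: [module],
      functions: for(%{name: name} <- functions, do: {module, name, 1}),
      call_edges:
        for %{name: name, calls: calls} <- functions, callee <- calls do
          {{module, name, 1}, {module, callee, 1}}
        end
    }
  end
end
```

Since render/1 and facts/1 read the same map, the source and the oracle cannot drift apart the way hand-maintained pairs can.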
Phase 11 — shrinking and minimization
Goal: make generated failures easy to reduce.
Tasks:
- Add ProgramFacts.Shrink.
- Reduce depth and width while a failure predicate still fails.
- Try simpler layouts.
- Remove unrelated modules/files while preserving the failure.
- Minimize transform sequences.
- Return a replayable minimized program and shrink trace.
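The depth-reduction task can be sketched as a loop that keeps shrinking while the failure reproduces. This is a hedged illustration, not the ProgramFacts.Shrink API: ShrinkSketch, shrink_depth/2, and the result map are hypothetical, and the sketch assumes the starting options already fail and only reduces :depth.

```elixir
defmodule ShrinkSketch do
  # Hypothetical shrinker: `still_fails` is a caller-supplied predicate that
  # returns true while the reduced options still reproduce the failure.
  def shrink_depth(opts, still_fails) when is_function(still_fails, 1) do
    do_shrink(opts, still_fails, [opts])
  end

  defp do_shrink(opts, still_fails, trace) do
    depth = Keyword.fetch!(opts, :depth)
    smaller = Keyword.put(opts, :depth, depth - 1)

    if depth > 1 and still_fails.(smaller) do
      # The smaller program still fails: accept it and keep going.
      do_shrink(smaller, still_fails, [smaller | trace])
    else
      # `opts` is the smallest configuration known to fail.
      %{minimized: opts, trace: Enum.reverse(trace)}
    end
  end
end
```

For example, with a predicate that fails whenever depth is at least 3, starting from depth: 6 ends at depth: 3 with a four-step trace (6, 5, 4, 3). The seed and policy are carried through unchanged, so the minimized options stay replayable.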
Phase 12 — analyzer feedback loop
Goal: support feedback-directed generation instead of only feature coverage.
Tasks:
- Extend ProgramFacts.Search.run/1 with :score, :interesting?, and :on_candidate callbacks.
- Track crashes, mismatches, new analyzer coverage, slow cases, and feature novelty.
- Keep corpus-worthy programs automatically.
- Support deterministic replay of interesting seeds.
Phase 13 — metamorphic properties
Goal: make transforms testable as equivalence/near-equivalence claims.
Tasks:
- Add transform invariant metadata.
- Record which facts should be preserved and which facts may change.
- Provide helpers to compare original/transformed facts.
- Support EMI-style equivalent variants such as wrapping in if true, identity cases, alias rewrites, helper extraction/inlining, and independent assignment reordering.
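A transform invariant can be thought of as a list of fact keys that must survive the transform unchanged. The sketch below compares original and transformed facts under that assumption. InvariantSketch, violations/3, and the fact maps are hypothetical, and the claim that an if-true wrapper changes branch facts but not call edges is an assumption made for illustration.

```elixir
defmodule InvariantSketch do
  # Hypothetical invariant format: facts listed in `preserved` must match
  # (ignoring order) before and after the transform; others may change.
  def violations(original, transformed, preserved) do
    for key <- preserved,
        Enum.sort(Map.fetch!(original, key)) != Enum.sort(Map.fetch!(transformed, key)) do
      key
    end
  end
end

# Assumed facts before and after wrapping a body in `if true`.
original = %{call_edges: [{:entry, :sink}], branches: []}
transformed = %{call_edges: [{:entry, :sink}], branches: [:if]}

# Call edges are declared preserved; branch facts are allowed to change.
[] = InvariantSketch.violations(original, transformed, [:call_edges])

# Declaring branches preserved as well reports the difference.
[:branches] = InvariantSketch.violations(original, transformed, [:call_edges, :branches])
```

An empty result means the transform respected its declared invariant, which is the property a metamorphic test would assert.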
Phase 14 — richer Elixir subset
Goal: broaden generated Elixir while keeping known facts.
Tasks:
- Add guards.
- Add try/rescue/after.
- Add receive.
- Add comprehensions.
- Add protocols.
- Add structs and nested updates.
- Add default arguments.
- Add alias/import/require combinations.
- Add macro-generated functions.
- Add OTP/GenServer modules.
- Add Phoenix/Ecto-style DSL fixtures.
- Add Erlang source layouts.
Phase 15 — differential testing
Goal: compare analyzers or analyzer versions.
Tasks:
- Compare Reach source frontend vs BEAM frontend.
- Compare current Reach vs previous release.
- Compare canonical CLI JSON vs internal APIs.
- Allow users to register multiple analyzer adapters.
- Save disagreement repros to corpus.
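Differential comparison across registered adapters might look like the sketch below. It does not reflect the ProgramFacts.Analyzer API: DiffSketch, compare/2, the keyword list of analyzer functions, and the convention that each analyzer returns a list of call edges are all assumptions.

```elixir
defmodule DiffSketch do
  # Hypothetical adapter shape: each analyzer is a {name, fun} pair where
  # fun takes a program and returns a list of call edges.
  def compare(program, analyzers) do
    # Sort each result so that ordering differences do not count as
    # disagreement.
    results =
      Map.new(analyzers, fn {name, analyze} ->
        {name, Enum.sort(analyze.(program))}
      end)

    distinct = results |> Map.values() |> Enum.uniq()

    if length(distinct) <= 1 do
      :agree
    else
      # Return every analyzer's output so the repro can be saved.
      {:disagree, results}
    end
  end
end
```

A caller could pass, for instance, source: and beam: functions for two frontends and save the program to the corpus whenever the result is a {:disagree, results} tuple.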
Phase 16 — corpus promotion
Goal: turn generated failures into stable regression fixtures.
Tasks:
- Promote minimized failures into named corpus entries.
- Store failure metadata, analyzer command, expected mismatch, and minimized seed/options.
- Add replay helpers that run analyzers against saved corpus entries.
- Support CI-friendly corpus subsets.
Remaining work
- Keep expanding Reach integration coverage as ProgramFacts grows.
- Keep enriching model-first generation with more renderer backends.
- Expand ProgramFacts.Graph for analyzer differential comparisons.
- More powerful shrinking/minimization: remove branches/edges and use source-aware structural reductions beyond isolated modules.
- Erlang source layout generation.
- Broader Elixir syntax: protocols, macros, richer alias/import/require combinations, Phoenix/Ecto-style DSL fixtures, and deeper variants of guards, try/rescue/after, receive, comprehensions, structs, and default args.
- Richer source locations for nested/generated constructs and macro-expanded code.
- Analyzer coverage-guided search adapters.
- Richer metamorphic transform invariant specifications.
- Differential testing adapters for real analyzers and version comparisons, built on ProgramFacts.Analyzer.