ProgramFacts Roadmap


ProgramFacts generates Elixir projects with ground-truth static-analysis facts for testing analyzers, refactoring tools, compilers, and code intelligence systems.

In this roadmap, “program facts” means machine-checkable facts about source code: modules, functions, call edges, call paths, data-flow facts, effect facts, branch facts, architecture facts, project layout facts, and source locations where practical. These are oracle facts: analyzers should rediscover them from the generated source.

The package should not merely generate syntactically valid Elixir. It should generate source code plus ground truth.

Principles

  • Generate from a semantic model, not random strings.
  • Keep generated programs valid by construction.
  • Make every generated program reproducible with a seed.
  • Return both source files and expected oracle facts.
  • Keep Reach-specific assertions outside this package.
  • Start with a small Elixir subset and expand only when tests need it.
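
To illustrate the reproducibility principle, here is a minimal sketch of seed-driven choice using the functional API of Erlang's :rand module. The SeedSketch module, its pick/3 function, and the way the seed is expanded into a three-integer tuple are hypothetical and do not describe how ProgramFacts derives randomness internally.

```elixir
defmodule SeedSketch do
  @moduledoc "Hypothetical helper: deterministic choices from an integer seed."

  # Pick `count` items from `policies`, threading an explicit :rand state so
  # that no global process state influences the result.
  def pick(seed, policies, count) when is_integer(seed) and count > 0 do
    # Assumption: expanding the seed into a triple like this is arbitrary.
    state = :rand.seed_s(:exsss, {seed, seed + 1, seed + 2})

    {picked, _final_state} =
      Enum.map_reduce(1..count, state, fn _step, st ->
        {index, st} = :rand.uniform_s(length(policies), st)
        {Enum.at(policies, index - 1), st}
      end)

    picked
  end
end

policies = [:single_call, :linear_call_chain, :module_cycle]

first = SeedSketch.pick(123, policies, 5)
second = SeedSketch.pick(123, policies, 5)

# The same seed yields the same sequence of choices.
IO.inspect(first == second)
```

Threading the state explicitly, rather than calling :rand.uniform/1 against the process dictionary, is one way to keep generation independent of whatever else the calling process has done.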

Current status

All package-side work from the original roadmap is complete. Reach integration is tracked separately at the end of this list.

  • Phase 0: complete.
  • Phase 1: complete.
  • Phase 2: complete for planned Elixir policies.
  • Phase 3: complete for planned branch policies.
  • Phase 4: complete for planned effect policies.
  • Phase 5: complete for package-side architecture fixtures.
  • Phase 6: complete for Elixir layouts; Erlang layouts remain future work.
  • Phase 7: complete for the planned initial transform set.
  • Phase 8: feedback-directed feature search implemented with scoring/interesting callbacks.
  • Phase 9: corpus persistence, manifest loading, failure promotion, and replay helpers implemented.
  • Phase 10: model-first generation implemented for built-in policies; policy modules build ProgramFacts.Model values and ProgramFacts.Model.to_program/1 derives ProgramFacts.Facts.
  • Graph adapter: optional libgraph integration implemented through ProgramFacts.Graph for call graphs, module graphs, path validation, reachability, cycle checks, graph metrics, and subgraph extraction.
  • Phase 11: option shrinker, transform-sequence minimization, and initial structural module/file minimization implemented.
  • Phase 12: analyzer feedback callbacks and graph-backed scoring modes implemented.
  • Phase 13: transform invariant comparison implemented.
  • Phase 14: OTP/GenServer plus initial richer Elixir syntax fixtures implemented.
  • Phase 15: differential analyzer callback comparison and adapter/result normalization implemented.
  • Typed manifest boundary: %ProgramFacts.Manifest{}, %ProgramFacts.Manifest.Facts{}, %ProgramFacts.Manifest.File{}, and %ProgramFacts.Fact.*{} payloads implemented with JSON protocol encoding/decoding.
  • Static quality checks: GitHub Actions and mix ci include compile warnings-as-errors, format, Credo strict, ExDNA, Dialyzer, ExSlop, and tests.
  • Reach integration: implemented in Reach test/dev validation.

Public API

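# Generate a program deterministically from a policy, a seed, and options.
program =
  ProgramFacts.generate!(
    policy: :linear_call_chain,
    seed: 123,
    depth: 4
  )

# Generated source files and the oracle facts that describe them.
program.files
program.facts.call_edges
program.facts.call_paths

# Semantic model and JSON export of the same program.
ProgramFacts.model(program)
ProgramFacts.to_json!(program)

# Generate a program and write it out as a temporary project directory.
{:ok, dir, program} =
  ProgramFacts.Project.write_tmp!(
    policy: :straight_line_data_flow,
    seed: 42
  )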

Data model

%ProgramFacts.Program{
  id: "pf_123_linear_call_chain",
  seed: 123,
  files: [%ProgramFacts.File{}],
  facts: %ProgramFacts.Facts{},
  metadata: %{}
}

Facts include:

  • modules
  • functions
  • call_edges
  • call_paths
  • data_flows
  • effects
  • branches
  • architecture
  • locations
  • features

Phase 0 — package bootstrap

Status: complete.

Implemented:

  • Mix project
  • CI alias
  • Formatter
  • README
  • Roadmap
  • Basic package metadata
  • Deterministic generation API
  • Hex build verification
  • Credo strict
  • ExDNA
  • Dialyzer
  • ExUnit tests

Phase 1 — call graph generator

Status: complete.

Implemented policies:

  • :single_call
  • :linear_call_chain
  • :branching_call_graph
  • :module_dependency_chain
  • :module_cycle

Facts:

  • modules
  • functions
  • call edges
  • call paths
  • cycle architecture fact for :module_cycle

Reach integration target remains future work:

mix reach.inspect Generated.A.entry/1 --why Generated.C.sink/1 --format json

Phase 2 — data-flow generator

Status: complete for planned Elixir policies.

Implemented policies:

  • :straight_line_data_flow
  • :assignment_chain
  • :branch_data_flow
  • :helper_call_data_flow
  • :pipeline_data_flow
  • :return_data_flow

Facts:

  • parameter-to-variable flow
  • variable-to-call-argument flow
  • helper argument-to-return flow
  • branch data-flow descriptors
  • return data-flow descriptors
  • source/sink descriptors

Reach integration targets remain future work:

mix reach.trace --from input --to sink --format json
mix reach.map --data --format json

Phase 3 — branch/control-flow generator

Status: complete for planned branch policies.

Implemented policies:

  • :if_else
  • :case_clauses
  • :cond_branches
  • :with_chain
  • :multi_clause_function
  • :anonymous_fn_branch
  • :nested_branches

Facts:

  • branch kind
  • clause count
  • clause labels
  • nested branch descriptors
  • calls by clause
  • call edges
  • call paths

Phase 4 — effect generator

Status: complete for planned effect policies.

Implemented policies:

  • :pure
  • :io_effect
  • :send_effect
  • :raise_effect
  • :read_effect
  • :write_effect
  • :mixed_effect_boundary

Reach integration targets remain future work:

mix reach.map --effects --format json
mix reach.check --candidates --format json

Phase 5 — architecture/policy generator

Status: package-side fixtures implemented.

Implemented policies:

  • :layered_valid
  • :forbidden_dependency
  • :layer_cycle
  • :public_api_boundary_violation
  • :internal_boundary_violation
  • :allowed_effect_violation

Generated projects include .reach.exs fixtures and architecture facts. Reach validation remains future work.

Phase 6 — project layout generator

Status: complete for Elixir layouts.

Implemented layouts:

  • lib/**/*.ex via :plain
  • apps/*/lib/**/*.ex via :umbrella
  • */lib/**/*.ex via :package_style
  • generated fixture files under deps/ that should be excluded from analysis
  • generated fixture files under _build/ that should be excluded from analysis
  • layout-aware generated mix.exs with elixirc_paths

Future work:

  • src/**/*.erl
  • apps/*/src/**/*.erl
  • */src/**/*.erl

Phase 7 — metamorphic transformations

Status: complete for planned initial transform set.

Implemented transforms:

  • :rename_variables
  • :add_dead_pure_statement
  • :add_dead_branch
  • :extract_helper
  • :inline_helper
  • :wrap_in_if_true
  • :wrap_in_case_identity
  • :reorder_independent_assignments
  • :split_module_files
  • :add_unrelated_module
  • :add_alias_and_rewrite_remote_call

All source transforms are AST-based. No library code rewrites Elixir source with regex.
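
For readers unfamiliar with AST-based rewriting, the following sketch renames a variable by walking the quoted form with Macro.prewalk/2 instead of editing text. The RenameSketch module is hypothetical and is not the library's :rename_variables implementation; it only shows the general technique, and it ignores scoping subtleties such as shadowing.

```elixir
defmodule RenameSketch do
  @moduledoc "Hypothetical helper: rename a variable through the AST."

  def rename_variable(source, from, to) when is_atom(from) and is_atom(to) do
    source
    |> Code.string_to_quoted!()
    |> Macro.prewalk(fn
      # A variable node is {name, meta, context} where context is an atom;
      # calls carry a list of arguments in the third position instead.
      {^from, meta, context} when is_atom(context) -> {to, meta, context}
      other -> other
    end)
    |> Macro.to_string()
  end
end

source = """
def entry(input) do
  value = input
  sink(value)
end
"""

renamed = RenameSketch.rename_variable(source, :value, :renamed)
IO.puts(renamed)
```

Because only variable nodes are matched, a function that happens to be called value/1 or a string containing the word "value" would be left untouched, which is the main advantage over a regex.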

Phase 8 — feedback-directed generation

Status: initial implementation complete.

Implemented:

ProgramFacts.Search.run(iterations: 50, seed: 100)

The search keeps programs that add new feature coverage and reports feature/program counts.
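
The keep-if-novel idea can be sketched in a few lines. CoverageSearchSketch, the toy generator, and the shape of the returned report are hypothetical and are not the return value of ProgramFacts.Search.run/1.

```elixir
defmodule CoverageSearchSketch do
  @moduledoc "Hypothetical sketch: keep programs that add new feature coverage."

  # `generate` is assumed to map a seed to a map with a :features list.
  def run(seeds, generate) do
    {kept, seen} =
      Enum.reduce(seeds, {[], MapSet.new()}, fn seed, {kept, seen} ->
        program = generate.(seed)
        features = MapSet.new(program.features)

        if MapSet.subset?(features, seen) do
          # Nothing new: discard the program.
          {kept, seen}
        else
          {[program | kept], MapSet.union(seen, features)}
        end
      end)

    %{
      programs: Enum.reverse(kept),
      program_count: length(kept),
      feature_count: MapSet.size(seen)
    }
  end
end

# Toy generator: the feature list depends only on the seed.
generate = fn seed ->
  features = Enum.at([[:call], [:call, :branch], [:call], [:effect]], rem(seed, 4))
  %{seed: seed, features: features}
end

report = CoverageSearchSketch.run(100..107, generate)
IO.inspect(Map.take(report, [:program_count, :feature_count]))
```

With this toy generator only seeds 100, 101, and 103 contribute unseen features, so three of the eight candidates are kept.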

Phase 9 — corpus management

Status: initial implementation complete.

Implemented:

ProgramFacts.Corpus.save!(program, root)
ProgramFacts.Corpus.manifests(root)
ProgramFacts.Corpus.load_manifest!(dir)
ProgramFacts.Corpus.load_manifests!(root)

Corpus entries include:

program_facts.json
mix.exs
lib/generated/...

program_facts.json includes schema_version, program_facts_version, policy, layout, files, metadata, and facts.

Fuzzing roadmap

The initial motivation was fuzz/property testing for Reach and other Elixir analyzers. Research into Csmith, YARPGen, QuickChick, FuzzChick, EMI/Orion, NAUTILUS, Gramatron, GRIMOIRE, GLADE, Athena, and Hermes led to one core decision: analyzer tests need generated programs with known facts, not arbitrary random strings.

ProgramFacts is therefore a structural-oracle generator first, and a fuzzing engine second. The next phases move it closer to mature fuzzing workflows while preserving source-plus-ground-truth-facts as the core value.

Phase 10 — model-first generation

Goal: make the semantic model the source of truth, replacing policy templates that only project into a model.

Tasks:

  • Add explicit model builders for modules, functions, calls, data flows, effects, branches, and architecture facts.
  • Render source from the model.
  • Derive facts from the model rather than maintaining source/facts by hand.
  • Keep policy generators as model constructors.
  • Support multiple renderers from the same model over time.
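
The tasks above can be illustrated with a toy model from which both source and facts are derived. ModelSketch, its list-of-tuples model, and the generated module and function names are hypothetical; the sketch makes no claim about the shape of ProgramFacts.Model or the output of ProgramFacts.Model.to_program/1.

```elixir
defmodule ModelSketch do
  @moduledoc "Hypothetical sketch: one model, two derivations (source and facts)."

  # Model: an ordered list of {module, function} nodes forming a linear chain.
  def linear_chain(depth) when depth >= 2 do
    for i <- 1..depth, do: {Module.concat(Generated, "M#{i}"), :"f#{i}"}
  end

  # Facts: each node calls its successor with arity 1.
  def call_edges(chain) do
    chain
    |> Enum.chunk_every(2, 1, :discard)
    |> Enum.map(fn [{m1, f1}, {m2, f2}] -> {{m1, f1, 1}, {m2, f2, 1}} end)
  end

  # Source: rendered from the same chain, so it cannot drift from the facts.
  def render(chain) do
    successors = Enum.drop(chain, 1) ++ [nil]

    chain
    |> Enum.zip(successors)
    |> Enum.map_join("\n", fn
      {{mod, fun}, nil} ->
        "defmodule #{inspect(mod)} do\n  def #{fun}(x), do: x\nend\n"

      {{mod, fun}, {next_mod, next_fun}} ->
        "defmodule #{inspect(mod)} do\n" <>
          "  def #{fun}(x), do: #{inspect(next_mod)}.#{next_fun}(x)\nend\n"
    end)
  end
end

chain = ModelSketch.linear_chain(3)
source = ModelSketch.render(chain)
edges = ModelSketch.call_edges(chain)

# The rendered source is valid by construction, so it compiles.
Code.compile_string(source)
IO.inspect(edges)
```

Because render/1 and call_edges/1 read the same chain, adding a second renderer would not require touching the fact derivation.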

Phase 11 — shrinking and minimization

Goal: make generated failures easy to reduce.

Tasks:

  • Add ProgramFacts.Shrink.
  • Reduce depth and width for as long as the failure predicate still reports a failure.
  • Try simpler layouts.
  • Remove unrelated modules/files while preserving the failure.
  • Minimize transform sequences.
  • Return a replayable minimized program and shrink trace.
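
A greedy option shrinker along these lines could look like the sketch below. ShrinkSketch, the predicate, and the trace tuple format are hypothetical and do not reflect the ProgramFacts.Shrink API; only the :depth option is reduced here.

```elixir
defmodule ShrinkSketch do
  @moduledoc "Hypothetical sketch: lower :depth while the failure still reproduces."

  def shrink_depth(opts, still_fails?, trace \\ []) do
    depth = Keyword.fetch!(opts, :depth)
    candidate = Keyword.put(opts, :depth, depth - 1)

    if depth > 1 and still_fails?.(candidate) do
      # The smaller candidate still fails: accept it and record the step.
      shrink_depth(candidate, still_fails?, [{:depth, depth, depth - 1} | trace])
    else
      # Either depth is already minimal or the smaller candidate passes.
      {opts, Enum.reverse(trace)}
    end
  end
end

# Hypothetical failure: the analyzer under test breaks whenever depth >= 3.
still_fails? = fn opts -> Keyword.fetch!(opts, :depth) >= 3 end

{minimal, trace} =
  ShrinkSketch.shrink_depth(
    [policy: :linear_call_chain, seed: 123, depth: 6],
    still_fails?
  )

IO.inspect({minimal, trace})
```

Returning the trace alongside the minimized options keeps each accepted reduction replayable, in the spirit of the last task above.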

Phase 12 — analyzer feedback loop

Goal: support feedback-directed generation instead of only feature coverage.

Tasks:

  • Extend ProgramFacts.Search.run/1 with :score, :interesting?, and :on_candidate callbacks.
  • Track crashes, mismatches, new analyzer coverage, slow cases, and feature novelty.
  • Keep corpus-worthy programs automatically.
  • Support deterministic replay of interesting seeds.

Phase 13 — metamorphic properties

Goal: make transforms testable as equivalence/near-equivalence claims.

Tasks:

  • Add transform invariant metadata.
  • Record which facts should be preserved and which facts may change.
  • Provide helpers to compare original/transformed facts.
  • Support EMI-style equivalent variants such as wrapping in if true, identity cases, alias rewrites, helper extraction/inlining, and independent assignment reordering.
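
One way to picture an invariant check is to compare only the fact keys a transform claims to preserve. InvariantSketch, the plain-map fact shape, and the sample data are hypothetical and are not the library's comparison helpers.

```elixir
defmodule InvariantSketch do
  @moduledoc "Hypothetical sketch: report preserved fact keys that changed."

  def violations(original, transformed, preserved_keys) do
    Enum.filter(preserved_keys, fn key ->
      normalize(Map.get(original, key)) != normalize(Map.get(transformed, key))
    end)
  end

  # List-valued facts are compared as unordered collections.
  defp normalize(list) when is_list(list), do: Enum.sort(list)
  defp normalize(other), do: other
end

original = %{call_edges: [{:a, :b}, {:b, :c}], branches: []}

# Imagined result of wrapping a body in `if true`: same calls, one new branch.
transformed = %{call_edges: [{:b, :c}, {:a, :b}], branches: [%{kind: :if}]}

ok = InvariantSketch.violations(original, transformed, [:call_edges])
strict = InvariantSketch.violations(original, transformed, [:call_edges, :branches])

IO.inspect({ok, strict})
```

Declaring :branches as a fact that may change is what lets the first check pass while the stricter second check flags the new branch.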

Phase 14 — richer Elixir subset

Goal: broaden generated Elixir while keeping known facts.

Tasks:

  • Add guards.
  • Add try/rescue/after.
  • Add receive.
  • Add comprehensions.
  • Add protocols.
  • Add structs and nested updates.
  • Add default arguments.
  • Add alias/import/require combinations.
  • Add macro-generated functions.
  • Add OTP/GenServer modules.
  • Add Phoenix/Ecto-style DSL fixtures.
  • Add Erlang source layouts.

Phase 15 — differential testing

Goal: compare analyzers or analyzer versions.

Tasks:

  • Compare Reach source frontend vs BEAM frontend.
  • Compare current Reach vs previous release.
  • Compare canonical CLI JSON vs internal APIs.
  • Allow users to register multiple analyzer adapters.
  • Save disagreement repros to corpus.
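
The comparison step can be sketched as running several analyzer callbacks on the same program and checking whether their normalized results agree. DifferentialSketch, the callback shape, and the two toy frontends are hypothetical; they stand in for real adapters and are not the ProgramFacts.Analyzer API.

```elixir
defmodule DifferentialSketch do
  @moduledoc "Hypothetical sketch: compare named analyzer callbacks on one program."

  # `analyzers` is assumed to be a keyword list of name => (program -> edge list).
  def compare(program, analyzers) do
    results =
      Map.new(analyzers, fn {name, fun} ->
        # Sorting is the normalization step, so ordering differences are ignored.
        {name, Enum.sort(fun.(program))}
      end)

    distinct = results |> Map.values() |> Enum.uniq()

    if length(distinct) <= 1, do: {:agree, results}, else: {:disagree, results}
  end
end

program = %{call_edges: [{:a, :b}, {:b, :c}]}

analyzers = [
  # Toy frontends: the second one silently drops an edge.
  source_frontend: fn p -> p.call_edges end,
  beam_frontend: fn p -> Enum.take(p.call_edges, 1) end
]

{verdict, results} = DifferentialSketch.compare(program, analyzers)
IO.inspect({verdict, results})
```

A :disagree verdict together with the per-analyzer results is the kind of payload that could be saved to the corpus as a disagreement repro.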

Phase 16 — corpus promotion

Goal: turn generated failures into stable regression fixtures.

Tasks:

  • Promote minimized failures into named corpus entries.
  • Store failure metadata, analyzer command, expected mismatch, and minimized seed/options.
  • Add replay helpers that run analyzers against saved corpus entries.
  • Support CI-friendly corpus subsets.

Remaining work

  • Keep expanding Reach integration coverage as ProgramFacts grows.
  • Keep enriching model-first generation with more renderer backends.
  • Expand ProgramFacts.Graph for analyzer differential comparisons.
  • More powerful shrinking/minimization: remove branches/edges and use source-aware structural reductions beyond isolated modules.
  • Erlang source layout generation.
  • Broader Elixir syntax: protocols, macros, richer alias/import/require combinations, Phoenix/Ecto-style DSL fixtures, and deeper variants of guards, try/rescue/after, receive, comprehensions, structs, and default args.
  • Richer source locations for nested/generated constructs and macro-expanded code.
  • Analyzer coverage-guided search adapters.
  • Richer metamorphic transform invariant specifications.
  • Differential testing adapters for real analyzers and version comparisons, built on ProgramFacts.Analyzer.