Performance

View Source

< Quick Reference | Up: Introduction | Index | State, Reader & Writer >

Skuld achieves near-parity with hand-written evidence-passing CPS. Discounting BEAM try/catch overhead — irreducible and shared with all Elixir applications — the additional framework overhead is minimal.

Per-effect overhead

Progressive benchmarks starting from a bare CPS baseline and adding Skuld features one at a time (N=10,000, MIX_ENV=prod):

StepFeaturevs catch baseline
S1+ first catch frame1.0x (baseline)
S3+ 2 more catches (3 total)1.25x
S6+ struct operation args1.68x
S10Full Skuld (all features)~2.1–2.3x

The largest single factor is BEAM try/catch frames (~0.163 µs/op for the first frame), not Skuld's own machinery. Three catch frames per loop iteration (call, bind, handler dispatch) account for most of the overhead relative to raw CPS.

Inline total+linear path

Effects implementing ITotalLinearHandler get an inline continuation path in comp do blocks:

# State is a total+linear handler — the comp macro emits inline
# continuations, skipping Comp.bind/2 closure allocation
comp do
  x <- State.get()
  y <- State.put(x + 1)
  y * 2
end

At N=10,000 this path is 1.5–1.8× faster than the nested-bind path and matches raw CPS within noise.

Per-tag operation optimisation

Tagged effects (e.g. State.get(:counter)) use per-tag module-atom signatures. The handler captures a precomputed state key at installation time, eliminating per-operation key computation and struct allocation:

MetricBefore (struct ops)After (per-tag sig)
S10 Full Skuld4.2–5.5x vs S02.6–2.7x vs S0

The improvement collapses multiple overhead sources simultaneously: struct args, Change struct allocation, and state key computation are all reduced by the handler closing over precomputed values.

Data fetch benchmarks

Query batching benchmarks show FiberPool overhead at ~0.25 µs/op per fiber. For realistic workloads (database queries at ~1 ms), the overhead is noise. Automatic N+1 batching provides order-of-magnitude speedups for multi-fetch operations.

Where the overhead lives

The remaining vs-catch-baseline overhead (~2.1–2.3x) is composed of:

  • Additional catch frames (~1.25x) — irreducible BEAM tax
  • Struct argument allocation (~1.30x) — reduced by per-tag sigs
  • Tuple state keys (~1.16x) — eliminated by per-tag sigs
  • Env struct size and scoped machinery — negligible after inlining

Optimisation summary

OptimisationStatusImpact
@compile {:inline, ...} for hot-path wrappersImplementedClosed S9→S10 gap (~1.40x → 1.0x)
Per-tag module-atom sigsImplementedReduced S10 from 4.2–5.5x to 2.6–2.7x
Total+linear inline continuationImplemented1.5–1.8× over nested-bind; parity with raw CPS
Reduce Env struct sizeUnder investigationPotential marginal gain

< Quick Reference | Up: Introduction | Index | State, Reader & Writer >