Lockstep.NativeRunner (Lockstep v0.1.0)


Experimental, observational only. Drive unmodified OTP code under a Lockstep-style runner using :erlang.trace/3 and :erlang.suspend_process/1. The intent was full PCT-style scheduling control over arbitrary OTP code; in practice :erlang.trace is observational, not interventional, and this module does not reliably force interesting interleavings.

What works

  • Running unmodified GenServer-based code to completion under tracing.
  • Recording sends, receives, spawns, and exits as Lockstep trace events, plus saving them to a .lockstep file.
  • Detecting straightforward correctness violations the BEAM scheduler happens to surface (e.g., a deterministic crash inside handle_call).

What does not work

  • Forcing different schedules to expose races. By the time our controller observes a :send event, the message is already in the receiver's mailbox; we cannot reorder it. Lockstep's PCT strategy has no real effect because we're watching, not scheduling.
  • Anything that requires intervening before an action -- synchronous "pick next" decisions, deferred message delivery, deterministic replay.
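The limitation above can be seen with plain :erlang.trace/3, independent of Lockstep. The following is a minimal sketch, not this module's implementation; the module name TraceObservationSketch and its function are hypothetical. It traces a single send and shows that the tracer is only notified of the send: the receiver gets the message whether or not the tracer reacts.

```elixir
defmodule TraceObservationSketch do
  @moduledoc false

  # Traces one send between two throwaway processes and returns what the
  # tracer (the calling process) observed. Hypothetical helper, for
  # illustration only.
  def observe_one_send(timeout_ms \\ 1_000) do
    parent = self()

    # Receiver: reports back once it has actually received the message.
    receiver =
      spawn(fn ->
        receive do
          :ping -> send(parent, {:delivered, self()})
        end
      end)

    # Sender: waits for :go so that tracing is enabled before it sends.
    sender =
      spawn(fn ->
        receive do
          :go -> send(receiver, :ping)
        end
      end)

    # The calling process becomes the tracer for the sender's :send events.
    # The return value is the number of processes the flags were applied to.
    1 = :erlang.trace(sender, true, [:send])
    send(sender, :go)

    # The trace message only reports that a send happened. Nothing the
    # tracer does here can hold the message back or reorder it.
    trace_event =
      receive do
        {:trace, ^sender, :send, :ping, ^receiver} = event -> event
      after
        timeout_ms -> :no_trace_event
      end

    # The receiver got the message without any cooperation from the tracer.
    delivered? =
      receive do
        {:delivered, ^receiver} -> true
      after
        timeout_ms -> false
      end

    %{trace_event: trace_event, delivered?: delivered?, sender: sender, receiver: receiver}
  end
end
```

Calling TraceObservationSketch.observe_one_send() from IEx returns a map containing the observed trace tuple and a flag confirming delivery. There is no hook in this flow where a controller could make a "pick next" decision before the send takes effect.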

When to use which runner

  • Use Lockstep.Runner when you control the test body. It gives real PCT scheduling via Lockstep.send/recv/spawn and the OTP wrappers. This is where the bug-finding strength lives.
  • Use Lockstep.NativeRunner only as an observational harness when you specifically want a Lockstep-shaped trace recording of unmodified code. It is not a substitute for the controlled runner for finding races.

Real PCT-style control over unmodified code requires compile-time AST rewriting (the path PULSE and the 2024 Bueso de Barrio Scheduler library take). That is the planned next step; this module is retained so the empirical limitation of :erlang.trace is documented in code rather than only in prose.

Functions

run(test_fun, opts \\ [])

Run test_fun under native trace-controlled scheduling.

Same opts as Lockstep.Runner.run/2 plus:

  • :resume_quantum_us -- how long, in microseconds, to let a resumed pid run before it is re-suspended. Smaller values give finer-grained control but make runs slower. Default: 500.
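The resume/re-suspend cycle that :resume_quantum_us governs can be sketched with :erlang.suspend_process/1 and :erlang.resume_process/1. This is an illustration under assumptions, not the module's actual code: the module name ResumeQuantumSketch, its functions, and the busy-wait timing are all hypothetical. The target must already have been suspended by the caller, since :erlang.resume_process/1 raises otherwise.

```elixir
defmodule ResumeQuantumSketch do
  @moduledoc false

  # Spawns a process that answers {:ping, from} with :pong, forever.
  def start_echo do
    spawn(fn -> echo_loop() end)
  end

  defp echo_loop do
    receive do
      {:ping, from} ->
        send(from, :pong)
        echo_loop()
    end
  end

  # Sends a ping and reports whether a :pong arrived within timeout_ms.
  # An unanswered ping stays in the target's mailbox.
  def responds?(pid, timeout_ms) do
    send(pid, {:ping, self()})

    receive do
      :pong -> true
    after
      timeout_ms -> false
    end
  end

  # Lets a process previously suspended by the caller run for roughly
  # quantum_us microseconds, then suspends it again.
  def run_quantum(pid, quantum_us) do
    :erlang.resume_process(pid)
    busy_wait_us(quantum_us)
    # suspend_process/1 returns only once the target is suspended.
    :erlang.suspend_process(pid)
  end

  # Process.sleep/1 has millisecond granularity, so spin on the monotonic
  # clock for sub-millisecond quanta.
  defp busy_wait_us(us) do
    spin_until(System.monotonic_time(:microsecond) + us)
  end

  defp spin_until(deadline) do
    if System.monotonic_time(:microsecond) < deadline do
      spin_until(deadline)
    else
      :ok
    end
  end
end
```

A possible usage: suspend the echo process with :erlang.suspend_process/1, confirm that ResumeQuantumSketch.responds?/2 times out, then call ResumeQuantumSketch.run_quantum/2 and check the caller's mailbox for the pending :pong. Note that the quantum only bounds how long the process may run. It does not determine which action the process takes during that time, which is consistent with the limitations described above.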