How to apply Lockstep to a real BEAM library to find (or rule out) concurrency bugs. This is the field guide we wrote while testing Cachex, ConCache, lud/mutex, nimble_pool, and Hammer.
The bug-finding rate so far: 1 in 5 libraries. Hammer's `Hammer.Atomic.FixWindow.inc/4` had a real lost-update race against `hit/5`. The other four are concurrency-correct under their public APIs — which is itself useful information.
This doc is the recipe for getting to that signal.
Contents
- When to reach for Lockstep
- Two paths: compile-rewrite vs test-only dep
- The model-based linearizability template
- Writing the model
- Picking a strategy
- Iteration counts and tuning
- Beyond linearizability: fault injection, time-skew
- What Lockstep can and cannot reach
- Anti-patterns
- Case studies
When to reach for Lockstep
Lockstep makes sense when you're testing code where:
- Multiple processes interact through shared mutable state (ETS, atomics, persistent_term, registered names).
- You need a reproducible failing schedule — vanilla concurrent ExUnit tests find races but throw away the trace.
- You suspect a race exists at non-trivial depth (3+ scheduling decisions away from a clear "first send wins").
It's overkill when:
- The code is sequential or has only one writer.
- You just want stress testing — `Task.async_stream` in a loop is cheaper.
- The race is already easy to catch (tight `:erlang.spawn` loops catch lost-updates 99% of the time; see the README's "Why bother").
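For contrast, a sketch of the cheap stress loop that finds shallow races but leaves you nothing to replay (`MyLib` and the key are placeholders):

```elixir
# Plain stress: real scheduler, no controlled schedule, no trace.
# If this fails once in 10k runs you know a race exists, but you
# can't reproduce the interleaving that triggered it.
{:ok, sut} = MyLib.start_link([])

1..10_000
|> Task.async_stream(fn i -> MyLib.put(sut, "k", i) end, max_concurrency: 50)
|> Stream.run()
```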
Two paths: compile-rewrite vs test-only dep
There are two ways to point Lockstep at a library, with very different tradeoffs.
Path A: test-only Hex dep (workers controlled, library raw)
Add the library to `mix.exs` as `only: :test`. Workers run as `Lockstep.Task.async/1` (controlled), call into the library, and record into `Lockstep.History`. The library itself runs un-rewritten on the real BEAM scheduler.
You get:
- Real-world concurrent behavior of the library (real scheduler).
- Lockstep manages worker scheduling and history recording.
- Linearizability checking against your model.
You don't get:
- PCT scheduling into the library's GenServer mailbox dispatch. Internal races aren't actively explored — you're hoping the BEAM scheduler hits them naturally.
Use this when the library is too heavy to compile-rewrite (heavy macros, many transitive deps, kernel-level features). Cachex and Phoenix.PubSub fit here.
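A minimal sketch of the Path A wiring in `mix.exs` (the version number is illustrative, and the local `path:` for Lockstep is an assumption — adjust to however you vendor it):

```elixir
# mix.exs — the library under test is a test-only dep; it runs
# un-rewritten while Lockstep controls only your worker tasks.
defp deps do
  [
    {:cachex, "~> 3.6", only: :test},
    # assumed: Lockstep vendored locally; swap for your actual source
    {:lockstep, path: "../lockstep", only: :test}
  ]
end
```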
Path B: compile-rewrite via Lockstep.MixCompiler (full pipeline)
Clone the source, run it through `Lockstep.MixCompiler.compile/1`, and `Code.compile_file/1` the rewritten output. The library's internal `:ets`, `:atomics`, `Process.send_after`, `GenServer.call`, `Process.monitor`, and friends all become Lockstep sync points.
You get:
- PCT (or any other strategy) scheduling into the library's internals — handle_call dispatch ordering, monitor cleanup ordering, message arrival sequencing.
- A genuinely-stronger correctness signal: bugs at depth 3-6 inside the SUT are findable.
You pay for it with:
- Source must be cloneable and compilable. Heavy macro libraries (Cachex's record-style state, Hammer's `__before_compile__`) may need partial workarounds.
- Some OTP idioms aren't reachable out of the box: `{:via, Registry, ...}` was blocked until we added rewriting for the via tuple itself; `:"$ancestors"` propagation is also needed for libs that read `Process.get(:"$ancestors")`.
Use this when the library is small-medium (≤ 2k LOC, few transitive deps) and you actually want to find concurrency bugs in its logic.
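The core of Path B, stripped of scaffolding (the full version is in the appendix; the paths are illustrative):

```elixir
# Rewrite the cloned sources, then load the rewritten modules.
# Order paths leaf-to-root if modules depend on each other (see appendix).
{:ok, rewritten} =
  Lockstep.MixCompiler.compile(%{
    paths: Path.wildcard("/tmp/mylib/lib/**/*.ex"),
    output: "/tmp/mylib_lockstep"
  })

Enum.each(rewritten, &Code.compile_file/1)
```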
The model-based linearizability template
The reusable shape for any read/write API:
```elixir
defmodule Lockstep.MyLibTest do
  use ExUnit.Case, async: false

  setup_all do
    # Compile through Lockstep.MixCompiler if you want Path B.
    # Otherwise add the lib as a test dep and skip this.
    ...
  end

  defmodule MyModel do
    @behaviour Lockstep.Model
    alias Lockstep.History.Event

    @impl true
    def init, do: %{} # whatever shape your reference state is

    @impl true
    def step(state, %Event{f: :get, value: k}, %Event{type: :ok, value: v}) do
      if Map.get(state, k) == v do
        {:ok, state}
      else
        {:error, "stale read"}
      end
    end

    def step(state, %Event{f: :put, value: {k, v}}, %Event{type: :ok}) do
      {:ok, Map.put(state, k, v)}
    end

    # ... one clause per (op kind, completion type)

    # Generic completion catch-alls (always include these):
    def step(state, _, %Event{type: :fail}), do: {:ok, state}
    def step(state, _, %Event{type: :info}), do: {:ok, state}
  end

  test "linearizability" do
    Lockstep.Runner.run(
      &body/0,
      iterations: 200,
      strategy: :pct,
      max_steps: 5_000,
      seed: 1,
      iter_timeout: 30_000,
      progress: 25,
      suite: "mylib_lin"
    )
  end

  defp body do
    # 1. Set up a fresh SUT instance (unique name per iteration).
    {:ok, sut} = MyLib.start_link(...)

    # 2. Start a History recorder.
    history = Lockstep.History.start_link!()

    # 3. Spawn workers, each pulling from a Generator.
    n_workers = 4
    ops_per_worker = 6

    workers =
      for w <- 1..n_workers do
        Lockstep.Task.async(fn ->
          gen =
            Lockstep.Generator.weighted(
              [
                {{:get, "k"}, 4},
                {{:put, "k", w * 100}, 2}
                # ... your op pool ...
              ],
              seed: w * 1_000_003
            )
            |> Lockstep.Generator.take(ops_per_worker)

          Lockstep.Generator.each(gen, fn op -> apply_op(history, sut, op) end)
        end)
      end

    Lockstep.Task.await_many(workers, 30_000)

    # 4. Check the history against the model.
    events = Lockstep.History.events(history)

    case Lockstep.Checker.Linearizable.check(events, MyModel) do
      {:ok, _} ->
        :ok

      {:error, info} ->
        msg = Lockstep.Checker.Linearizable.format_violation(info)
        raise "non-linearizable history found:\n\n#{msg}"
    end
  end

  defp apply_op(history, sut, {:get, k}) do
    Lockstep.History.op(history, :get, k, fn -> MyLib.get(sut, k) end)
  end

  defp apply_op(history, sut, {:put, k, v}) do
    Lockstep.History.op(history, :put, {k, v}, fn -> MyLib.put(sut, k, v) end)
  end
end
```

`Lockstep.History.op/4` records an `:invoke` before the function, an `:ok` (with the function's return value) on normal return, or an `:info` (with the exception message) on a raise. `Lockstep.Checker.Linearizable` then tries to find a serial order that's consistent with both real-time precedence and the model.
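Concretely, a history for two overlapping ops looks roughly like this — a sketch assuming only the `Event` fields used above (`f`, `value`, `type`); whatever per-process metadata the checker uses to pair invokes with completions is elided:

```elixir
[
  %Lockstep.History.Event{type: :invoke, f: :put, value: {"k", 100}},
  %Lockstep.History.Event{type: :invoke, f: :get, value: "k"},
  %Lockstep.History.Event{type: :ok, f: :put, value: :ok},
  # the overlapping get may legally see either the old value or 100
  %Lockstep.History.Event{type: :ok, f: :get, value: 100}
]
```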
Writing the model
A model is a sequential reference implementation. Its job is to answer: "given the recorded operations applied in some serial order, is this completion consistent?"
One clause per (op kind, completion shape) pair
```elixir
def step(state, %Event{f: :put, value: {k, v}}, %Event{type: :ok, value: :ok}) do
  {:ok, Map.put(state, k, v)}
end

def step(state, %Event{f: :insert_new, value: {k, v}}, %Event{type: :ok, value: :ok}) do
  if Map.has_key?(state, k) do
    {:error, "insert_new returned :ok but key already exists"}
  else
    {:ok, Map.put(state, k, v)}
  end
end

def step(state, %Event{f: :insert_new, value: {k, _v}}, %Event{type: :ok, value: {:error, :already_exists}}) do
  if Map.has_key?(state, k) do
    {:ok, state}
  else
    {:error, "insert_new returned :already_exists but key didn't exist"}
  end
end
```

Each clause encodes both branches: the state update if this completion is consistent, and the proof that it isn't. The linearizability checker uses this to backtrack through serial orderings.
Always include the catch-alls
```elixir
def step(state, _invoke, %Event{type: :fail}), do: {:ok, state}
def step(state, _invoke, %Event{type: :info}), do: {:ok, state}
```

`:fail` is an explicit "didn't apply" — the op had no effect on state. `:info` is an unknown outcome (the worker crashed mid-call) — for now, treat it as a no-apply, which is the conservative default. (A more sophisticated checker would branch and try both.)
Be careful with return values
Probe the library's actual return shapes before writing the model. We had several "compile fails" or "function clause" errors traced back to incorrect assumptions:
- `Cachex.get/2` returns `{:ok, value}`, not `value | nil`.
- `Cachex.get_and_update/3` returns `{:commit, new_v}` directly, not `{:ok, {:commit, new_v}}`.
- `Mutex.lock/2` returns `{:ok, %Mutex.Lock{}}` or `{:error, :busy}`.
- `Mutex.await/3` returns `%Mutex.Lock{}` directly (or raises).

Run a quick `mix run --eval 'IO.inspect(MyLib.foo(...))'` first.
State scope: single-key first, multi-key second
Start with all ops on the same key. The contention is highest, the
model is simplest, and most race patterns surface there. Once that
passes, expand to multi-key — the model is just `%{key => value}` instead of a single value.
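A sketch of the single-key variant, collapsing the template's map state to one value (clauses follow the `step/3` shape above):

```elixir
# Single-key model: the reference state is just the current value.
def init, do: nil

def step(v, %Event{f: :get}, %Event{type: :ok, value: got}) do
  if got == v, do: {:ok, v}, else: {:error, "stale read"}
end

def step(_v, %Event{f: :put, value: {_k, new}}, %Event{type: :ok}) do
  {:ok, new}
end
```

Moving to multi-key changes only `init` (back to `%{}`) and the key lookups.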
Picking a strategy
Lockstep ships five strategies. Rule of thumb:
| Strategy | Best at | When to skip |
|---|---|---|
| `:pct` (default) | Classic data races (lost-update, TOCTOU) at depth ≤ 3. | When you need long runs of one process picked consecutively (POS is better). |
| `:pos` | Bugs that need the strategy to consistently pick the same proc over many sync points (e.g., async replication patterns). | When PCT's depth-bounded probability is what you need. |
| `:fair_pct` | Long-running tests where infrastructure (heartbeats, timers) would otherwise dominate. | When you need full PCT for the whole run. |
| `:idpct` | Unknown bug depth + many iterations (≥ 1000). Cycles depth in [d_min, d_max] per iteration. | When the iteration count is small and you want predictable depth. |
| `:random` | Sanity check or cheap stress. | When you want adversarial scheduling. |
Workflow: start with `:pct` (default depth 3). If 200-500 iterations don't surface the bug, try `:pos`. If neither works, switch to `:idpct` and run thousands of iterations. If still nothing, the suspected bug probably doesn't exist.
The Hammer bug was found by `:pct` at depth 3 with 5,000 iterations on a 4-worker workload (1 `inc` + 3 `hit`). ConCache's lock manager and nimble_pool's checkout queue have been tested to ≥ 100 iterations of mixed workloads under `:pct` with no findings.
Iteration counts and tuning
PCT's bug-finding probability per iteration is at least 1 / (n · k^(d-1)), where n is the process count, k is the scheduling-step count, and d is the bug depth.
A rough sizing table for a typical 5-worker test (n = 5; the rows below assume k ≈ 20 steps per iteration):
| Bug depth | Per-iter probability | Iterations for 99% confidence |
|---|---|---|
| 2 | ~1% | ~500 |
| 3 | ~0.05% | ~10,000 |
| 4 | ~0.0025% | ~200,000 |
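To size a configuration the table doesn't cover, solve N ≥ ln(1 - confidence) / ln(1 - p) for the iteration count N. A quick sketch (module and function names are ours, not part of Lockstep):

```elixir
defmodule PctSizing do
  # Per-iteration detection probability under PCT:
  # 1 / (n * k^(d - 1)) for n processes, k steps, bug depth d.
  def per_iter(n_procs, n_steps, depth),
    do: 1 / (n_procs * :math.pow(n_steps, depth - 1))

  # Iterations needed to find the bug with the given confidence.
  def iterations(confidence, p),
    do: ceil(:math.log(1 - confidence) / :math.log(1 - p))
end

PctSizing.per_iter(5, 20, 3) |> then(&PctSizing.iterations(0.99, &1))
#=> 9208 (matches the "~10,000" row above)
```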
In practice:
- Start at 100 iterations during development. If your test is flaky, fix the test before scaling iterations.
- Push to 1,000 iterations for a real correctness check.
- For confidence at depth 3+, 5,000-10,000 iterations. Each takes ~1ms wall-time for typical workloads, so 10k is ~10s.
- If you've gone past 50,000 and still nothing, the bug almost certainly isn't in your concurrency surface. Move on.
Beyond linearizability: fault injection, time-skew
The Linearizability template covers ~80% of useful tests. The remaining 20% need targeted directed tests.
Fault injection (kill a process holding a resource)
```elixir
defp fault_body do
  {:ok, mutex} = Mutex.start_link([])
  parent = self()
  ref = make_ref()

  # Worker A: acquires the lock, signals, then exits without releasing.
  Lockstep.spawn(fn ->
    {:ok, _lock} = Mutex.lock(mutex, "k")
    Lockstep.send(parent, {ref, :a_acquired})
    # No release. fn returns → A exits :normal → :DOWN fires.
  end)

  Lockstep.recv_first(fn
    {^ref, :a_acquired} -> true
    _ -> false
  end)

  # B (us) awaits — should block until A's :DOWN, then acquire.
  lock = Mutex.await(mutex, "k", :infinity)
  :ok = Mutex.release(mutex, lock)
end
```

Lockstep's controller delivers `:DOWN` automatically when a managed process exits, and treats monitor-observed deaths as expected (no `:bug` flag). PCT explores orderings of: A's exit, the `:DOWN` delivery to B, B's pending await recv, B's retry.
Time-skew (force a timer to fire mid-operation)
Lockstep's virtual time advances when no process is runnable. `Lockstep.sleep/1` parks the calling process and lets virtual time tick. Combined with aggressive timer config in the SUT (e.g., `expiry: 0, cleanup_interval: 1`), you can force timer events to interleave with concurrent ops:
```elixir
defp time_skew_body do
  # Example: a circuit-breaker style state machine with periodic cleanup.
  {:ok, _} = MyBreaker.start_link(cleanup_interval: 1)
  {:ok, _} = MyBreaker.register("api", max_attempts: 3, expiry: 0)

  # Auto-open with 3 failures.
  MyBreaker.failure("api")
  MyBreaker.failure("api")
  MyBreaker.failure("api")

  # Concurrent: cleanup tick fires + worker does another failure.
  Lockstep.spawn(fn -> MyBreaker.failure("api") end)
  Lockstep.sleep(10) # advances virtual time, fires cleanup tick

  # Final state must be valid (state machine consistent under interleaving).
  {:ok, info} = MyBreaker.status("api")

  if info.state not in [:open, :half_open] do
    raise "inconsistent breaker state: #{inspect(info.state)}"
  end
end
```

Note: `System.system_time/1` is not rewritten by Lockstep. Real wall-clock time still ticks. If your library's expiry comparison uses `System.system_time`, you need very low expiry values for the comparison to fire under fast iterations.
What Lockstep can and cannot reach
Reaches
- All `Lockstep.*`-wrapped primitives: `:ets.*`, `:atomics.*`, `:persistent_term.*`, `GenServer.{call, cast}`, `Process.{monitor, demonitor, send_after, link, unlink, flag, alive?}`, `spawn`, `spawn_link`, `Lockstep.send`, `Lockstep.recv*`.
- OTP `{:via, Registry, ...}` registration (the rewriter transforms the via tuple to `Lockstep.Registry`).
- `Process.get(:"$ancestors")` — `:"$ancestors"` is propagated on every spawn.
Doesn't reach
- BEAM kernel messages: `:ETS-TRANSFER` (heir mechanism), `:nodedown`, port messages. These bypass the controller's mailbox. Libraries that depend on these (e.g., Eternal's ETS HEIR) won't work cleanly under Lockstep without additional wrapping.
- Macro-generated code in user modules. The rewriter operates on source files passed to `Lockstep.MixCompiler`. If a library uses `__before_compile__` to inject functions into the user's module (Hammer does this), the user's module isn't rewritten by default. Workaround: bypass the macro and call the underlying algorithm modules directly (which are rewritten).
- `System.system_time/1`, `:erlang.system_time/1`, `:os.timestamp/0`. Real wall-clock is real wall-clock. If a library's correctness depends on time progression, it interacts oddly with Lockstep's fast iteration loop.
- NIFs other than the wrapped set. If a lib calls into a custom NIF, Lockstep treats it as opaque.
Library-side compile quirks we've hit
- Compile-time `File.read!("README.md")`: nimble_pool's `@moduledoc` reads the README at compile time. The rewritten file lives in a different cwd, so `File.read!` fails. Pre-process the rewritten source to replace the line.
- Macro-heavy state records: Cachex uses Erlang record syntax via macros. Compiling all of Cachex through Lockstep.MixCompiler is a rabbit hole; we use Path A (test-only Hex dep) instead.
- Compile-time `Code.ensure_loaded!`: some libraries assert another module is loaded at compile time. The rewritten file is loaded fresh, so this can fail unless dependencies load first. Order rewritten files leaf-to-root in your `setup_all`.
Anti-patterns
Strict equality assertions where the API returns floats
We've seen rate-limiter smoke tests assert `{:allow, 3.0} = ...` — which fails ~25% of the time, because `(now - last) / interval * refill` produces `3.0000001` if even a microsecond of real time elapsed. Use `assert_in_delta` or `is_number/1` for float-shaped returns.
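A tolerant version of the same assertion, as a sketch (the call is elided as in the original):

```elixir
# Bind the float first, then compare with a tolerance instead of
# pattern-matching an exact literal.
{:allow, remaining} = ...
assert_in_delta remaining, 3.0, 0.001
```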
Single-iteration tests for known-deep bugs
Running with `iterations: 1` means PCT picks one schedule and stops. That's fine for a directed test of a specific known interleaving (e.g., shrinking output) but not for hunting unknown races. Default to `iterations: 200` at minimum.
Ignoring :info completions in the model
The catch-all `def step(state, _, %Event{type: :info}), do: {:ok, state}` treats unknown outcomes as no-applies. This is fine for correctness (it's conservative), but if you want to detect bugs that manifest only under crashed operations (e.g., the operation applied but the caller never saw the response), you need to branch.
Sharing state across iterations
If you `Cachex.start_link(:my_cache, ...)` once in `setup_all`, all iterations write to the same cache. State leaks. Use a per-iteration unique name: `name = String.to_atom("c_#{:erlang.unique_integer([:positive])}")`.
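A sketch of the fix inside the template's `body/0`, using the Cachex call shape from above:

```elixir
defp body do
  # Fresh, uniquely named cache per iteration: no cross-iteration leakage.
  name = String.to_atom("c_#{:erlang.unique_integer([:positive])}")
  {:ok, _} = Cachex.start_link(name, [])
  # ... workers, history, checker as in the template ...
end
```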
Case studies
Cachex (1.5M downloads): linearizability confirmed
Test: 200 iterations × 6 workers × 8 mixed ops, plus 500 iterations of `get_and_update` read-modify-write, plus 200 iterations multi-key. All linearizable. Run un-rewritten (Path A) due to 5 transitive deps. Cachex's basic ops are ETS-atomic; `get_and_update` serializes through a per-cache locksmith queue GenServer.
lud/mutex: linearizability + fault-injection both pass
Test: 200 iterations linearizability on 5 workers × 4 lock/release cycles, plus 100 iterations fault injection (kill the holder, verify monitor cleanup). Full pipeline. The fault test exercises the `Process.monitor` cleanup path — Lockstep delivers `:DOWN` automatically when a managed process exits.
ConCache (~600k downloads): linearizability confirmed
Test: 100 iterations × 4 workers × 6 mixed ops on a single key, with PCT scheduling into `ConCache.Lock`'s GenServer (the lock manager + waiter queue + monitor-based cleanup). All linearizable. Required two Lockstep core fixes:
- The AST rewriter now handles `{:via, Registry, term}` → `{:via, Lockstep.Registry, term}` (OTP's `:via` dispatch otherwise routes to the real Registry, which isn't started under Lockstep).
- `Lockstep.spawn` propagates `:"$ancestors"` (ConCache.Owner reads it via `hd(Process.get(:"$ancestors"))` to find its parent).
nimble_pool (used by Finch, Postgrex): clean
Test: 100 iterations × 5 client tasks × 3 checkout cycles on `pool_size: 2`. Full pipeline. All cycles complete. Tests the `checkout!` queueing, `Process.monitor` on clients, and the worker selection logic.
Hammer (rate limiter): bug found
Test: a 4-worker workload (1 `inc` + 3 `hit`), 5,000 iterations under PCT. Found a real lost-update race in `Hammer.Atomic.FixWindow.inc/4`:
```elixir
case :ets.lookup(table, full_key) do
  [{_, atomic}] -> ...
  [] ->
    :ets.insert(table, {full_key, :atomics.new(2, signed: false)}) # OVERWRITES
    inc(table, key, scale, increment)
end
```

`:ets.insert` overwrites, whereas `hit/5` correctly uses `:ets.insert_new`. Under interleaved hit+inc on a fresh key, increments to the first atomic can be clobbered when the second process's `inc` overwrites it with a fresh atomic.
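The fix is mechanical: make `inc/4` mirror `hit/5`. A sketch (elisions as in the original):

```elixir
case :ets.lookup(table, full_key) do
  [{_, atomic}] ->
    ...

  [] ->
    # insert_new loses the race gracefully: if another process created
    # the atomic first, keep theirs; the recursive inc re-looks-up and
    # increments whichever atomic won.
    :ets.insert_new(table, {full_key, :atomics.new(2, signed: false)})
    inc(table, key, scale, increment)
end
```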
This was the fifth library tested with the template — the first to surface a real concurrency bug. The other four are correctly designed; Hammer's mismatch between `hit/5` and `inc/4` is an implementation gap, not a documented design choice.
Appendix: minimal test scaffolding for compile-rewrite (Path B)
```elixir
defmodule Lockstep.MyLibTest do
  use ExUnit.Case, async: false

  @src "/tmp/mylib"
  @rewritten_dir "/tmp/mylib_lockstep"

  setup_all do
    cond do
      not File.exists?(@src) ->
        IO.puts(:stderr, "\n[mylib] skipped: clone to #{@src}")
        {:ok, available: false}

      true ->
        File.rm_rf!(@rewritten_dir)

        source_paths = [
          Path.join(@src, "lib/mylib.ex")
          # ... add all .ex files in dependency order
        ]

        case Lockstep.MixCompiler.compile(%{paths: source_paths, output: @rewritten_dir}) do
          {:ok, rewritten} ->
            # Sort leaf-to-root to satisfy module-level deps.
            ordered = Enum.sort_by(rewritten, &order_key/1)

            for path <- ordered do
              try do
                Code.compile_file(path)
                IO.puts(:stderr, "[mylib] compiled #{Path.basename(path)}")
              rescue
                e -> IO.puts(:stderr, "compile FAILED: #{Exception.message(e)}")
              end
            end

            {:ok, available: Code.ensure_loaded?(MyLib)}

          _ ->
            {:ok, available: false}
        end
    end
  end

  defp order_key(path) do
    case Path.basename(path) do
      "leaf_module.ex" -> 0
      "mid_module.ex" -> 1
      "main_module.ex" -> 2
      _ -> 99
    end
  end

  test "lin", ctx do
    if not ctx.available, do: assert(true), else: lin_test()
  end

  defp lin_test do
    Lockstep.Runner.run(&body/0,
      iterations: 200,
      strategy: :pct,
      max_steps: 5_000,
      seed: 1,
      iter_timeout: 30_000,
      progress: 25,
      suite: "mylib_lin"
    )
  end

  defp body, do: ...
end
```