LetItCrash (let_it_crash v0.6.0)
A testing library for crash recovery and OTP supervision behavior.
LetItCrash helps you test that your GenServers and supervised processes
recover correctly after crashes, embracing Elixir's "let it crash" philosophy.
Usage
use LetItCrash
test "genserver recovers after crash" do
  {:ok, pid} = MyGenServer.start_link([])
  LetItCrash.crash(pid)
  assert LetItCrash.recovered?(MyGenServer)
end
Crash Functions
The library provides a crash/2 function that allows you to specify the type of exit signal.
It follows the same convention as Process.exit/2 for consistency and piping support:
- crash(pid) - Sends a :shutdown exit signal (default, can be trapped)
- crash(pid, :shutdown) - Explicitly sends the :shutdown signal
- crash(pid, :kill) - Sends a :kill exit signal (cannot be trapped, guarantees termination)
Use :kill when testing processes that trap exits, such as GenServers that
need to perform cleanup operations.
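The difference between the two signals can be seen with plain OTP calls. The following is a minimal sketch that uses Process.exit/2 directly rather than LetItCrash; the trapper process is a hypothetical stand-in for a GenServer that traps exits.

```elixir
parent = self()

# Hypothetical process that traps exits and reports what it receives.
trapper =
  spawn(fn ->
    Process.flag(:trap_exit, true)
    send(parent, :ready)

    receive do
      {:EXIT, _from, reason} -> send(parent, {:trapped, reason})
    end

    # Stay alive after trapping so the second signal still has a target.
    Process.sleep(:infinity)
  end)

# Wait until the flag is set, otherwise the signal could arrive too early.
receive do
  :ready -> :ok
end

ref = Process.monitor(trapper)

# :shutdown is converted into a message, so the process survives.
Process.exit(trapper, :shutdown)

trapped_reason =
  receive do
    {:trapped, reason} -> reason
  after
    1_000 -> :nothing_trapped
  end

alive_after_shutdown = Process.alive?(trapper)

# :kill bypasses trap_exit; the monitor reports the reason :killed.
Process.exit(trapper, :kill)

down_reason =
  receive do
    {:DOWN, ^ref, :process, ^trapper, reason} -> reason
  after
    1_000 -> :still_alive
  end
```

In this sketch the process is still alive after :shutdown and only goes down after :kill, which is why :kill is the safer choice when the goal is to force a restart.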
Async work
Supervisor recovery is only part of the story. Production failures often
hide in fire-and-forget Tasks, Oban jobs, and LiveView handle_async/3
callbacks — work that runs outside the caller's stack frame and can fail
silently. For that, see LetItCrash.Async, which provides:
- LetItCrash.Async.observe_async/1,2 - wrap a block of test code and collect telemetry about every async exception inside it
- LetItCrash.Async.assert_no_silent_swallow/1,2 - fail when a Task raised but nobody noticed
- LetItCrash.Async.assert_all_completed/2 - fail when async work didn't finish within a wall-clock budget
- LetItCrash.Async.assert_idempotent/2 - fail when running the same operation twice produces different observable state
Both surfaces are imported by use LetItCrash.
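To illustrate what "fail silently" means for fire-and-forget work, here is a small sketch in plain Elixir. It does not use LetItCrash.Async and makes no claim about how that module is implemented; it only shows that a Task started with Task.start/1 can raise without its caller being affected, unless something such as a monitor is watching.

```elixir
# Task.start/1 is neither linked to the caller nor awaited by it.
{:ok, task_pid} =
  Task.start(fn ->
    # Block until told to proceed so the monitor below is in place first.
    receive do
      :go -> raise "boom"
    end
  end)

ref = Process.monitor(task_pid)
send(task_pid, :go)

# Only the monitor lets the caller learn about the failure.
failure =
  receive do
    {:DOWN, ^ref, :process, ^task_pid, {%RuntimeError{message: message}, _stacktrace}} ->
      {:raised, message}

    {:DOWN, ^ref, :process, ^task_pid, other} ->
      {:exited, other}
  after
    1_000 -> :no_exit_observed
  end
```

Without the monitor, the only trace of the exception would be the error that the Task logs, which is easy to miss in a test run.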
Summary
Functions
- __using__/1 - Imports LetItCrash testing functions into the current module.
- assert_clean_registry/3 - Asserts that a process properly cleans up its Registry entries on crash and recovery.
- assert_supervision_impact/3 - Asserts the impact of crashing a child process on its siblings within a supervision tree.
- crash/2 - Crashes a process by sending it an exit signal.
- recovered?/2,3 - Checks if a registered process has recovered (restarted) after a crash.
- test_restart/3 - Tests that a process can recover from a crash by executing a test function before and after the crash.
- verify_ets_cleanup/3 - Verifies that ETS table entries are properly cleaned up when a process crashes.
- wait_for_exit/2 - Waits for a process to exit (terminate).
- wait_for_process/2 - Waits for a registered process to exist and be alive.
Functions
__using__/1
Imports LetItCrash testing functions into the current module.
This imports both the top-level helpers (crash/2, recovered?/2,3,
wait_for_process/2, wait_for_exit/2, test_restart/3,
assert_clean_registry/3, verify_ets_cleanup/3,
assert_supervision_impact/3) and the async helpers from
LetItCrash.Async (observe_async/1,2, assert_no_silent_swallow/1,2,
assert_all_completed/2, assert_idempotent/2).
assert_clean_registry/3
Asserts that a process properly cleans up its Registry entries on crash and recovery.
This function verifies that:
- The old Registry entry is removed when the process crashes
- A new Registry entry is created when the process recovers
- The new entry points to the new PID
Parameters
- registry - The Registry module to monitor
- process_name - The registered name/key of the process
- opts - Options for the verification
  - :timeout - Maximum time to wait for cleanup and re-registration (default: 2000ms)
Examples
test "process cleans up registry on restart" do
  {:ok, _pid} = MyServer.start_link(name: :my_server)
  Registry.register(MyApp.Registry, :my_server, %{status: :active})
  LetItCrash.crash(:my_server)
  LetItCrash.assert_clean_registry(MyApp.Registry, :my_server)
end
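The three conditions listed above (old entry removed, new entry created, new entry pointing at the new PID) can be observed with Registry alone. This is a rough sketch under stated assumptions: SketchRegistry, RegistrySketch, and the :my_server key are hypothetical names, the replacement worker is started by hand instead of by a supervisor, and none of it reflects how assert_clean_registry/3 is implemented.

```elixir
defmodule RegistrySketch do
  # Spawns a process that registers itself under `key` and then idles.
  def start_worker(registry, key) do
    parent = self()

    pid =
      spawn(fn ->
        {:ok, _owner} = Registry.register(registry, key, %{status: :active})
        send(parent, {:registered, self()})
        Process.sleep(:infinity)
      end)

    receive do
      {:registered, ^pid} -> pid
    after
      1_000 -> raise "worker did not register in time"
    end
  end

  # Polls until no entry is visible for `key` or the attempts run out.
  def await_empty(registry, key, attempts \\ 100) do
    case Registry.lookup(registry, key) do
      [] ->
        :ok

      _ when attempts > 0 ->
        Process.sleep(10)
        await_empty(registry, key, attempts - 1)

      _ ->
        {:error, :timeout}
    end
  end
end

{:ok, _registry} = Registry.start_link(keys: :unique, name: SketchRegistry)

old_pid = RegistrySketch.start_worker(SketchRegistry, :my_server)
before_crash = Registry.lookup(SketchRegistry, :my_server)

# Crash the owner and wait for it to be gone.
ref = Process.monitor(old_pid)
Process.exit(old_pid, :kill)

receive do
  {:DOWN, ^ref, :process, ^old_pid, _reason} -> :ok
end

# 1. The old entry disappears once the owner is dead.
cleanup = RegistrySketch.await_empty(SketchRegistry, :my_server)

# 2. and 3. A replacement registers again, and the entry points at its PID.
new_pid = RegistrySketch.start_worker(SketchRegistry, :my_server)
after_recovery = Registry.lookup(SketchRegistry, :my_server)
```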
assert_supervision_impact/3
Asserts the impact of crashing a child process on its siblings within a supervision tree.
Crashes the target process and verifies that each process listed in :expect
reaches the expected status (:restarted, :alive, or :stopped).
This function validates that your chosen supervision strategy (:one_for_one,
:one_for_all, :rest_for_one) behaves as expected for your specific tree.
Parameters
- supervisor - PID or registered name of the supervisor
- target - Registered name of the child to crash
- opts - Options:
  - :expect (required) - Keyword list of {child_name, expected_status}
  - :signal - Exit signal: :shutdown (default) or :kill
  - :timeout - Max wait time in ms (default: 2000)
  - :interval - Polling interval in ms (default: 50)
Expected Statuses
- :restarted - Process is alive with a different PID than before the crash
- :alive - Process is alive with the same PID (unaffected)
- :stopped - Process is no longer registered or alive
Each status can also be paired with an assertion function that verifies application-level behavior after the status is confirmed:
- {:restarted, fn -> ... end} - Verify behavior after restart
- {:alive, fn -> ... end} - Verify the unaffected process is still functional
- {:stopped, fn -> ... end} - Run assertions after confirming the process stopped
The assertion function runs only after the status is confirmed. If the status does not match, the function is never called. If the status matches but the function raises, the error propagates as a test failure.
Examples
# Verify one_for_all restarts all children
LetItCrash.assert_supervision_impact(:my_sup, :worker_a,
  expect: [
    worker_a: :restarted,
    worker_b: :restarted,
    worker_c: :restarted
  ]
)
# Verify one_for_one only restarts the crashed child
LetItCrash.assert_supervision_impact(:my_sup, :worker_a,
  expect: [
    worker_a: :restarted,
    worker_b: :alive,
    worker_c: :alive
  ]
)
# Verify application behavior, not just OTP mechanics
LetItCrash.assert_supervision_impact(:my_sup, :coordinator,
  expect: [
    coordinator: {:restarted, fn ->
      assert MyCoordinator.get_state() == :idle
    end},
    worker_a: {:alive, fn ->
      assert MyWorker.get_status(:worker_a) == :waiting
    end},
    worker_b: :restarted
  ]
)
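The three statuses come down to comparing each child's PID before and after the crash. The sketch below shows that comparison for a :one_for_one tree using only Supervisor, Agent, and Process. SupervisionSketch and the :sketch_a / :sketch_b names are hypothetical, and this is an illustration of the idea rather than the library's implementation.

```elixir
defmodule SupervisionSketch do
  # Child spec for a named Agent, so it can be found via Process.whereis/1.
  def child(name) do
    %{id: name, start: {Agent, :start_link, [fn -> 0 end, [name: name]]}}
  end

  # Classifies a named child relative to the PID it had before the crash.
  def status(name, old_pid) do
    case Process.whereis(name) do
      nil -> :stopped
      ^old_pid -> :alive
      _new_pid -> :restarted
    end
  end

  # Polls until `name` reaches `expected` or the attempts run out.
  def await(name, old_pid, expected, attempts \\ 40) do
    cond do
      status(name, old_pid) == expected ->
        :ok

      attempts > 0 ->
        Process.sleep(50)
        await(name, old_pid, expected, attempts - 1)

      true ->
        {:error, {:timeout, status(name, old_pid)}}
    end
  end
end

{:ok, _sup} =
  Supervisor.start_link(
    [SupervisionSketch.child(:sketch_a), SupervisionSketch.child(:sketch_b)],
    strategy: :one_for_one
  )

# Record the PIDs before the crash; they are the baseline for every status.
pid_a = Process.whereis(:sketch_a)
pid_b = Process.whereis(:sketch_b)

Process.exit(pid_a, :kill)

# Wait for the crashed child to come back, then inspect its sibling.
crashed_child = SupervisionSketch.await(:sketch_a, pid_a, :restarted)
sibling = SupervisionSketch.status(:sketch_b, pid_b)
```

Checking the sibling only after the crashed child has restarted avoids reporting :alive for a process the supervisor has not yet had a chance to touch.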
crash/2
Crashes a process by sending it an exit signal.
Follows the same convention as Process.exit/2, with the process as the first argument
to enable easy piping.
Parameters
- process - A PID or registered process name to crash
- type - The type of exit signal: :shutdown (default) or :kill
The :shutdown signal can be trapped by processes with Process.flag(:trap_exit, true),
while :kill cannot be trapped and guarantees termination.
Examples
# Default :shutdown signal:
{:ok, pid} = MyGenServer.start_link([])
LetItCrash.crash(pid)
# Explicitly specifying :shutdown:
LetItCrash.crash(pid, :shutdown)
# Piping support:
Process.whereis(:my_process)
|> LetItCrash.crash(:kill)
# For processes with trap_exit, use :kill:
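defmodule ScoreCoordinator do
  use GenServer
  # `use GenServer` does not generate start_link/1, so define it explicitly.
  def start_link(opts), do: GenServer.start_link(__MODULE__, opts)
  def init(_) do
    Process.flag(:trap_exit, true)
    {:ok, %{}}
  end
end
{:ok, pid} = ScoreCoordinator.start_link([])
LetItCrash.crash(pid, :kill) # Guarantees termination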
recovered?/2,3
Checks if a registered process has recovered (restarted) after a crash.
This function works by comparing the current PID of a registered process with a previously stored PID. If they differ, it means the process was restarted.
Parameters
- process_name - The registered name of the process to check
- original_pid - The PID before the crash (optional, will be retrieved if not provided)
- opts - Options for recovery checking
  - :timeout - Maximum time to wait for recovery (default: 1000ms)
  - :interval - Polling interval (default: 50ms)
Examples
test "process recovers after crash" do
  original_pid = Process.whereis(MyGenServer)
  LetItCrash.crash(MyGenServer)
  assert LetItCrash.recovered?(MyGenServer, original_pid)
end
test_restart/3
Tests that a process can recover from a crash by executing a test function before and after the crash.
Parameters
- process - PID or registered name of the process to test
- test_fn - Function to execute before and after crash
- opts - Options for the test
  - :timeout - Maximum time to wait for recovery (default: 1000ms)
Examples
test "maintains state after restart" do
  LetItCrash.test_restart(MyStatefulServer, fn ->
    assert MyStatefulServer.get_count() == 0
    MyStatefulServer.increment()
    assert MyStatefulServer.get_count() == 1
  end)
end
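Running the same function before and after the crash only passes if the restarted process comes back in the state the function expects. The sketch below walks through that sequence by hand with a supervised Agent. RestartSketch and :sketch_counter are hypothetical names, and the code is an approximation of the before/crash/after flow, not the body of test_restart/3.

```elixir
defmodule RestartSketch do
  # Hypothetical counter child, registered under a fixed name.
  def child_spec(_arg) do
    %{id: :sketch_counter, start: {Agent, :start_link, [fn -> 0 end, [name: :sketch_counter]]}}
  end

  # The check that should hold both before and after the crash.
  def check do
    0 = Agent.get(:sketch_counter, & &1)
    :ok = Agent.update(:sketch_counter, &(&1 + 1))
    1 = Agent.get(:sketch_counter, & &1)
    :ok
  end

  # Waits until the name is registered to a PID other than `old_pid`.
  def await_restart(old_pid, attempts \\ 40) do
    case Process.whereis(:sketch_counter) do
      pid when is_pid(pid) and pid != old_pid ->
        {:ok, pid}

      _ when attempts > 0 ->
        Process.sleep(50)
        await_restart(old_pid, attempts - 1)

      _ ->
        {:error, :timeout}
    end
  end
end

{:ok, _sup} = Supervisor.start_link([RestartSketch], strategy: :one_for_one)

# First run: counter goes from 0 to 1.
before_result = RestartSketch.check()

old_pid = Process.whereis(:sketch_counter)
Process.exit(old_pid, :kill)
{:ok, new_pid} = RestartSketch.await_restart(old_pid)

# Second run: the restarted Agent starts from 0 again, so the check passes.
after_result = RestartSketch.check()
```

The second run succeeds here because the restarted process begins from its initial state; a process that restored a persisted count would need a check written with that in mind.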
verify_ets_cleanup/3
Verifies that ETS table entries are properly cleaned up when a process crashes.
This function monitors specific ETS table entries and ensures they are cleaned up appropriately during process restart.
Parameters
- table - The ETS table name or reference to monitor
- key - The key to monitor in the ETS table
- opts - Options for the verification
  - :timeout - Maximum time to wait for cleanup (default: 1000ms)
  - :expect_cleanup - Whether to expect the entry to be cleaned up (default: true)
  - :expect_recreate - Whether to expect the entry to be recreated (default: false)
Examples
test "cleans up ETS entries on crash" do
  :ets.insert(:my_cache, {:server_data, "important"})
  LetItCrash.crash(:my_server)
  LetItCrash.verify_ets_cleanup(:my_cache, :server_data)
end
test "recreates ETS entries after recovery" do
  LetItCrash.crash(:my_server)
  LetItCrash.verify_ets_cleanup(:my_cache, :server_data,
    expect_cleanup: true, expect_recreate: true)
end
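One way ETS data gets cleaned up on a crash is through table ownership: a table is destroyed when the process that owns it exits. The sketch below shows only that mechanism, using the :ets module directly. EtsSketch and :sketch_cache are hypothetical names, and entries in a table owned by some other process would not disappear this way; they need explicit cleanup.

```elixir
defmodule EtsSketch do
  # Polls until the named table no longer exists or the attempts run out.
  def await_gone(table, attempts \\ 100) do
    cond do
      :ets.whereis(table) == :undefined ->
        :ok

      attempts > 0 ->
        Process.sleep(10)
        await_gone(table, attempts - 1)

      true ->
        {:error, :timeout}
    end
  end
end

parent = self()

# The spawned process owns the table, so the table's lifetime is tied to it.
owner =
  spawn(fn ->
    :ets.new(:sketch_cache, [:named_table, :public, :set])
    :ets.insert(:sketch_cache, {:server_data, "important"})
    send(parent, :table_ready)
    Process.sleep(:infinity)
  end)

receive do
  :table_ready -> :ok
end

before_crash = :ets.lookup(:sketch_cache, :server_data)

ref = Process.monitor(owner)
Process.exit(owner, :kill)

receive do
  {:DOWN, ^ref, :process, ^owner, _reason} -> :ok
end

# With the owner gone, the table and its entries are gone as well.
cleanup = EtsSketch.await_gone(:sketch_cache)
```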
wait_for_exit/2
Waits for a process to exit (terminate).
Uses Process.monitor/1 to block until the given PID is no longer alive,
or until the timeout elapses. This is the deterministic replacement for
Process.sleep/1 calls that wait "long enough" for a process to die.
If the process is already dead when the call is made, the monitor fires
immediately with :noproc and the function returns :ok.
Parameters
- pid - The PID of the process to wait on
- opts - Options for waiting
  - :timeout - Maximum time to wait (default: 1000ms)
Returns
- :ok - The process is no longer alive
- {:error, :timeout} - Process is still alive after the timeout
Examples
test "process dies after crash" do
  {:ok, pid} = Agent.start_link(fn -> 0 end)
  Process.unlink(pid)
  LetItCrash.crash(pid)
  :ok = LetItCrash.wait_for_exit(pid)
  refute Process.alive?(pid)
end
# With custom timeout for slow-terminating processes
:ok = LetItCrash.wait_for_exit(pid, timeout: 5000)
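The monitor-based wait described above, including the immediate :noproc case, can be sketched in a few lines. ExitSketch is a hypothetical module written for illustration; it takes the timeout as a plain integer and should not be read as the library's source.

```elixir
defmodule ExitSketch do
  # Blocks until `pid` is down or `timeout` milliseconds have passed.
  def wait_for_exit(pid, timeout \\ 1_000) do
    ref = Process.monitor(pid)

    receive do
      # Matches a real exit as well as the immediate :noproc for a dead PID.
      {:DOWN, ^ref, :process, ^pid, _reason} -> :ok
    after
      timeout ->
        # Drop the monitor so a late :DOWN does not land in the mailbox.
        Process.demonitor(ref, [:flush])
        {:error, :timeout}
    end
  end
end

live = spawn(fn -> Process.sleep(:infinity) end)

# The process is still running, so a short wait times out.
still_running = ExitSketch.wait_for_exit(live, 50)

Process.exit(live, :kill)
after_kill = ExitSketch.wait_for_exit(live)

# A further call on the now-dead PID returns at once via :noproc.
already_dead = ExitSketch.wait_for_exit(live)
```

Because the wait is driven by a message rather than a fixed sleep, it returns as soon as the process is gone and never waits longer than the timeout.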
wait_for_process/2
Waits for a registered process to exist and be alive.
This function is useful in test setup when you need to ensure a process is available before interacting with it, particularly after starting supervisors or during async initialization.
Parameters
- process_name - The registered name of the process to wait for
- opts - Options for waiting
  - :timeout - Maximum time to wait (default: 1000ms)
  - :interval - Polling interval (default: 50ms)
Returns
- :ok - Process exists and is alive
- {:error, :timeout} - Process did not appear within timeout
Examples
test "worker is available after supervisor starts" do
  {:ok, _sup} = MySupervisor.start_link()
  # Wait for the worker to be ready
  :ok = LetItCrash.wait_for_process(:my_worker)
  # Now safe to interact with it
  assert MyWorker.get_status() == :ready
end
# With custom timeout for slow-starting processes
:ok = LetItCrash.wait_for_process(:heavy_worker, timeout: 5000)
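The :timeout and :interval options suggest a polling loop with a deadline. A minimal sketch of such a loop follows, assuming the check is Process.whereis/1 plus Process.alive?/1. WaitSketch and the :sketch_worker / :sketch_never_started names are hypothetical, and the positional arguments stand in for the keyword options of the real function.

```elixir
defmodule WaitSketch do
  # Polls every `interval` ms until `name` is registered and alive,
  # giving up once `timeout` ms have elapsed.
  def wait_for_process(name, timeout \\ 1_000, interval \\ 50) do
    deadline = System.monotonic_time(:millisecond) + timeout
    poll(name, deadline, interval)
  end

  defp poll(name, deadline, interval) do
    pid = Process.whereis(name)

    cond do
      is_pid(pid) and Process.alive?(pid) ->
        :ok

      System.monotonic_time(:millisecond) >= deadline ->
        {:error, :timeout}

      true ->
        Process.sleep(interval)
        poll(name, deadline, interval)
    end
  end
end

# Simulate slow startup: the process registers its name after a short delay.
spawn(fn ->
  Process.sleep(100)
  Process.register(self(), :sketch_worker)
  Process.sleep(:infinity)
end)

appeared = WaitSketch.wait_for_process(:sketch_worker)

# A name that is never registered runs into the timeout.
missing = WaitSketch.wait_for_process(:sketch_never_started, 120, 20)
```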