This guide explains how to implement a custom FLAME.Backend for alternative compute providers.
FLAMEDockerBackend is itself a worked example: it provisions runners as Docker
containers via the Docker Engine API. Read its source alongside this guide.
Architecture Overview
FLAME separates concerns between three main components:
┌─────────────────────────────────────────────────────────────────────────────┐
│ PARENT NODE │
│ ┌─────────────┐ ┌─────────────┐ ┌─────────────────────────────┐ │
│ │ FLAME.Pool │─────▶│FLAME.Runner │─────▶│ YourBackend (callbacks) │ │
│ └─────────────┘ └─────────────┘ └─────────────────────────────┘ │
│ │ │ │ │
│ │ │ │ │
└─────────│────────────────────│──────────────────────────│───────────────────┘
│ │ │
│ │ ┌─────────────────────┘
│ │ │ 1. init/1 - prepare state
│ │ │ 2. remote_boot/1 - provision runner
│ │ │ 3. remote_spawn_monitor/2 - spawn work
│ │ │ 4. system_shutdown/0 - terminate runner
│ │ │ 5. handle_info/2 - optional messages
│ │ │
│ ▼ ▼
┌─────────│────────────────────────────────────────────────────────────────────┐
│ │ RUNNER NODE (provisioned by backend) │
│ │ │
│ │ ┌──────────────────┐ ┌─────────────────────┐ │
│ └────────▶│ FLAME.Terminator │────────▶│ Your Application │ │
│ └──────────────────┘ └─────────────────────┘ │
│ │ │
│ │ connects back via {ref, {:remote_up, pid}} │
│ │ shuts down via system_shutdown/0 │
└──────────────────────────│───────────────────────────────────────────────────┘
│
▼
reads FLAME_PARENT
env var on bootLifecycle Sequence
PARENT RUNNER
│ │
Pool starts │ │
Runner │ │
─────────▶│ │
│ │
┌─────┴─────┐ │
│ init/1 │ prepare state, encode FLAME_PARENT│
└─────┬─────┘ │
│ │
┌─────┴──────────┐ │
│ remote_boot/1 │ provision compute ──────────▶│ boot app
│ │ │
│ │ │ FLAME.Terminator
│ │ │ reads FLAME_PARENT
│ │◀──{ref, {:remote_up, pid}}───│ connects back
│ │ │
└─────┬──────────┘ │
│ │
FLAME.call │ │
─────────▶│ │
┌─────┴─────────────────┐ │
│ remote_spawn_monitor/2│──spawn + monitor─────▶│ execute func
└─────┬─────────────────┘ │
│◀────────────result──────────────────────│
│ │
idle timeout │ │
or shutdown │ │
─────────▶│ │
│───{ref, {:remote_shutdown, :idle}}──────│
│ ┌────┴────────────┐
│ │system_shutdown/0│
│ └────┬────────────┘
│ │ terminate
▼ ▼Callback Reference
init(opts) :: {:ok, state} | {:error, reason}
Called when: A new FLAME.Runner process starts.
Purpose: Initialize backend state for one runner instance.
What to do:
- Merge application config with pool options
- Validate required configuration (API tokens, regions, etc.)
- Generate a unique runner identifier
- Create a parent reference with
make_ref() - Encode parent info using
FLAME.Parent.new/5andFLAME.Parent.encode/1 - Store the encoded parent for later use in
remote_boot/1
Key fields for FLAME.Parent.new/5:
ref- the reference you created withmake_ref()pid- useself()(the Runner process)backend- your backend module (e.g.,__MODULE__)node_base- a unique node basename for the runnerhost_env- env var name on runner that contains its hostname (ornil)
Example from FlyBackend:
def init(opts) do
# Merge config sources
conf = Application.get_env(:flame, __MODULE__) || []
state = Map.merge(defaults, Map.new(Keyword.merge(conf, opts)))
# Validate required config
for key <- [:token, :image, :host, :app] do
unless Map.get(state, key), do: raise ArgumentError, "missing :#{key}"
end
# Generate unique runner name and parent reference
state = %{state | runner_node_base: "#{state.app}-flame-#{rand_id(20)}"}
parent_ref = make_ref()
# Encode parent info for the runner to read
encoded_parent =
parent_ref
|> FLAME.Parent.new(self(), __MODULE__, state.runner_node_base, "FLY_PRIVATE_IP")
|> FLAME.Parent.encode()
# Store encoded parent in environment that will be passed to runner
new_env = Map.put(state.env, "FLAME_PARENT", encoded_parent)
{:ok, %{state | env: new_env, parent_ref: parent_ref}}
endExample from LocalBackend:
def init(opts) do
defaults = Application.get_env(:flame, __MODULE__) || []
_terminator_sup = Keyword.fetch!(opts, :terminator_sup)
{:ok, defaults |> Keyword.merge(opts) |> Enum.into(%{})}
endremote_boot(state) :: {:ok, terminator_pid, new_state} | {:error, reason}
Called when: The pool needs a new runner and calls Runner.remote_boot/3.
Purpose: Provision compute, start your app on it, and wait for connection.
What to do:
- Provision the compute resource (API call, container start, etc.)
- Ensure the
FLAME_PARENTenvironment variable is set on the runner - Wait for the terminator to connect with
{ref, {:remote_up, terminator_pid}} - Return the terminator pid and updated state
Critical: The runner application must start FLAME.Terminator in its supervision tree.
The terminator reads FLAME_PARENT from the environment and connects back automatically.
Example from FlyBackend:
def remote_boot(%FlyBackend{parent_ref: parent_ref} = state) do
# 1. Provision the machine via API
resp = create_machine_api_call(state)
case resp do
%{"id" => id, "private_ip" => ip} ->
new_state = %{state | runner_id: id, runner_private_ip: ip}
# 2. Wait for terminator to connect back
remote_terminator_pid =
receive do
{^parent_ref, {:remote_up, remote_terminator_pid}} ->
remote_terminator_pid
after
state.boot_timeout ->
Logger.error("failed to connect within timeout")
exit(:timeout)
end
# 3. Return success with terminator pid
new_state = %{new_state |
remote_terminator_pid: remote_terminator_pid,
runner_node_name: node(remote_terminator_pid)
}
{:ok, remote_terminator_pid, new_state}
error ->
{:error, error}
end
endExample from LocalBackend:
def remote_boot(state) do
# LocalBackend starts terminator directly in the same VM
parent = FLAME.Parent.new(make_ref(), self(), __MODULE__, "nonode", nil)
name = Module.concat(state.terminator_sup, to_string(System.unique_integer([:positive])))
opts = [name: name, parent: parent, log: state.log]
spec = Supervisor.child_spec({FLAME.Terminator, opts}, restart: :temporary)
{:ok, _sup_pid} = DynamicSupervisor.start_child(state.terminator_sup, spec)
case Process.whereis(name) do
terminator_pid when is_pid(terminator_pid) -> {:ok, terminator_pid, state}
end
endremote_spawn_monitor(state, func) :: {:ok, {pid, ref}} | {:error, reason}
Called when: FLAME.call/3, FLAME.cast/3, or internal runner operations execute work.
Purpose: Spawn a function on the runner and monitor it.
What to do:
- Accept either a zero-arity function or
{module, function, args}tuple - Spawn the function in the runner's environment
- Monitor the spawned process
- Return
{:ok, {pid, monitor_ref}}
For distributed backends, use Node.spawn_monitor/2 or Node.spawn_monitor/4.
For local backends, use spawn_monitor/1 or spawn_monitor/3.
Example from FlyBackend:
def remote_spawn_monitor(%FlyBackend{} = state, term) do
case term do
func when is_function(func, 0) ->
{pid, ref} = Node.spawn_monitor(state.runner_node_name, func)
{:ok, {pid, ref}}
{mod, fun, args} when is_atom(mod) and is_atom(fun) and is_list(args) ->
{pid, ref} = Node.spawn_monitor(state.runner_node_name, mod, fun, args)
{:ok, {pid, ref}}
other ->
raise ArgumentError, "expected function or MFA tuple, got: #{inspect(other)}"
end
endExample from LocalBackend:
def remote_spawn_monitor(_state, term) do
case term do
func when is_function(func, 0) ->
{pid, ref} = spawn_monitor(func)
{:ok, {pid, ref}}
{mod, fun, args} when is_atom(mod) and is_atom(fun) and is_list(args) ->
{pid, ref} = spawn_monitor(mod, fun, args)
{:ok, {pid, ref}}
end
endsystem_shutdown() :: term()
Called when: The terminator needs to stop the runner system.
Called from: Inside the runner node by FLAME.Terminator when:
- Parent requests shutdown
- Parent node goes down
- Runner idles out (based on
idle_shutdown_after) - Failsafe timeout expires
Purpose: Terminate the runner's host system.
What to do:
- For real compute backends: call
System.stop()to terminate the VM - For local/test backends: return
:noop(do nothing)
Example from FlyBackend:
def system_shutdown do
System.stop()
endExample from LocalBackend:
def system_shutdown, do: :noophandle_info(msg, state) :: {:noreply, new_state} (optional)
Called when: The Runner GenServer receives messages not handled internally.
Purpose: Handle backend-specific messages (provider webhooks, monitors, etc.).
What to do:
- Process any backend-specific messages
- Return
{:noreply, new_state}with updated state
This callback is optional. If not implemented, unhandled messages are ignored.
Messages from Terminator
The FLAME.Terminator on the runner sends these messages to the parent:
{ref, {:remote_up, terminator_pid}}
Sent when the terminator successfully connects to the parent node.
Your remote_boot/1 should wait for this message.
{ref, {:remote_shutdown, :idle}}
Sent when the runner is shutting down due to idle timeout. The Runner handles this internally; no backend action needed.
Checklist for New Backends
Module setup:
- [ ] Add
@behaviour FLAME.Backend - [ ] Implement all required callbacks
- [ ] Add
init/1:
- [ ] Merge application config with opts
- [ ] Validate required configuration
- [ ] Generate unique runner identifier
- [ ] Create and encode
FLAME.Parentstruct - [ ] Store
FLAME_PARENTfor runner environment
remote_boot/1:
- [ ] Provision compute resource
- [ ] Pass
FLAME_PARENTenv var to runner - [ ] Wait for
{ref, {:remote_up, pid}}message - [ ] Handle boot timeout
- [ ] Return
{:ok, terminator_pid, new_state}
remote_spawn_monitor/2:
- [ ] Handle zero-arity functions
- [ ] Handle
{mod, fun, args}tuples - [ ] Spawn on runner (remote or local)
- [ ] Return
{:ok, {pid, monitor_ref}}
system_shutdown/0:
- [ ] Terminate the runner system (or no-op for local)
Runner application:
- [ ] Ensure
FLAME.Terminatorstarts in supervision tree - [ ] Terminator reads
FLAME_PARENTautomatically viaFLAME.Parent.get/0
- [ ] Ensure
Reference Implementations
FLAMEDockerBackend- Provisions runners as Docker containers via the Docker Engine APIFLAME.FlyBackend- Full distributed backend for Fly.io machinesFLAME.LocalBackend- Simple local backend for development/testing