Backpressure Deep-Dive

Copy Markdown View Source

The Problem

Erlang's built-in Port mechanism copies all data from the child's stdout into the BEAM's mailbox immediately. If a child produces data faster than Elixir code consumes it, the mailbox grows unbounded → OOM.

NetRunner's Solution

NetRunner uses NIF-based I/O with enif_select to implement demand-driven backpressure. Data stays in the OS pipe buffer until explicitly read.

sequenceDiagram
    participant E as Elixir Consumer
    participant GS as GenServer
    participant NIF as NIF (dirty IO)
    participant Pipe as OS Pipe Buffer
    participant Child as Child Process

    E->>GS: Process.read(p)
    GS->>NIF: nif_read(fd, 65535)
    alt Data available
        NIF->>Pipe: read(fd, buf, 65535)
        Pipe-->>NIF: bytes
        NIF-->>GS: {:ok, binary}
        GS-->>E: {:ok, binary}
    else Pipe empty (EAGAIN)
        NIF->>NIF: enif_select(fd, READ)
        NIF-->>GS: {:error, :eagain}
        GS->>GS: Park caller in operations queue
        Note over Pipe,Child: Child writes, pipe fills
        Pipe-->>GS: {:select, fd, ref, :ready_input}
        GS->>NIF: nif_read(fd, 65535) [retry]
        NIF-->>GS: {:ok, binary}
        GS-->>E: {:ok, binary}
    end

How It Works

Read Path

  1. NetRunner.Process.read/2 calls GenServer.call(pid, {:read, :stdout, max_bytes}, :infinity)
  2. GenServer tries Pipe.read(pipe, max_bytes) → calls Nif.nif_read(resource, max_bytes)
  3. NIF runs on dirty IO scheduler:
    • Calls read(fd, buf, max_bytes)
    • If data available: returns {:ok, binary} immediately
    • If EAGAIN: calls enif_select(fd, ERL_NIF_SELECT_READ), returns {:error, :eagain}
  4. On EAGAIN, GenServer parks the caller in the operations queue
  5. When data arrives, BEAM's event loop detects fd readiness via epoll/kqueue
  6. BEAM sends {:select, resource, ref, :ready_input} to GenServer
  7. GenServer retries all parked read operations

Write Path

  1. NetRunner.Process.write/2 calls GenServer.call(pid, {:write, data}, :infinity)
  2. GenServer enters write_loop:
    • Pipe.write(pipe, data)Nif.nif_write(resource, data)
    • If fully written: returns :ok
    • If partial write: retries immediately with remaining data
    • If EAGAIN: parks caller, waits for {:select, ..., :ready_output}
  3. Partial writes are retried immediately because the kernel may have room for more

Why Partial Write Retry Matters

Without immediate retry, a partial write would park the caller, but enif_select might not fire again because the pipe buffer isn't actually full — the NIF just happened to write less than requested. The write loop ensures we keep writing until we either:

  • Complete the write (all bytes sent)
  • Get EAGAIN (pipe buffer truly full → enif_select registered → will get notified)

Pipe Buffer Sizes

The OS pipe buffer acts as the natural flow control mechanism:

PlatformDefault Pipe BufferEffect
Linux64 KB (configurable up to 1 MB via fcntl(F_SETPIPE_SZ))Child blocks on write() when buffer full
macOS64 KBSame blocking behavior

When the Elixir consumer stops reading:

  1. OS pipe buffer fills up
  2. Child's write() call blocks (kernel-level backpressure)
  3. Child naturally slows down or stops producing
  4. No memory growth on the BEAM side

Comparison with Alternatives

ApproachBackpressureMemory Safety
System.cmd / PortsNone — mailbox floodingOOM on fast producers
ExileYes — NIF + enif_selectSafe
MuonTrapNone — Port-basedOOM on fast producers
erlexecLimited — single port bottleneckBottleneck limits throughput
NetRunnerYes — NIF + enif_selectSafe