# Backpressure Deep-Dive

## The Problem

Erlang's built-in Port mechanism copies all data from the child's stdout into the BEAM's mailbox immediately. If a child produces data faster than Elixir code consumes it, the mailbox grows unbounded → OOM.

## NetRunner's Solution

NetRunner uses NIF-based I/O with `enif_select` to implement demand-driven backpressure. Data stays in the OS pipe buffer until explicitly read.

```mermaid
sequenceDiagram
    participant E as Elixir Consumer
    participant GS as GenServer
    participant NIF as NIF (dirty IO)
    participant Pipe as OS Pipe Buffer
    participant Child as Child Process

    E->>GS: Process.read(p)
    GS->>NIF: nif_read(fd, 65535)
    alt Data available
        NIF->>Pipe: read(fd, buf, 65535)
        Pipe-->>NIF: bytes
        NIF-->>GS: {:ok, binary}
        GS-->>E: {:ok, binary}
    else Pipe empty (EAGAIN)
        NIF->>NIF: enif_select(fd, READ)
        NIF-->>GS: {:error, :eagain}
        GS->>GS: Park caller in operations queue
        Note over Pipe,Child: Child writes, pipe fills
        Pipe-->>GS: {:select, fd, ref, :ready_input}
        GS->>NIF: nif_read(fd, 65535) [retry]
        NIF-->>GS: {:ok, binary}
        GS-->>E: {:ok, binary}
    end
```

## How It Works

### Read Path

1. `NetRunner.Process.read/2` calls `GenServer.call(pid, {:read, :stdout, max_bytes}, :infinity)`
2. GenServer tries `Pipe.read(pipe, max_bytes)` → calls `Nif.nif_read(resource, max_bytes)`
3. NIF runs on dirty IO scheduler:
   - Calls `read(fd, buf, max_bytes)`
   - If data available: returns `{:ok, binary}` immediately
   - If `EAGAIN`: calls `enif_select(fd, ERL_NIF_SELECT_READ)`, returns `{:error, :eagain}`
4. On `EAGAIN`, GenServer parks the caller in the operations queue
5. When data arrives, BEAM's event loop detects fd readiness via epoll/kqueue
6. BEAM sends `{:select, resource, ref, :ready_input}` to GenServer
7. GenServer retries all parked read operations

### Write Path

1. `NetRunner.Process.write/2` calls `GenServer.call(pid, {:write, data}, :infinity)`
2. GenServer enters `write_loop`:
   - `Pipe.write(pipe, data)` → `Nif.nif_write(resource, data)`
   - If fully written: returns `:ok`
   - If partial write: retries immediately with remaining data
   - If `EAGAIN`: parks caller, waits for `{:select, ..., :ready_output}`
3. Partial writes are retried immediately because the kernel may have room for more

### Why Partial Write Retry Matters

Without immediate retry, a partial write would park the caller, but `enif_select` might not fire again because the pipe buffer isn't actually full — the NIF just happened to write less than requested. The write loop ensures we keep writing until we either:
- Complete the write (all bytes sent)
- Get `EAGAIN` (pipe buffer truly full → `enif_select` registered → will get notified)

## Pipe Buffer Sizes

The OS pipe buffer acts as the natural flow control mechanism:

| Platform | Default Pipe Buffer | Effect |
|----------|-------------------|--------|
| Linux | 64 KB (configurable up to 1 MB via `fcntl(F_SETPIPE_SZ)`) | Child blocks on `write()` when buffer full |
| macOS | 64 KB | Same blocking behavior |

When the Elixir consumer stops reading:
1. OS pipe buffer fills up
2. Child's `write()` call blocks (kernel-level backpressure)
3. Child naturally slows down or stops producing
4. No memory growth on the BEAM side

## Comparison with Alternatives

| Approach | Backpressure | Memory Safety |
|----------|-------------|---------------|
| `System.cmd` / Ports | None — mailbox flooding | OOM on fast producers |
| `Exile` | Yes — NIF + enif_select | Safe |
| `MuonTrap` | None — Port-based | OOM on fast producers |
| `erlexec` | Limited — single port bottleneck | Bottleneck limits throughput |
| **NetRunner** | Yes — NIF + enif_select | Safe |
