Overview
NetRunner provides safe OS process execution for Elixir by combining NIF-based async I/O with a persistent shepherd binary. This guarantees zero zombie processes, even when the BEAM is killed with SIGKILL.
Component Diagram
graph TD
A[User Code] --> B[NetRunner API]
B --> C[NetRunner.Stream]
B --> D[NetRunner.Process GenServer]
D --> E[Exec: Port + UDS]
D --> F[NIF: enif_select I/O]
D --> G[Watcher: Zombie Prevention]
E --> H[Shepherd Binary]
H --> I[Child Process]
F --> J[Pipe FDs via SCM_RIGHTS]
J --> I
Process Spawn Sequence
sequenceDiagram
participant B as BEAM
participant S as Shepherd
participant C as Child
B->>B: Create UDS listener
B->>S: Port.open(shepherd)
S->>B: Connect to UDS
S->>S: fork()
S->>C: execvp(command)
S->>B: sendmsg(SCM_RIGHTS: stdin_w, stdout_r, stderr_r)
S->>B: MSG_CHILD_STARTED(pid)
B->>B: NIF: create_fd(stdin), create_fd(stdout)
loop I/O
B->>B: NIF read/write on FDs (enif_select)
end
C->>S: exit(status)
S->>B: MSG_CHILD_EXITED(status)
S->>S: exit(0)
Zombie Prevention (3 Layers)
graph TD
subgraph "Zombie Prevention"
L1[Layer 1: Shepherd<br/>Detects BEAM death via POLLHUP<br/>SIGTERM → SIGKILL child]
L2[Layer 2: Watcher GenServer<br/>Detects Process GenServer death<br/>SIGTERM → SIGKILL via NIF]
L3[Layer 3: NIF Resource Destructor<br/>Closes FDs on GC<br/>Child sees broken pipe]
end
L1 -->|Covers| BEAM_CRASH[BEAM SIGKILL/crash]
L2 -->|Covers| GS_CRASH[GenServer crash]
L3 -->|Covers| LEAK[Resource leak/GC]
Why all three layers?
| Layer | Trigger | Mechanism | Covers |
|---|---|---|---|
| Shepherd | BEAM process dies | UDS POLLHUP → kill child group | BEAM SIGKILL, OOM kill, segfault |
| Watcher | GenServer crashes | Process.monitor → NIF kill | Elixir-level crashes, unhandled errors |
| NIF destructor | FD resource GC'd | close(fd) → child SIGPIPE/EOF | Resource leaks, process table cleanup |
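Layer 1 in the table can be illustrated with a short Python sketch. This is not NetRunner's shepherd code; it only shows the two OS-level steps the table names: detecting that the peer end of a Unix domain socket has gone away via POLLHUP, then escalating from SIGTERM to SIGKILL on the child's process group. The function names `peer_hung_up` and `terminate_child` and the grace period are hypothetical, and the sketch assumes the child was started as its own process group leader.

```python
import os
import select
import signal
import subprocess


def peer_hung_up(sock, timeout_ms=1000):
    """Return True if the peer end of `sock` closed (POLLHUP) within the timeout."""
    poller = select.poll()
    # POLLHUP is always reported by poll(); it does not need to be requested.
    poller.register(sock.fileno(), select.POLLIN)
    for _fd, events in poller.poll(timeout_ms):
        if events & select.POLLHUP:
            return True
    return False


def terminate_child(proc, grace_s=2.0):
    """SIGTERM the child's process group, then SIGKILL it if it outlives grace_s.

    Assumes `proc` is a subprocess.Popen started with start_new_session=True,
    so that proc.pid is also the process group ID.
    """
    try:
        os.killpg(proc.pid, signal.SIGTERM)
    except ProcessLookupError:
        pass  # group already gone
    try:
        proc.wait(timeout=grace_s)  # also reaps the child
    except subprocess.TimeoutExpired:
        os.killpg(proc.pid, signal.SIGKILL)
        proc.wait()
    return proc.returncode
```

In this sketch a supervising process would call `peer_hung_up` in a loop on its socket to the parent runtime and call `terminate_child` once it returns True. Signalling the group rather than the single PID is what lets the cleanup reach grandchildren that stayed in the same group.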
I/O Architecture
All I/O goes through the NIF using enif_select, which integrates with the BEAM's epoll/kqueue event loop:
- Read: NIF attempts read(fd). If data available, returns immediately. If EAGAIN, registers enif_select(READ) and the GenServer parks the caller.
- Write: NIF attempts write(fd). Handles partial writes by retrying until EAGAIN, then parks.
- Ready notification: BEAM sends {:select, resource, ref, :ready_input/:ready_output} to the GenServer, which retries parked operations.
All NIF functions run on dirty IO schedulers to prevent BEAM scheduler stalls.
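The read and write paths above follow the usual non-blocking pattern: attempt the syscall first, and wait for readiness only after EAGAIN. The Python sketch below shows that pattern on a plain pipe. It is an illustration under assumptions, not the NIF itself: `select.poll` stands in for enif_select, and the names `try_read`, `try_write`, and `wait_readable` are hypothetical.

```python
import os
import select


def try_read(fd, max_bytes=65536):
    """Attempt a non-blocking read. Returns bytes (b"" means EOF) or None on EAGAIN."""
    try:
        return os.read(fd, max_bytes)
    except BlockingIOError:
        return None  # nothing available yet: the caller should park


def try_write(fd, data):
    """Write until done or EAGAIN; return how many bytes were accepted."""
    view = memoryview(data)
    written = 0
    while written < len(view):
        try:
            written += os.write(fd, view[written:])  # may be a partial write
        except BlockingIOError:
            break  # pipe is full: park and resume from `written` later
    return written


def wait_readable(fd, timeout_ms):
    """Stand-in for the ready_input notification: True once fd is readable."""
    poller = select.poll()
    poller.register(fd, select.POLLIN)
    return bool(poller.poll(timeout_ms))
```

Both FDs must be in non-blocking mode (for example via `os.set_blocking(fd, False)`), otherwise the calls block instead of reporting EAGAIN. A caller that gets `None` from `try_read` would wait for readiness and retry, which mirrors how parked operations are retried after a ready notification.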
PTY Mode
When pty: true is passed:
- Shepherd calls openpty() instead of pipe()
- Child gets a controlling terminal (setsid() + TIOCSCTTY)
- Single bidirectional master FD is sent via SCM_RIGHTS
- BEAM dups the FD for independent stdin/stdout NIF resources
- set_window_size/3 sends CMD_SET_WINSIZE to shepherd, which calls ioctl(TIOCSWINSZ)
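The FD handling in the steps above can be sketched in Python: open a PTY pair, pass the master FD across a Unix domain socket with SCM_RIGHTS, dup it on the receiving side, and apply a window size with ioctl(TIOCSWINSZ). This is a minimal sketch under assumptions, not NetRunner's implementation. The function names are hypothetical, the message payload `b"pty"` is a placeholder, and the controlling-terminal setup (setsid() + TIOCSCTTY) in the child is omitted.

```python
import fcntl
import os
import socket
import struct
import termios


def send_pty_master(sock):
    """Open a PTY pair and pass the master FD to the peer via SCM_RIGHTS."""
    master, slave = os.openpty()
    socket.send_fds(sock, [b"pty"], [master])
    # The sender keeps its own copy of the master so it can resize later.
    return master, slave


def recv_pty_master(sock):
    """Receive the master FD and dup it so reads and writes have separate owners."""
    _msg, fds, _flags, _addr = socket.recv_fds(sock, 16, 1)
    read_fd = fds[0]
    write_fd = os.dup(read_fd)
    return read_fd, write_fd


def set_window_size(fd, rows, cols):
    """Apply a window size with ioctl(TIOCSWINSZ); pixel fields are left at 0."""
    fcntl.ioctl(fd, termios.TIOCSWINSZ, struct.pack("HHHH", rows, cols, 0, 0))


def get_window_size(fd):
    """Read back (rows, cols) with ioctl(TIOCGWINSZ)."""
    packed = fcntl.ioctl(fd, termios.TIOCGWINSZ, b"\0" * 8)
    rows, cols, _xpix, _ypix = struct.unpack("HHHH", packed)
    return rows, cols
```

Duplicating the received FD gives two descriptors for the same terminal, so closing one (for example to signal end of input) does not invalidate the other. `socket.send_fds` and `socket.recv_fds` require Python 3.9 or later.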
cgroup Support (Linux Only)
When cgroup_path: is set:
- Shepherd creates /sys/fs/cgroup/{path} directory
- Moves child PID to cgroup.procs
- On cleanup, writes 1 to cgroup.kill and removes the directory
- No-op on macOS/BSD
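The cgroup steps above amount to a handful of filesystem operations on the cgroup v2 hierarchy. The Python sketch below shows them one function per step. It is an illustration only: the function names and the `root` parameter are hypothetical (the parameter exists so the sketch can be exercised against a scratch directory), and running it against the real /sys/fs/cgroup needs Linux with cgroup v2 and sufficient privileges.

```python
import os


def create_cgroup(path, root="/sys/fs/cgroup"):
    """Create the cgroup directory and return its full path."""
    cg_dir = os.path.join(root, path.lstrip("/"))
    os.makedirs(cg_dir, exist_ok=True)
    return cg_dir


def attach_pid(cg_dir, pid):
    """Move a process into the cgroup by writing its PID to cgroup.procs."""
    with open(os.path.join(cg_dir, "cgroup.procs"), "w") as f:
        f.write(str(pid))


def kill_cgroup(cg_dir):
    """Kill every process in the cgroup by writing 1 to cgroup.kill."""
    with open(os.path.join(cg_dir, "cgroup.kill"), "w") as f:
        f.write("1")


def remove_cgroup(cg_dir):
    """Remove the cgroup directory once it no longer contains processes."""
    os.rmdir(cg_dir)
```

On a real cgroup filesystem the control files are virtual, so `remove_cgroup` succeeds once the group is empty. On an ordinary directory the files written by the sketch would have to be deleted first.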
Parallelism Model
Every NetRunner process is fully independent:
- Each command gets its own shepherd process, pipe FDs, and GenServer
- NIF functions run on BEAM's dirty IO scheduler pool (default 10 threads)
- enif_select integrates with BEAM's epoll/kqueue and handles thousands of concurrent FDs
- No global lock, no shared process manager