Linx.Process (Linx v0.1.0)


Linux process-lifecycle primitives — clone(2) with namespace flags, setns(2), execve(2), signal delivery and exit-status reporting — exposed through one GenServer per spawned child.

Why a separate OS process

Calling clone(), fork() or unshare() from inside the multithreaded BEAM would corrupt the VM: fork(2) duplicates only the calling thread, and some namespace operations, such as unshare(CLONE_NEWUSER), refuse to run in a multithreaded process at all. So the actual syscalls live in a small external C binary (priv/linx_process, built from c_src/linx_process.c by the :linx_process Mix compiler), spawned via Port.open with :nouse_stdio and {:packet, 4} framing. Control traffic is Erlang External Term Format on fd 3 (BEAM → binary) and fd 4 (binary → BEAM); fd 0/1/2 stay free for the workload's stdio.
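The framing is standard Erlang/OTP machinery rather than anything Linx-specific. As a rough illustration only, the hypothetical LinxWireSketch module below builds and parses one frame by hand: a 4-byte big-endian length followed by :erlang.term_to_binary/1 output. It assumes control terms are ordinary tuples such as {:spawn, opts}; it is not Linx's own codec.

```elixir
defmodule LinxWireSketch do
  # Hypothetical helpers that make the {:packet, 4} + ETF byte layout visible.

  # Encode one control term as a frame: 4-byte big-endian length, then ETF bytes.
  def encode_frame(term) do
    payload = :erlang.term_to_binary(term)
    <<byte_size(payload)::unsigned-big-integer-size(32), payload::binary>>
  end

  # Decode one complete frame from the front of a buffer.
  # Returns {:ok, term, rest}, or :more while the frame is still incomplete.
  def decode_frame(<<len::unsigned-big-integer-size(32), payload::binary-size(len), rest::binary>>) do
    # [:safe] refuses to create new atoms from untrusted bytes.
    {:ok, :erlang.binary_to_term(payload, [:safe]), rest}
  end

  def decode_frame(buffer) when is_binary(buffer), do: :more
end
```

In practice a port opened with {:packet, 4} adds the prefix on Port.command and strips it before delivering {port, {:data, payload}}, so the GenServer only ever decodes the ETF payload; the manual version just shows what travels over fd 3 and fd 4.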

This module IS the GenServer. The pid returned by spawn/1 (or by enter/2) is the session handle: pass it to proceed/1, signal/2, wait/1, info/1, and pty_master/1.

Owner events

The owner (the caller of spawn/1 or enter/2, or the pid passed as :owner) receives these messages over the course of a session:

  • {:linx_process, :ready, host_pid} — the child reached the checkpoint. host_pid is the workload's pid in the host's PID namespace — the value you use to address it from the host (/proc/<host_pid>/..., setns, mounts, uid maps, signals). The child's own view of its pid (1 inside a fresh PID namespace) is available via info/1's :child_pid if you need it.
  • {:linx_process, :running} — the child has execve'd the workload.
  • {:linx_process, :exited, code} — the workload exited normally.
  • {:linx_process, :signaled, signum} — the workload was killed by a signal.
  • {:linx_process, :aborted} — abort/1 succeeded; the workload never reached execve.
  • {:linx_process, :pty_out, binary} — PTY mode only; bytes the workload wrote to its terminal.
  • {:linx_process, :error, errno, stage} — a pre-exec failure or a transport-level problem; see the stage table below.

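Each session emits exactly one terminal event (:exited, :signaled, :aborted or :error), after which no further owner messages follow. With the default linger: true, the GenServer then stays alive so that wait/1 callers blocked on it can still receive the recorded answer, and it terminates together with the spawn/1 caller it is linked to. With linger: false it stops as soon as the terminal state is reached (see spawn/1 and child_spec/1).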
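A minimal sketch of an owner that records one session's events until the terminal one, assuming the process owns a single session: the documented event tuples carry no session pid, so a process owning several sessions could not tell their events apart this way. OwnerLoopSketch is a hypothetical name; only the message shapes come from the list above.

```elixir
defmodule OwnerLoopSketch do
  @terminal [:exited, :signaled, :aborted, :error]

  # Collects {:linx_process, ...} events in arrival order. Returns
  # {:ok, events} once a terminal event arrives, or {:timeout, events}
  # if `timeout` ms pass without any event.
  def collect(timeout \\ 5_000), do: loop([], timeout)

  defp loop(acc, timeout) do
    receive do
      event when is_tuple(event) and tuple_size(event) >= 2 and elem(event, 0) == :linx_process ->
        acc = [event | acc]

        if elem(event, 1) in @terminal do
          {:ok, Enum.reverse(acc)}
        else
          # :ready, :running and :pty_out are informational; keep waiting.
          loop(acc, timeout)
        end
    after
      timeout -> {:timeout, Enum.reverse(acc)}
    end
  end
end
```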

Error stages

The stage atom in {:linx_process, :error, errno, stage} names what failed. The errno is a Linux errno integer (as defined in linux/asm-generic/errno-base.h and linux/asm-generic/errno.h), except where a stage below says otherwise.

Syscall failures in the agent (pre-clone setup)

  • :posix_openpt, :ptsetup, :ptsname, :pts_open — PTY pair creation (PTY mode only).
  • :sigprocmask, :pipe2, :signalfd — internal pipe and signal plumbing.

Process creation

  • :clone — clone(2) failed (spawn mode).
  • :fork — fork(2) failed (enter mode).

Namespace entry (enter mode only)

  • :open_ns_<type> — /proc/<target>/ns/<type> couldn't be opened. <type> is one of user, mnt, uts, ipc, cgroup, net, time, pid (the procfs entry names, so the :mount namespace appears as mnt).
  • :setns_<type> — setns(2) failed for that namespace.

Child-side pre-exec failures (post-checkpoint)

  • :stdio — apply_stdio failed (dup2 onto 0/1/2, the AF_UNIX connect for {:connect_unix, _}, or the PTY slave's TIOCSCTTY).
  • :chdir — chdir(2) to the :cwd option failed in the child (e.g. the directory doesn't exist in the workload's root).
  • :execve — execve(2) returned (i.e. failed).
  • :cap_drop_bounding, :cap_set_thread, :cap_set_ambient — one of the capability syscalls failed in the child (Linx.Capabilities).
  • :seccomp_install — seccomp(SECCOMP_SET_MODE_FILTER, …) failed in the child (Linx.Seccomp.install/2). Common errnos are EINVAL (22) for a malformed cBPF blob and EACCES (13) when the caller lacks CAP_SYS_ADMIN and PR_SET_NO_NEW_PRIVS isn't set (and the automatic attempt to set it also failed).
  • :seccomp_no_new_privs — prctl(PR_SET_NO_NEW_PRIVS, 1) failed in the child. Rare; the only documented failure mode is EINVAL under an exotic LSM policy.

Transport (BEAM ↔ agent wire)

  • :malformed_request — the agent couldn't parse the {:spawn, _} / {:enter, _} request. errno is EINVAL (22).
  • :request_too_big — the request exceeded the agent's 32 KiB buffer. errno is EMSGSIZE (90).
  • :command_too_big — a post-:running command exceeded the buffer; the session is torn down. errno is EMSGSIZE.
  • :ready_frame — couldn't read the {:ready, _} frame from the child (for example, the child died early or the internal pipe broke). errno is the underlying I/O error, or EIO on EOF.
  • :malformed_ready — got bytes but couldn't decode them as a {:ready, _} ei frame. errno is EPROTO (71).
  • :exec_outcome — couldn't read the post-:proceed outcome from the child. errno is EIO.

Catastrophic agent failure (BEAM-side synthesised)

  • :agent_died — the agent process exited without sending any terminal status frame (segfault, OOM kill, or a hard _exit from an unanticipated path). Here the errno position of the tuple holds the agent's exit code, not a POSIX errno; the :agent_died stage tag is what signals the different interpretation. The BEAM-side GenServer synthesises this message when it receives {port, {:exit_status, _}} and no other terminal event has been recorded yet, so the owner never hangs.
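For logging or dispatch it can help to split the compound stage atoms back into their parts. The hypothetical helper below sketches that, assuming the stage names are exactly the ones tabulated above; its errno-to-name map covers only a few Linux values and leaves any other integer as it is.

```elixir
defmodule StageSketch do
  @ns_types ~w(user mnt uts ipc cgroup net time pid)

  # A few Linux errno values, for readability; anything else stays an integer.
  @errno_names %{1 => :eperm, 2 => :enoent, 5 => :eio, 13 => :eacces,
                 22 => :einval, 71 => :eproto, 90 => :emsgsize}

  # :agent_died carries an exit code, not an errno, so it is handled first.
  def classify({:linx_process, :error, exit_code, :agent_died}),
    do: {:agent_died, {:exit_code, exit_code}}

  def classify({:linx_process, :error, errno, stage}) when is_atom(stage) do
    name = Map.get(@errno_names, errno, errno)

    case Atom.to_string(stage) do
      "open_ns_" <> type when type in @ns_types -> {:open_ns, type, name}
      "setns_" <> type when type in @ns_types -> {:setns, type, name}
      _other -> {stage, name}
    end
  end
end
```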

Summary

Functions

abort(session)
Releases a parked session without running the workload. The alternative to proceed/1 from the :ready state.

child_spec(init_arg)
Builds a supervisor child specification that runs spawn/1 under supervision — the way to auto-restart a workload "with the same arguments".

enter(target_pid, opts)
Runs a new process inside an existing target's namespaces via setns(2) + execve(2).

host_pid(session)
Returns the workload's pid as the host sees it.

info(session)
Returns a snapshot of the session's state as a %Linx.Process.Info{}.

proceed(session)
Advances the child past the checkpoint: the agent forwards :proceed to the cloned child, which then execves the workload.

pty_master(session)
Returns {:ok, session} if the session was started with stdio: :pty — the session pid is itself the handle to read from (via {:linx_process, :pty_out, _} events on the owner) and to write to (via pty_write/2). Returns {:error, :no_pty} otherwise.

pty_set_winsize(session, bad)
Sets the workload's PTY window size (TIOCSWINSZ on the master end, via the agent).

pty_write(session, bytes)
Writes bytes to the workload's PTY master, which the workload sees as input on its stdin.

set_owner(session, new_owner)
Reassigns the session's owner — the process that receives the {:linx_process, _} lifecycle events and, in PTY mode, :pty_out. Returns :ok (or {:error, :no_process} if the session GenServer is already gone).

signal(session, signum)
Sends OS signal signum to the workload.

spawn(opts)
Spawns a child process via clone(2), optionally into fresh namespaces.

wait(session, timeout \\ :infinity)
Synchronously waits for the workload's terminal event.

Types

namespace()

@type namespace() :: :net | :mount | :pid | :uts | :ipc | :user | :cgroup | :time
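To make the link between namespace/0 atoms and the kernel concrete, here is a sketch that ORs together the Linux CLONE_NEW* constants for a :namespaces list. The module NamespaceFlagsSketch is hypothetical and not necessarily how the agent builds its flags; the constant values themselves are the kernel's.

```elixir
defmodule NamespaceFlagsSketch do
  import Bitwise

  # Kernel CLONE_NEW* constants (include/uapi/linux/sched.h), keyed by namespace() atom.
  @clone_flags %{
    mount: 0x00020000,
    cgroup: 0x02000000,
    uts: 0x04000000,
    ipc: 0x08000000,
    user: 0x10000000,
    pid: 0x20000000,
    net: 0x40000000,
    # CLONE_NEWTIME (Linux 5.6+) lies in clone(2)'s exit-signal byte, so it is
    # only usable through clone3(2) or unshare(2).
    time: 0x00000080
  }

  # ORs the flags for a :namespaces list; raises KeyError on an unknown atom.
  def clone_flags(namespaces) when is_list(namespaces) do
    Enum.reduce(namespaces, 0, fn ns, acc -> bor(acc, Map.fetch!(@clone_flags, ns)) end)
  end
end
```

The default namespaces: [] therefore corresponds to a flag word of 0, i.e. the child shares every namespace with the BEAM.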

t()

@type t() :: pid()

Functions

abort(session)

@spec abort(t()) :: :ok | {:error, :running | :no_process}

Releases a parked session without running the workload. The alternative to proceed/1 from the :ready state.

When the agent is parked at the checkpoint (post-:ready, pre-:running), abort/1 tells it to discard the cloned child rather than letting it execve. The agent closes the child's unblock pipe so the child sees EOF and _exits, reaps it, and emits {:status, :aborted, child_pid} over the control channel. The owner then receives {:linx_process, :aborted} and the session moves to its terminal state.

Use cases

  • Setup-time rollback. A container engine starts spawning, discovers setup can't complete (cgroup creation fails, a bind mount errors, …), and wants to cancel the workload cleanly without it running for even one instruction.
  • Checkpoint-only verification. A test or health check that wants to confirm namespace setup worked without actually running the workload — e.g. an integration test that pivots /proc inside a fresh mount namespace and just wants to verify via mountinfo.
  • Race-with-decision. The owner's "should I proceed?" logic returns false; abort/1 is the clean discard.

State semantics

  • Pre-:ready — buffered; fires the moment :ready arrives. Same shape as signal/2's pre-:running buffering.
  • :ready (parked) — primary case; immediate abort.
  • :running — {:error, :running}. The workload is past the checkpoint; use signal/2 to terminate it.
  • Already terminal — {:error, :no_process}.

Fire-and-forget — abort/1 returns as soon as the agent has the request. Use wait/1 to block on the :aborted terminal event.
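A sketch of the setup-time rollback case: after :ready, run the host-side setup and either proceed or abort. The two Linx calls are injected as functions so the control flow can be exercised without the agent binary; CheckpointSketch and the setup callback's :ok / {:error, reason} contract are assumptions for illustration.

```elixir
defmodule CheckpointSketch do
  # Runs `setup` against the host pid from {:linx_process, :ready, host_pid};
  # proceeds on :ok, aborts on {:error, reason}. Unexpected replies from the
  # injected session calls crash loudly via the :ok matches.
  def proceed_or_abort(session, host_pid, setup, %{proceed: proceed, abort: abort}) do
    case setup.(host_pid) do
      :ok ->
        :ok = proceed.(session)
        :proceeded

      {:error, reason} ->
        :ok = abort.(session)
        {:aborted, reason}
    end
  end
end
```

In real code the map would be %{proceed: &Linx.Process.proceed/1, abort: &Linx.Process.abort/1}. Because abort/1 is fire-and-forget, a caller that must know the child is gone would follow it with Linx.Process.wait(session), which returns {:ok, :aborted}.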

child_spec(init_arg)

@spec child_spec(keyword()) :: Supervisor.child_spec()

Builds a supervisor child specification that runs spawn/1 under supervision — the way to auto-restart a workload "with the same arguments".

opts are spawn/1's options, plus child-spec controls:

  • :id — child id; defaults to Linx.Process.
  • :restart:permanent (default), :transient, or :temporary.
  • :shutdown — shutdown timeout in ms; defaults to 5000.

The spec sets linger: false unless you pass :linger explicitly, so the session stops when its workload reaches a terminal state and the supervisor can apply its restart strategy. Exit-reason mapping (what :transient keys off):

  • exit 0 → :normal — no :transient restart.
  • exit N≠0 → {:exited, N} — abnormal, restarted.
  • killed by signal → {:signaled, signum} — abnormal, restarted.
  • abort/1 at the checkpoint → {:shutdown, :aborted} — no :transient restart.
  • setup/agent error → {:error, %Linx.Process.Error{}} — abnormal.
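The table above can be restated as a pair of pure functions, which is handy when reasoning about or testing restart behaviour. ExitReasonSketch is hypothetical; the second function encodes the standard OTP :transient rule (restart on any reason other than :normal, :shutdown or {:shutdown, term}).

```elixir
defmodule ExitReasonSketch do
  # Workload outcome -> session exit reason, following the table above.
  def exit_reason({:exited, 0}), do: :normal
  def exit_reason({:exited, code}) when is_integer(code), do: {:exited, code}
  def exit_reason({:signaled, signum}) when is_integer(signum), do: {:signaled, signum}
  def exit_reason(:aborted), do: {:shutdown, :aborted}
  def exit_reason({:error, error}), do: {:error, error}

  # OTP rule for :transient children: only abnormal reasons trigger a restart.
  def transient_restart?(:normal), do: false
  def transient_restart?(:shutdown), do: false
  def transient_restart?({:shutdown, _}), do: false
  def transient_restart?(_abnormal), do: true
end
```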

Pass :owner to direct lifecycle events at a consumer (it defaults to the starting process, i.e. the supervisor, which just drops them). For a workload that needs no checkpoint configuration, also pass auto_proceed: true so it runs without an external proceed/1 — the supervisor holds the session pid, not the owner, so nothing else can advance it.

children = [
  {Linx.Process,
   argv: ["/usr/bin/myd"], owner: MyApp.Events, auto_proceed: true, restart: :transient}
]
Supervisor.start_link(children, strategy: :one_for_one)

enter(target_pid, opts)

@spec enter(
  pos_integer(),
  keyword()
) :: {:ok, t()} | {:error, term()}

Runs a new process inside an existing target's namespaces via setns(2) + execve(2).

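The agent opens /proc/<target_pid>/ns/<type> for each namespace to join and setns(2)s into each, then fork(2)s; the child inherits the target's namespaces and execves the workload there. The fork matters for the PID namespace in particular: setns(2) into a PID namespace changes only where subsequently created children are placed, not the caller itself. Same checkpoint protocol as spawn/1: the owner gets :ready → proceed/1 → :running → terminal.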

target_pid is the host pid of the process whose namespaces you want to join: the pid you saw in {:linx_process, :ready, host_pid} (equivalently, the value returned by host_pid/1 or the :host_pid field of info/1).

opts:

  • :argv (required) — the workload argv.
  • :namespaces — which of the target's namespaces to join. Defaults to all — every namespace type the target has under /proc/<target>/ns/. Pass a list (e.g. [:net]) to join only those.
  • :env — workload environment as ["KEY=VAL", …]. Defaults to inherit.
  • :owner — pid to receive lifecycle events. Defaults to the caller.
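The paths enter/2 opens follow directly from the /proc layout. A hypothetical sketch that computes them for an explicit :namespaces list (it does not model the "all namespaces" default); note that the procfs entry for the :mount namespace is named mnt.

```elixir
defmodule EnterPathsSketch do
  # namespace() atoms -> entry names under /proc/<pid>/ns/ (only :mount differs).
  @proc_names %{
    user: "user", mount: "mnt", uts: "uts", ipc: "ipc",
    cgroup: "cgroup", net: "net", time: "time", pid: "pid"
  }

  # Paths for an explicit :namespaces list, in the order given.
  def ns_paths(target_pid, namespaces) when is_integer(target_pid) and target_pid > 0 do
    Enum.map(namespaces, fn ns -> "/proc/#{target_pid}/ns/#{Map.fetch!(@proc_names, ns)}" end)
  end
end
```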

host_pid(session)

@spec host_pid(t()) :: {:ok, pos_integer()} | {:error, :not_ready}

Returns the workload's pid as the host sees it.

This is the same value the owner receives in {:linx_process, :ready, host_pid}; host_pid/1 is the convenience accessor for when you hold the session but didn't capture (or have already consumed) the :ready message.

Use the host pid whenever you address the workload from the host — typically procfs paths like /proc/<host_pid>/{ns,uid_map,gid_map,setgroups,mountinfo}. Every cross-namespace primitive in Linx (Linx.Mount's :in: {:pid, _}, Linx.User.set_uid_map/2, Linx.User.setup_maps/2) wants the host pid. The workload's own view of its pid (1 inside a fresh PID namespace) is a separate value, available via info/1's :child_pid.

Returns

  • {:ok, host_pid} — the agent has reported :spawned (which arrives before :ready), so the value is available.
  • {:error, :not_ready} — the spawn hasn't progressed far enough yet. Typically only possible if you call host_pid/1 synchronously after spawn/1 without first awaiting any lifecycle event. Once you've seen :ready, host_pid/1 always succeeds.

Example

{:ok, c} = Linx.Process.spawn(argv: [...], namespaces: [:user, :pid])
host_pid = receive do {:linx_process, :ready, p} -> p end
:ok = Linx.User.setup_maps(host_pid, uid: [...], gid: [...])

info(session)

@spec info(t()) :: {:ok, Linx.Process.Info.t()} | {:error, term()}

Returns a snapshot of the session's state as a %Linx.Process.Info{}.

Cheap — a single GenServer.call returning the relevant fields from the GenServer's internal state. Safe to call at any point in the lifecycle, including post-terminal.

Examples

iex> {:ok, c} = Linx.Process.spawn(argv: ["/bin/sleep", "10"])
iex> {:ok, info} = Linx.Process.info(c)
iex> info.mode
:spawn
iex> info.stage in [:starting, :spawned, :ready]
true

See Linx.Process.Info for the full field list and the eight possible :stage atoms.

proceed(session)

@spec proceed(t()) :: :ok | {:error, term()}

Advances the child past the checkpoint: the agent forwards :proceed to the cloned child, which then execves the workload.

The wire-level command this sends is :proceed, which is also the Elixir verb name — one word for the same action on both sides of the Port boundary.

Returns :ok on success. Returns {:error, :not_ready} if the agent has not yet reported :ready (there is no checkpoint to advance past yet), and {:error, :no_process} if the workload has already reached a terminal stage (exited, aborted or errored): rather than sending a stale :proceed to an agent that has already been collected, the GenServer refuses the call.

pty_master(session)

@spec pty_master(t()) :: {:ok, t()} | {:error, term()}

Returns {:ok, session} if the session was started with stdio: :pty — the session pid is itself the handle to read from (via {:linx_process, :pty_out, _} events on the owner) and to write to (via pty_write/2). Returns {:error, :no_pty} otherwise.

A future Linx.Tty subsystem will likely return something richer here — a struct wrapping the session, terminal-mode helpers, etc. For now it just confirms PTY-mode-ness.

pty_set_winsize(session, bad)

@spec pty_set_winsize(
  t(),
  {non_neg_integer(), non_neg_integer(), non_neg_integer(), non_neg_integer()}
  | %{
      :rows => non_neg_integer(),
      :cols => non_neg_integer(),
      :xpixel => non_neg_integer(),
      :ypixel => non_neg_integer(),
      optional(any()) => any()
    }
) :: :ok | {:error, term()}

Sets the workload's PTY window size (TIOCSWINSZ on the master end, via the agent).

Accepts either a 4-tuple {rows, cols, xpixel, ypixel} or any map / struct exposing those fields (Linx.Tty.WindowSize is the canonical such struct, but Linx.Process deliberately doesn't depend on Linx.Tty — duck-typing on the field shape avoids the cross-subsystem dependency).

Best-effort on the agent side: the workload will see SIGWINCH and the new size on its next TIOCGWINSZ, but no error is propagated back if the ioctl fails.

Returns {:error, :no_pty} if the session wasn't started with stdio: :pty; {:error, :no_process} if the workload has already terminated.
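The duck-typed argument handling can be sketched as a normalisation step. WinsizeSketch and its {:error, :badarg} result are hypothetical; the sketch also rejects values above 65535 because the kernel's struct winsize stores each field as an unsigned short, but whether Linx itself checks this is not documented.

```elixir
defmodule WinsizeSketch do
  # struct winsize stores each field as an unsigned short.
  defguardp ushort(v) when is_integer(v) and v >= 0 and v <= 0xFFFF

  # Tuple form: {rows, cols, xpixel, ypixel}.
  def normalize({rows, cols, xpixel, ypixel})
      when ushort(rows) and ushort(cols) and ushort(xpixel) and ushort(ypixel),
      do: {:ok, {rows, cols, xpixel, ypixel}}

  # Any map or struct exposing the four fields; extra keys are ignored.
  def normalize(%{rows: rows, cols: cols, xpixel: xpixel, ypixel: ypixel}),
    do: normalize({rows, cols, xpixel, ypixel})

  def normalize(_other), do: {:error, :badarg}
end
```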

pty_write(session, bytes)

@spec pty_write(t(), iodata()) :: :ok | {:error, term()}

Writes bytes to the workload's PTY master, which the workload sees as input on its stdin.

Returns {:error, :no_pty} if the session was not started with stdio: :pty; {:error, :no_process} if the workload has already terminated (reached any of :exited / :signaled / :aborted / :errored) — the call refuses immediately rather than firing a Port.command at an agent that's been collected or is about to be.

Fire-and-forget on the happy path — bytes are handed to the agent (and from there to the PTY); there is no acknowledgement.

set_owner(session, new_owner)

@spec set_owner(t(), pid()) :: :ok | {:error, :no_process}

Reassigns the session's owner — the process that receives the {:linx_process, _} lifecycle events and, in PTY mode, :pty_out. Returns :ok (or {:error, :no_process} if the session GenServer is already gone).

The owner is set at spawn/1 / enter/2 (defaulting to the caller) and is normally the process supervising the workload. set_owner/2 hands that event stream to a different process for a while — the model behind interactively attaching to a session another process owns:

  • the supervisor calls set_owner(session, attacher) so the attaching process receives :pty_out (and lifecycle) for the duration,
  • the attacher runs Linx.Tty.attach/3,
  • on return the supervisor calls set_owner(session, supervisor) to take the stream back.

Only one owner receives events at a time. If the workload terminates while detached (owned by the attacher), the supervisor will not have seen the :exited / :signaled event — so after reclaiming ownership it should re-derive the workload's state from info/1 and act on it. This keeps the handoff a clean single-owner swap, with the lifecycle decision level-triggered on the supervisor side rather than threaded through the attach.

Setting the owner on a session whose workload has already terminated is harmless (the session lingers); the new owner simply won't receive past events.
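The lend-and-reclaim handoff maps naturally onto try/after, so ownership comes back even if the attach raises. A sketch with set_owner/2 injected as a function (&Linx.Process.set_owner/2 in real code); AttachSketch is a hypothetical name, and the calling process is assumed to be the supervising owner.

```elixir
defmodule AttachSketch do
  # Lends the session's event stream to `attacher`, runs `fun`, and always
  # reclaims ownership for the calling process, even if `fun` raises.
  def with_lent_owner(session, attacher, set_owner, fun) do
    :ok = set_owner.(session, attacher)

    try do
      fun.()
    after
      # Events that arrived meanwhile (a terminal one included) went to the
      # attacher, so the caller should re-check info/1 after this returns.
      set_owner.(session, self())
    end
  end
end
```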

signal(session, signum)

@spec signal(t(), pos_integer()) :: :ok | {:error, term()}

Sends OS signal signum to the workload.

Signals sent before the workload has execve'd (between spawn/1 and proceed/1, or otherwise before the agent emits :running) are buffered and flushed in order the moment :running arrives. Calling signal/2 after the workload has exited returns {:error, :no_process}.

This is fire-and-forget — signal/2 returns as soon as the signal has been handed to the agent (or buffered), without waiting for the kernel to deliver it. Use wait/1 to observe the workload's response.
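The buffering rule can be modelled as a small state machine. This is a hypothetical model of the documented behaviour, not Linx's implementation: SignalBufferSketch appends to a `delivered` list where the real GenServer would hand signals to the agent. Signals queue until :running, flush in FIFO order, and are refused after termination.

```elixir
defmodule SignalBufferSketch do
  # `delivered` stands in for "handed to the agent".
  defstruct stage: :starting, pending: :queue.new(), delivered: []

  # After the workload has terminated, signals are refused.
  def signal(%__MODULE__{stage: :terminal}, _signum), do: {:error, :no_process}

  # Once the workload is running, signals go straight through.
  def signal(%__MODULE__{stage: :running} = s, signum), do: {:ok, deliver(s, [signum])}

  # Any earlier stage: queue the signal in arrival order.
  def signal(%__MODULE__{} = s, signum), do: {:ok, %{s | pending: :queue.in(signum, s.pending)}}

  # On :running, flush the queue in FIFO order.
  def running(%__MODULE__{} = s) do
    flushed = :queue.to_list(s.pending)
    deliver(%{s | stage: :running, pending: :queue.new()}, flushed)
  end

  def terminal(%__MODULE__{} = s), do: %{s | stage: :terminal}

  defp deliver(s, signums), do: %{s | delivered: s.delivered ++ signums}
end
```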

spawn(opts)

@spec spawn(keyword()) :: {:ok, t()} | {:error, term()}

Spawns a child process via clone(2), optionally into fresh namespaces.

Returns {:ok, pid} — the pid of the GenServer that owns the child and is the session handle.

opts:

  • :argv (required) — the workload argv as a list of binaries. The first element is the absolute path of the executable; no $PATH lookup is performed.
  • :namespaces — list of namespace/0 atoms to create fresh. Defaults to [] (share all of the BEAM's namespaces).
  • :env — environment as a list of "KEY=VALUE" binaries. Defaults to inheriting the BEAM's environment.
  • :cwd — the workload's working directory, chdir'd to in the child just before execve. Defaults to inheriting the agent's cwd. Set it when the workload runs in a pivoted rootfs, where the inherited cwd may not exist in the new root (e.g. the image's WorkingDir, or "/").
  • :owner — pid to receive lifecycle events. Defaults to the caller.
  • :linger — when true (default), the session GenServer stays alive after the workload reaches a terminal state, so wait/1 and info/1 keep working. When false, it stops with an outcome-derived exit reason (see child_spec/1) — the mode for supervised use. child_spec/1 sets this to false.
  • :auto_proceed — when true, the session advances past the :ready checkpoint by itself (no external proceed/1). Defaults to false, preserving the checkpoint window for per-instance configuration (capabilities, seccomp, sysctls into the new namespaces). Set it true for supervised workloads that need no such configuration — otherwise a supervised child blocks at :ready forever, since the supervisor holds the session pid, not the owner.
  • :stdio — workload fd 0/1/2 plumbing. See "Stdio plumbing" below.

Stdio plumbing

:stdio is either a single atom shorthand applying to all three fds, or a keyword list giving per-fd directives.

Shorthand atoms:

  • :inherit (default) — the workload inherits the BEAM's stdio.
  • :devnull — all three fds are /dev/null.
  • :pty — the agent creates a PTY pair; the workload becomes session leader with the slave as its controlling terminal, with 0/1/2 dup'd onto it. The master end stays in the agent and the bytes are proxied through the existing control channel: writes via pty_write/2, reads delivered to the owner as {:linx_process, :pty_out, bytes}.

Per-fd keyword list, [stdin: dir, stdout: dir, stderr: dir], with each dir one of:

  • :inherit — leave that fd untouched.
  • :devnull — dup /dev/null onto it.
  • {:connect_unix, "/path/to/socket"} — the workload connects an AF_UNIX stream socket to path and dup2's it onto the fd. The listener at path is the caller's responsibility (must be :gen_tcp.listen-ing before spawn/1).
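Because the listener must already exist when the workload connects, it is opened first. A sketch using gen_tcp's AF_UNIX support ({:local, path} as the interface address, with port 0); UnixListenerSketch is hypothetical, and removing a stale socket file before listening is a choice of this sketch rather than anything Linx requires.

```elixir
defmodule UnixListenerSketch do
  # Opens an AF_UNIX stream listener at `path`. A stale socket file left by
  # an earlier run is removed first; otherwise bind fails with :eaddrinuse.
  def listen(path) do
    _ = File.rm(path)
    :gen_tcp.listen(0, [:binary, active: false, ifaddr: {:local, path}])
  end

  # Accepts one connection and returns the first chunk the peer sends.
  def accept_and_read(listen_socket, timeout \\ 5_000) do
    with {:ok, conn} <- :gen_tcp.accept(listen_socket, timeout),
         {:ok, data} <- :gen_tcp.recv(conn, 0, timeout) do
      :ok = :gen_tcp.close(conn)
      {:ok, data}
    end
  end
end
```

The listen socket would be created before calling spawn/1 with, for example, stdio: [stdout: {:connect_unix, path}], and accept_and_read/2 (or a proper accept loop) run afterwards.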

Per-fd PTY directives are not supported — a PTY is one device shared across all three fds; use the :pty shorthand.
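A sketch of how the two accepted :stdio shapes could be normalised into one per-fd map, rejecting per-fd PTY directives as described above. StdioSketch and its error tuples are hypothetical, and treating an fd missing from the keyword list as :inherit is an assumption, not documented behaviour.

```elixir
defmodule StdioSketch do
  @fds [:stdin, :stdout, :stderr]

  # Shorthand atoms that apply the same directive to all three fds.
  def normalize(shorthand) when shorthand in [:inherit, :devnull],
    do: {:ok, Map.new(@fds, &{&1, shorthand})}

  # A PTY is one device for all three fds, so it stays a single marker.
  def normalize(:pty), do: {:ok, :pty}

  # Per-fd keyword list; an fd that is not mentioned is assumed to be :inherit.
  def normalize(per_fd) when is_list(per_fd) do
    Enum.reduce_while(@fds, {:ok, %{}}, fn fd, {:ok, acc} ->
      case Keyword.get(per_fd, fd, :inherit) do
        dir when dir in [:inherit, :devnull] ->
          {:cont, {:ok, Map.put(acc, fd, dir)}}

        {:connect_unix, path} = dir when is_binary(path) ->
          {:cont, {:ok, Map.put(acc, fd, dir)}}

        :pty ->
          {:halt, {:error, {:pty_not_per_fd, fd}}}

        other ->
          {:halt, {:error, {:bad_directive, fd, other}}}
      end
    end)
  end

  def normalize(other), do: {:error, {:bad_stdio, other}}
end
```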

wait(session, timeout \\ :infinity)

@spec wait(t(), timeout()) ::
  {:ok, {:exited, non_neg_integer()} | {:signaled, pos_integer()} | :aborted}
  | {:error, term()}

Synchronously waits for the workload's terminal event.

Returns one of:

  • {:ok, {:exited, code}} — workload exited with code.
  • {:ok, {:signaled, signum}} — workload was killed by signum.
  • {:ok, :aborted} — abort/1 was called from the checkpoint; the workload never ran.
  • {:error, %Linx.Process.Error{}} — a pre-exec failure; the workload never ran. (The same failure also reaches the owner as the positional event {:linx_process, :error, errno, stage}.)
  • {:error, :timeout} — timeout elapsed before any terminal event arrived. The session is still alive; call wait/1 again.
  • {:error, :no_process} — the session GenServer is gone (e.g. the agent crashed before reporting a terminal event).

Multiple processes may wait on the same session concurrently; all receive the same answer when it arrives.
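Serving many waiters with one answer is the classic deferred-reply GenServer pattern: park each caller's `from` and answer them all with GenServer.reply/2 once the result is recorded. WaitersSketch is a hypothetical, stripped-down model of that behaviour, not Linx's implementation; it also shows one way to surface a call timeout and a dead server as {:error, :timeout} and {:error, :no_process}.

```elixir
defmodule WaitersSketch do
  use GenServer

  def start_link, do: GenServer.start_link(__MODULE__, nil)

  # Blocks until a result is recorded; maps call exits onto error tuples.
  # A timed-out caller's `from` stays parked; on recent OTP releases the
  # late reply to it is discarded.
  def wait(pid, timeout \\ :infinity) do
    GenServer.call(pid, :wait, timeout)
  catch
    :exit, {:timeout, _} -> {:error, :timeout}
    :exit, {:noproc, _} -> {:error, :no_process}
  end

  # Records the terminal result (first one wins) and releases every waiter.
  def finish(pid, result), do: GenServer.cast(pid, {:finish, result})

  @impl true
  def init(nil), do: {:ok, %{result: nil, waiters: []}}

  @impl true
  def handle_call(:wait, from, %{result: nil} = state),
    do: {:noreply, %{state | waiters: [from | state.waiters]}}

  def handle_call(:wait, _from, state), do: {:reply, state.result, state}

  @impl true
  def handle_cast({:finish, result}, %{result: nil} = state) do
    Enum.each(state.waiters, &GenServer.reply(&1, result))
    {:noreply, %{state | result: result, waiters: []}}
  end

  def handle_cast({:finish, _ignored}, state), do: {:noreply, state}
end
```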