Linx.Capabilities (Linx v0.1.0)


Linux per-process capability primitives — the kernel's five capability sets (effective, permitted, inheritable, bounding, ambient) and the syscalls that manipulate them.

Why a separate subsystem

Linux capabilities partition the historical "root vs not-root" binary into ~41 fine-grained powers (CAP_NET_ADMIN, CAP_SYS_ADMIN, CAP_NET_BIND_SERVICE, …). A security-conscious container runtime drops everything the workload doesn't need before execve, so a compromise of e.g. nginx can't reach for arbitrary kernel surface. Linx.Capabilities is the primitive that makes that drop possible from Elixir.

This is not a security-policy engine. It exposes "read these caps" and "drop these caps from this set on this session." What each workload should have is policy and lives in a consumer.

Two layers — read and write

The read side runs on the host: a pure-Elixir File.read/1 against /proc/<pid>/status. It works against any live process without cooperation from the target.
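As a rough sketch of what such a host-side read involves (not Linx's actual implementation), the five Cap*: lines can be pulled out of the status text and parsed as hexadecimal bitmasks. The module name StatusCapsSketch and the string-keyed result map are assumptions made for illustration.

```elixir
defmodule StatusCapsSketch do
  # The five capability lines found in /proc/<pid>/status.
  @fields ~w(CapInh CapPrm CapEff CapBnd CapAmb)

  # Parses status-file text into a map of field name => integer bitmask.
  def parse(text) do
    for line <- String.split(text, "\n"),
        # Lines without a colon do not match the pattern and are skipped.
        [key, value] <- [String.split(line, ":", parts: 2)],
        key in @fields,
        into: %{} do
      {key, value |> String.trim() |> String.to_integer(16)}
    end
  end

  # Host-side read: needs no cooperation from the target process.
  def read(pid) when is_integer(pid) and pid > 0 do
    with {:ok, text} <- File.read("/proc/#{pid}/status") do
      {:ok, parse(text)}
    end
  end
end
```

Keeping parse/1 separate from the file read means the parsing logic can be exercised with a fixed string on any platform.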

The write side is fundamentally different: capability manipulation is per-thread. capset(2), prctl(PR_CAPBSET_*), and prctl(PR_CAP_AMBIENT_*) all operate on the calling thread, so the child agent in Linx.Process has to do its own cap configuration. The write verbs (drop_bounding/2, set_thread_sets/2, set_ambient/2) are checkpoint-bound: only valid in the :ready (parked) state, same shape as Linx.Process.proceed/1 / abort/1.

MapSets of :cap_* atoms

Cap sets are 64-bit kernel bitmasks. In Elixir they show up as MapSets of :cap_* atoms (the lowercase form of the kernel's CAP_* constants):

MapSet.new([:cap_net_admin, :cap_sys_admin])

Set operations (MapSet.union/2, MapSet.difference/2) come for free; pattern-matching on cap atoms is natural; the bitmask conversion happens in one place (Linx.Capabilities.Constants). The :cap_ prefix is kept so the atom is unambiguous in a mailbox of mixed message types.
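To make the bitmask conversion concrete, here is a minimal sketch of a round trip between a kernel mask and a MapSet of atoms. It is not the code in Linx.Capabilities.Constants; the module name CapMaskSketch and its six-entry table are illustrative assumptions, with bit numbers taken from <linux/capability.h>.

```elixir
defmodule CapMaskSketch do
  import Bitwise

  # Partial table for illustration only; the real table has one entry per cap.
  @bits %{
    cap_dac_override: 1,
    cap_setpcap: 8,
    cap_net_bind_service: 10,
    cap_net_admin: 12,
    cap_sys_module: 16,
    cap_sys_admin: 21
  }

  # Folds any enumerable of cap atoms into one integer bitmask.
  def to_mask(caps) do
    Enum.reduce(caps, 0, fn cap, acc -> acc ||| (1 <<< Map.fetch!(@bits, cap)) end)
  end

  # Expands a bitmask into a MapSet of the cap atoms this table knows.
  def from_mask(mask) do
    for {cap, bit} <- @bits, (mask &&& (1 <<< bit)) != 0, into: MapSet.new(), do: cap
  end
end
```

With the conversion isolated like this, everything above it can stay in terms of MapSet.union/2 and MapSet.difference/2.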

Composition with Linx.Process

The motivating composition:

{:ok, c} = Linx.Process.spawn(argv: ["/usr/sbin/nginx"], stdio: :pty)
receive do {:linx_process, :ready, _} -> :ok end

# Strip everything except the one cap nginx actually needs.
# Shrink the bounding set first: PR_CAPBSET_DROP needs :cap_setpcap in the
# effective set, which the set_thread_sets/2 call below removes.
keep = [:cap_net_bind_service]
:ok = Linx.Capabilities.drop_bounding(c,
        MapSet.difference(Linx.Capabilities.Constants.all(),
                          MapSet.new(keep)))
:ok = Linx.Capabilities.set_thread_sets(c,
        effective: keep, permitted: keep, inheritable: [])

:ok = Linx.Process.proceed(c)

After proceed/1, the workload cannot hold anything beyond cap_net_bind_service, even if its binary has file caps that would otherwise grant more: every other cap is gone from the bounding set, and the bounding set limits what file caps can add at execve.

See docs/capabilities/EXAMPLES.md for end-to-end recipes.

Summary

Types

cap()
A capability atom: the lowercase form of a kernel CAP_* constant, prefixed with :cap_.

cap_set()
A set of capabilities: a MapSet of :cap_* atoms. The public write verbs accept any Enumerable of caps (list, MapSet, Stream) for convenience; the canonical representation is MapSet.

Functions

drop_bounding(session, caps)
Drops capabilities from the child thread's bounding set on a parked Linx.Process session.

read(pid)
Reads a process's capability sets from /proc/<pid>/status.

set_ambient(session, caps)
Sets the child thread's ambient capability set on a parked Linx.Process session.

set_thread_sets(session, opts)
Sets the child thread's effective, permitted, and inheritable capability sets on a parked Linx.Process session.

supported?()
Returns true iff Linux capabilities are inspectable on this host, i.e. /proc/self/status contains a CapBnd: line.

Types

cap()

@type cap() :: atom()

A capability atom — the lowercase form of a kernel CAP_* constant, prefixed with :cap_. Examples:

:cap_net_admin
:cap_sys_admin
:cap_net_bind_service

See Linx.Capabilities.Constants.all/0 for the full set.

cap_set()

@type cap_set() :: MapSet.t(cap())

A set of capabilities — a MapSet of :cap_* atoms. The public write verbs accept any Enumerable of caps (list, MapSet, Stream) for convenience; the canonical representation is MapSet.
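One way to accept any Enumerable while keeping MapSet as the canonical form is to normalise and validate in a single pass. The sketch below is an assumption about how that could look, not Linx's code: CapNormalizeSketch and its three-entry @known table are hypothetical stand-ins for the full table behind Linx.Capabilities.Constants.all/0.

```elixir
defmodule CapNormalizeSketch do
  # Hypothetical stand-in for the full table of known cap atoms.
  @known MapSet.new([:cap_net_admin, :cap_sys_admin, :cap_net_bind_service])

  # Accepts a list, MapSet, Stream, or any other Enumerable of cap atoms.
  # Stops at the first unknown element instead of building a partial set.
  def normalize(caps) do
    Enum.reduce_while(caps, {:ok, MapSet.new()}, fn cap, {:ok, acc} ->
      if MapSet.member?(@known, cap) do
        {:cont, {:ok, MapSet.put(acc, cap)}}
      else
        {:halt, {:error, {:bad_capability, cap}}}
      end
    end)
  end
end
```

Duplicates collapse naturally because the accumulator is a MapSet.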

Functions

drop_bounding(session, caps)

@spec drop_bounding(Linx.Process.t(), Enumerable.t()) ::
  :ok
  | {:error, :not_ready | :running | :no_process | {:bad_capability, term()}}

Drops capabilities from the child thread's bounding set on a parked Linx.Process session.

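caps is any Enumerable of :cap_* atoms, typically a MapSet or list. The operation is one-way (prctl(PR_CAPBSET_DROP)): the kernel offers no way to add a cap back to a thread's bounding set, and capset(2), which backs set_thread_sets/2, will not add a cap that has left the bounding set to the inheritable set. The drop itself requires :cap_setpcap in the thread's effective set, so issue it before any set_thread_sets/2 call that removes :cap_setpcap.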

Errors

  • {:error, :not_ready} — session not yet at the checkpoint.
  • {:error, :running} — past proceed/1, the child is in execve'd land.
  • {:error, :no_process} — session has ended.
  • {:error, {:bad_capability, atom}} — caps contains an atom Linx doesn't recognise. Validation happens before anything is sent to the agent.

Kernel-level failures (the workload didn't have the required privilege to drop a particular cap, etc.) arrive asynchronously as {:linx_process, :error, errno, :cap_drop_bounding} on the session's owner mailbox, the same shape as other pre-execve failures.
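Because the verb returns :ok before the kernel has acted, a caller that wants to notice such a failure has to look in its mailbox. A minimal sketch, assuming only the message shape quoted above; the module name CapAwaitSketch, the await_error/2 helper, and the timeout default are hypothetical.

```elixir
defmodule CapAwaitSketch do
  # Waits up to `timeout` ms for an asynchronous failure of one cap verb.
  # `tag` is the fourth tuple element, e.g. :cap_drop_bounding.
  def await_error(tag, timeout \\ 100) do
    receive do
      {:linx_process, :error, errno, ^tag} -> {:error, errno}
    after
      timeout -> :no_error_seen
    end
  end
end
```

A :no_error_seen result only means no failure arrived within the window; it is not a positive acknowledgement from the agent.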

Example

:ok = Linx.Capabilities.drop_bounding(session,
  [:cap_sys_admin, :cap_sys_module, :cap_dac_override])

read(pid)

@spec read(pos_integer() | :self) ::
  {:ok, Linx.Capabilities.State.t()} | {:error, Linx.Capabilities.Error.t()}

Reads a process's capability sets from /proc/<pid>/status.

Accepts a positive integer pid, or :self as a convenience for the BEAM's own status. Returns {:ok, %Linx.Capabilities.State{}} on success, or {:error, %Linx.Capabilities.Error{}} if the procfs read failed or the file didn't contain the five Cap*: lines we expected.

Examples

iex> {:ok, %Linx.Capabilities.State{} = state} = Linx.Capabilities.read(:self)
iex> is_struct(state.effective, MapSet) and is_struct(state.bounding, MapSet)
true

# Bogus pid -> structured error.
iex> {:error, %Linx.Capabilities.Error{errno: :enoent}} =
...>   Linx.Capabilities.read(1_234_567_890)
iex> true
true

Forward compatibility

If the kernel reports a bit that isn't in Linx's 41-entry table (a newer kernel adding caps Linx hasn't catalogued), the bit is silently dropped from the returned MapSets and a single Logger.warning/1 is emitted. The returned %State{} is still valid for every cap Linx does know about.
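The split between catalogued and uncatalogued bits can be sketched as a mask operation. This is an illustration under the assumption that the 41 known caps occupy bits 0 through 40 contiguously; UnknownBitsSketch and the wording of the log message are hypothetical.

```elixir
defmodule UnknownBitsSketch do
  import Bitwise
  require Logger

  # Assumption: a 41-entry table covers bits 0..40.
  @known_mask Bitwise.bsl(1, 41) - 1

  # Returns {known_bits, unknown_bits}; warns once if any unknown bit is set.
  def split(mask) do
    unknown = mask &&& bnot(@known_mask)

    if unknown != 0 do
      Logger.warning("ignoring unknown capability bits: 0x" <> Integer.to_string(unknown, 16))
    end

    {mask &&& @known_mask, unknown}
  end
end
```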

set_ambient(session, caps)

@spec set_ambient(Linx.Process.t(), Enumerable.t()) ::
  :ok
  | {:error, :not_ready | :running | :no_process | {:bad_capability, term()}}

Sets the child thread's ambient capability set on a parked Linx.Process session.

caps is a MapSet or list of :cap_* atoms. The ambient set is replaced (the kernel only exposes per-cap RAISE/LOWER plus a global CLEAR_ALL, so the natural shape is "clear then raise each requested cap").

Ambient caps are the mechanism that lets a non-root, no-file-cap binary still inherit capabilities across execve — useful when you want a workload to start with e.g. :cap_net_bind_service but don't want to put file caps on the binary or run it as root. See capabilities(7) "Ambient capabilities" for the full rules (notably: every ambient cap must also be in the permitted and inheritable sets, or the raise fails).
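The "clear then raise" shape and the permitted-and-inheritable rule can be checked on the caller's side before anything reaches the agent. The following is a sketch only: AmbientPlanSketch, the op tuples, and the idea of a pre-flight plan are assumptions, not part of Linx.

```elixir
defmodule AmbientPlanSketch do
  # Returns {ops, refused}: the ordered operations to run, and the requested
  # caps the kernel would refuse because they are not in both the permitted
  # and inheritable sets.
  def plan(requested, permitted, inheritable) do
    requested = MapSet.new(requested)
    eligible = MapSet.intersection(MapSet.new(permitted), MapSet.new(inheritable))
    raisable = MapSet.intersection(requested, eligible)
    refused = MapSet.difference(requested, eligible)

    # Replace semantics: clear everything, then raise each eligible cap.
    ops = [:clear_all | Enum.map(Enum.sort(raisable), &{:raise, &1})]
    {ops, refused}
  end
end
```

A non-empty refused set suggests calling set_thread_sets/2 first so the caps are present in both prerequisite sets.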

Errors

Same shape as drop_bounding/2. Kernel failures (a raise that fails because the cap isn't in permitted+inheritable, etc.) arrive as {:linx_process, :error, errno, :cap_set_ambient}.

set_thread_sets(session, opts)

@spec set_thread_sets(
  Linx.Process.t(),
  keyword()
) ::
  :ok
  | {:error,
     :not_ready
     | :running
     | :no_process
     | {:bad_capability, term()}
     | {:bad_thread_sets, {:missing, atom()}}}

Sets the child thread's effective, permitted, and inheritable capability sets on a parked Linx.Process session.

opts is a keyword list with all three required keys: :effective, :permitted, :inheritable. Each value is a MapSet or list of :cap_* atoms (use [] or MapSet.new() to clear a set).

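Implemented via capset(2) in the agent. The kernel enforces the invariants documented in capabilities(7), notably that :effective ⊆ :permitted and :inheritable ⊆ :permitted ∪ I_old, where I_old is the thread's inheritable set before the call. Violations arrive as {:linx_process, :error, errno, :cap_set_thread} on the owner mailbox; capset(2) itself reports these violations as EPERM and reserves EINVAL for malformed arguments.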
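Some of these invariants can be checked locally before the request is sent. The sketch below covers two of them: the effective-within-permitted rule named above, and the capset(2) rule that the permitted set can only shrink (the latter comes from the man page, not from the text above). The inheritable rule is left out because it also depends on the bounding set and on :cap_setpcap. CapsetCheckSketch and its error atoms are hypothetical.

```elixir
defmodule CapsetCheckSketch do
  # `new` is a keyword list with :effective and :permitted;
  # `old_permitted` is the thread's permitted set before the call.
  def check(new, old_permitted) do
    effective = MapSet.new(new[:effective])
    permitted = MapSet.new(new[:permitted])

    cond do
      not MapSet.subset?(effective, permitted) ->
        {:error, :effective_not_in_permitted}

      not MapSet.subset?(permitted, MapSet.new(old_permitted)) ->
        {:error, :permitted_would_grow}

      true ->
        :ok
    end
  end
end
```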

"Leave unchanged" not yet supported

A future revision will accept missing keys as "leave this set as-is" (the agent would read its own /proc/self/status to fill in). For now, callers that want one set unchanged must read it first via Linx.Capabilities.read(host_pid) and pass it back through here.
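Until that lands, the read-then-pass-back workaround amounts to merging the caller's overrides over the current sets. A minimal sketch, assuming the value returned by read/1 exposes :effective, :permitted, and :inheritable fields (only :effective and :bounding are shown elsewhere on this page); KeepSetsSketch is a hypothetical helper.

```elixir
defmodule KeepSetsSketch do
  @keys [:effective, :permitted, :inheritable]

  # `current` is any map or struct holding the three sets (field names assumed).
  # `overrides` is a keyword list naming only the sets the caller wants to change.
  def fill(current, overrides) do
    for key <- @keys do
      {key, Keyword.get(overrides, key, Map.fetch!(current, key))}
    end
  end
end
```

The result always carries all three keys, which is the shape set_thread_sets/2 currently requires.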

Errors

Same shape as drop_bounding/2. Additional caller-side errors:

  • {:error, {:bad_thread_sets, {:missing, key}}} — one of :effective, :permitted, :inheritable was omitted.
  • {:error, {:bad_capability, atom}} — any of the three values contained an unknown cap atom.
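The missing-key error listed above could be produced by a check along these lines. This is a sketch, not Linx's source: ThreadSetsOptsSketch is hypothetical, and which key gets reported when several are missing is an assumption (here, the first in declaration order).

```elixir
defmodule ThreadSetsOptsSketch do
  @required [:effective, :permitted, :inheritable]

  # Returns :ok when all three keys are present, otherwise names the first
  # missing key in the documented error shape.
  def validate(opts) do
    case Enum.find(@required, &(not Keyword.has_key?(opts, &1))) do
      nil -> :ok
      key -> {:error, {:bad_thread_sets, {:missing, key}}}
    end
  end
end
```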

supported?()

@spec supported?() :: boolean()

Returns true iff Linux capabilities are inspectable on this host — i.e. /proc/self/status contains a CapBnd: line.

True on every Linux ≥ 2.6.25 (every kernel Linx targets). Useful as a precondition guard or in setup checks; this module's verbs don't gate on it themselves.
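A check of this kind could look like the sketch below, which is an assumption about the approach rather than the module's actual code; CapsSupportedSketch is a hypothetical name.

```elixir
defmodule CapsSupportedSketch do
  # True when /proc/self/status is readable and has a CapBnd: line.
  # On hosts without procfs the read fails and the result is false.
  def supported? do
    case File.read("/proc/self/status") do
      {:ok, text} ->
        text
        |> String.split("\n")
        |> Enum.any?(&String.starts_with?(&1, "CapBnd:"))

      {:error, _reason} ->
        false
    end
  end
end
```

Since the module's verbs do not gate on this themselves, a caller might run such a check once during application start-up or test setup.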