Linux per-process capability primitives — the kernel's five capability sets (effective, permitted, inheritable, bounding, ambient) and the syscalls that manipulate them.
Why a separate subsystem
Linux capabilities partition the historical "root vs not-root"
binary into ~41 fine-grained powers (CAP_NET_ADMIN,
CAP_SYS_ADMIN, CAP_NET_BIND_SERVICE, …). A security-conscious
container runtime drops everything the workload doesn't need
before execve, so a compromise of e.g. nginx can't reach for
arbitrary kernel surface. Linx.Capabilities is the primitive
that makes that drop possible from Elixir.
This is not a security-policy engine. It exposes "read these caps" and "drop these caps from this set on this session." What each workload should have is policy and lives in a consumer.
Two layers — read and write
The read side runs on the host: a pure-Elixir File.read/1 against
/proc/<pid>/status. It works against any live process without
cooperation from the target.
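To make the read side concrete, here is a minimal sketch of pulling the five Cap*: lines out of status text. CapStatusSketch and its parse/1 are hypothetical names for illustration, not Linx's implementation; the field names and the hexadecimal encoding are those documented in proc(5).

```elixir
defmodule CapStatusSketch do
  # Hypothetical helper, not part of Linx: extracts the Cap*: lines from the
  # text of a /proc/<pid>/status file and returns them as integer bitmasks.

  @fields %{
    "CapInh" => :inheritable,
    "CapPrm" => :permitted,
    "CapEff" => :effective,
    "CapBnd" => :bounding,
    "CapAmb" => :ambient
  }

  @spec parse(String.t()) :: %{optional(atom()) => non_neg_integer()}
  def parse(status_text) do
    status_text
    |> String.split("\n", trim: true)
    |> Enum.reduce(%{}, fn line, acc ->
      # Each line looks like "CapEff:\t0000000000003000" (hexadecimal mask).
      with [key, value] <- String.split(line, ":", parts: 2),
           {:ok, field} <- Map.fetch(@fields, key),
           {mask, ""} <- Integer.parse(String.trim(value), 16) do
        Map.put(acc, field, mask)
      else
        # Unrelated or malformed lines are skipped.
        _ -> acc
      end
    end)
  end
end
```

On a Linux host this could be fed with the result of File.read("/proc/self/status"); a caller would still have to decide what to do when fewer than five fields come back.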
The write side is fundamentally different: capability
manipulation is per-thread — capset(2), prctl(PR_CAPBSET_*),
and prctl(PR_CAP_AMBIENT_*) all operate on the calling thread.
So the child agent in Linx.Process has to do its own cap
configuration. The write verbs (drop_bounding/2,
set_thread_sets/2, set_ambient/2) are checkpoint-bound: only
valid in the :ready (parked) state, same shape as
Linx.Process.proceed/1 / abort/1.
MapSets of :cap_* atoms
Cap sets are 64-bit kernel bitmasks. In Elixir they show up as
MapSets of :cap_* atoms (the lowercase form of the kernel's
CAP_* constants):
MapSet.new([:cap_net_admin, :cap_sys_admin])
Set operations (MapSet.union/2, MapSet.difference/2) come for
free; pattern-matching on cap atoms is natural; the bitmask
conversion happens in one place (Linx.Capabilities.Constants).
The :cap_ prefix is kept so the atom is unambiguous in a
mailbox of mixed message types.
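As an illustration of the atom-to-bitmask direction, the sketch below folds a set of cap atoms into an integer mask. CapMaskSketch and its three-entry table are hypothetical stand-ins (the real table lives in Linx.Capabilities.Constants, whose internals are not shown here); the bit numbers are the ones from <linux/capability.h>.

```elixir
defmodule CapMaskSketch do
  import Bitwise

  # Hypothetical three-entry excerpt of the kernel's numbering:
  # CAP_NET_BIND_SERVICE = 10, CAP_NET_ADMIN = 12, CAP_SYS_ADMIN = 21.
  @bits %{cap_net_bind_service: 10, cap_net_admin: 12, cap_sys_admin: 21}

  # Accepts any Enumerable of cap atoms; raises KeyError on an atom that is
  # not in the excerpt above.
  def to_mask(caps) do
    Enum.reduce(caps, 0, fn cap, mask ->
      mask ||| (1 <<< Map.fetch!(@bits, cap))
    end)
  end
end

# Set arithmetic happens on MapSets; the mask is computed only at the end.
held = MapSet.new([:cap_net_admin, :cap_sys_admin])
keep = MapSet.new([:cap_net_admin])
to_drop = MapSet.difference(held, keep)
_mask = CapMaskSketch.to_mask(to_drop)
```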
Composition with Linx.Process
The motivating composition:
{:ok, c} = Linx.Process.spawn(argv: ["/usr/sbin/nginx"], stdio: :pty)
receive do {:linx_process, :ready, _} -> :ok end
# Strip everything except the one cap nginx actually needs.
keep = [:cap_net_bind_service]
# Shrink the bounding set first: prctl(PR_CAPBSET_DROP) needs CAP_SETPCAP
# in the effective set, and set_thread_sets/2 below removes it.
:ok = Linx.Capabilities.drop_bounding(c,
  MapSet.difference(Linx.Capabilities.Constants.all(),
    MapSet.new(keep)))
:ok = Linx.Capabilities.set_thread_sets(c,
  effective: keep, permitted: keep, inheritable: [])
:ok = Linx.Process.proceed(c)
After proceed/1, the workload runs with exactly
cap_net_bind_service, even if its binary has file caps that
would otherwise grant more, because every other cap
(:cap_setpcap included) was dropped from :bounding too.
See docs/capabilities/EXAMPLES.md for end-to-end recipes.
Types
@type cap() :: atom()
A capability atom — the lowercase form of a kernel CAP_*
constant, prefixed with :cap_. Examples:
:cap_net_admin
:cap_sys_admin
:cap_net_bind_service
See Linx.Capabilities.Constants.all/0 for the full set.
A set of capabilities — a MapSet of :cap_* atoms. The public
write verbs accept any Enumerable of caps (list, MapSet,
Stream) for convenience; the canonical representation is
MapSet.
Functions
@spec drop_bounding(Linx.Process.t(), Enumerable.t()) :: :ok | {:error, :not_ready | :running | :no_process | {:bad_capability, term()}}
Drops capabilities from the child thread's bounding set on a
parked Linx.Process session.
caps is a MapSet or list of :cap_* atoms. The operation is
one-way (prctl(PR_CAPBSET_DROP)) — the kernel will refuse to
re-add a dropped cap via any subsequent verb on the same thread,
even via set_thread_sets/2.
Errors
{:error, :not_ready}: the session has not yet reached the checkpoint.
{:error, :running}: past proceed/1; the child is already in execve'd land.
{:error, :no_process}: the session has ended.
{:error, {:bad_capability, atom}}: caps contains an atom Linx doesn't recognise. Validation happens before anything is sent to the agent.
Kernel-level failures (for example, the child thread lacks
CAP_SETPCAP in its effective set, which prctl(PR_CAPBSET_DROP)
requires) arrive asynchronously as
{:linx_process, :error, errno, :cap_drop_bounding} on the
session's owner mailbox, the same shape as other pre-execve
failures.
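Because these failures are asynchronous, an :ok return only means the request was accepted. One way an owner process might poll for a reported failure is sketched below. CapAwaitSketch and check_cap_errors/1 are hypothetical names; the message shape and the stage tags (:cap_drop_bounding, :cap_set_ambient, :cap_set_thread) are the ones quoted in this module's docs. A timeout here means only that nothing has been reported yet, not that the kernel call succeeded.

```elixir
defmodule CapAwaitSketch do
  # Hypothetical helper: selectively receives a capability-stage error from
  # the calling process's mailbox, leaving all other messages untouched.
  def check_cap_errors(timeout_ms \\ 100) do
    receive do
      {:linx_process, :error, errno, stage}
      when stage in [:cap_drop_bounding, :cap_set_ambient, :cap_set_thread] ->
        {:error, {stage, errno}}
    after
      # Nothing reported within the window; this is not proof of success.
      timeout_ms -> :ok
    end
  end
end
```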
Example
:ok = Linx.Capabilities.drop_bounding(session,
[:cap_sys_admin, :cap_sys_module, :cap_dac_override])
@spec read(pos_integer() | :self) :: {:ok, Linx.Capabilities.State.t()} | {:error, Linx.Capabilities.Error.t()}
Reads a process's capability sets from /proc/<pid>/status.
Accepts a positive integer pid, or :self as a convenience for
the BEAM's own status. Returns
{:ok, %Linx.Capabilities.State{}} on success, or
{:error, %Linx.Capabilities.Error{}} if the procfs read failed
or the file didn't contain the five Cap*: lines we expected.
Examples
iex> {:ok, %Linx.Capabilities.State{} = state} = Linx.Capabilities.read(:self)
iex> is_struct(state.effective, MapSet) and is_struct(state.bounding, MapSet)
true
# Bogus pid -> structured error.
iex> {:error, %Linx.Capabilities.Error{errno: :enoent}} =
...> Linx.Capabilities.read(1_234_567_890)
iex> true
true
Forward compatibility
If the kernel reports a bit that isn't in Linx's 41-entry table
(a newer kernel adding caps Linx hasn't catalogued), the bit is
omitted from the returned MapSets and a single
Logger.warning/1 is emitted. The returned %State{} is still
valid for every cap Linx does know about.
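The drop-and-warn behaviour can be sketched as follows. CapDecodeSketch is a hypothetical module with a deliberately tiny three-entry table, so bits that a full table would recognise count as unknown here; it illustrates the technique and is not Linx's decoder.

```elixir
defmodule CapDecodeSketch do
  import Bitwise
  require Logger

  # Hypothetical excerpt of the bit-number table (see <linux/capability.h>).
  @bits %{10 => :cap_net_bind_service, 12 => :cap_net_admin, 21 => :cap_sys_admin}

  def decode(mask) when is_integer(mask) and mask >= 0 do
    # Collect the positions of every set bit in the 64-bit mask.
    set_bits = for bit <- 0..63, ((mask >>> bit) &&& 1) == 1, do: bit

    {known, unknown} = Enum.split_with(set_bits, &Map.has_key?(@bits, &1))

    # Unknown bits are left out of the result; one warning covers them all.
    if unknown != [] do
      Logger.warning("ignoring unknown capability bits: #{inspect(unknown)}")
    end

    MapSet.new(known, &Map.fetch!(@bits, &1))
  end
end
```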
@spec set_ambient(Linx.Process.t(), Enumerable.t()) :: :ok | {:error, :not_ready | :running | :no_process | {:bad_capability, term()}}
Sets the child thread's ambient capability set on a parked
Linx.Process session.
caps is a MapSet or list of :cap_* atoms. The ambient set
is replaced (the kernel only exposes per-cap RAISE/LOWER plus
a global CLEAR_ALL, so the natural shape is "clear then raise
each requested cap").
Ambient caps are the mechanism that lets a non-root, no-file-cap
binary still inherit capabilities across execve — useful when
you want a workload to start with e.g. :cap_net_bind_service
but don't want to put file caps on the binary or run it as root.
See capabilities(7) "Ambient capabilities" for the full rules
(notably: every ambient cap must also be in the permitted and
inheritable sets, or the raise fails).
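The "clear then raise" shape and the permitted-plus-inheritable precondition can be checked caller-side before anything is sent. The sketch below is a pure planning function under that assumption; AmbientPlanSketch, plan/3 and the :clear_all / {:raise, cap} step tuples are hypothetical and do not describe the agent's wire protocol.

```elixir
defmodule AmbientPlanSketch do
  # Hypothetical caller-side planner. Returns the sequence of steps that
  # would replace the ambient set, or the caps that would make a raise fail.
  def plan(ambient, permitted, inheritable) do
    ambient = MapSet.new(ambient)

    # A cap may only be raised if it is in BOTH permitted and inheritable.
    allowed = MapSet.intersection(MapSet.new(permitted), MapSet.new(inheritable))
    missing = ambient |> MapSet.difference(allowed) |> Enum.sort()

    case missing do
      [] -> {:ok, [:clear_all | Enum.map(Enum.sort(ambient), &{:raise, &1})]}
      _ -> {:error, {:not_in_permitted_and_inheritable, missing}}
    end
  end
end
```

Note that a session whose inheritable set was cleared cannot raise any ambient cap, so the inheritable set has to be chosen with the intended ambient set in mind.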
Errors
Same shape as drop_bounding/2. Kernel failures (a raise that
fails because the cap isn't in permitted+inheritable, etc.)
arrive as {:linx_process, :error, errno, :cap_set_ambient}.
@spec set_thread_sets(Linx.Process.t(), keyword()) :: :ok | {:error, :not_ready | :running | :no_process | {:bad_capability, term()} | {:bad_thread_sets, {:missing, atom()}}}
Sets the child thread's effective, permitted, and inheritable
capability sets on a parked Linx.Process session.
opts is a keyword list with all three required keys:
:effective, :permitted, :inheritable. Each value is a
MapSet or list of :cap_* atoms (use [] or MapSet.new()
to clear a set).
Implemented via capset(2) in the agent. The kernel enforces the
invariants documented in capabilities(7), notably that
:effective ⊆ :permitted and :inheritable ⊆ :permitted ∪ I_old,
where I_old is the thread's inheritable set before the call.
Violations arrive as {:linx_process, :error, :einval, :cap_set_thread} on the owner mailbox.
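Since a violation only surfaces later as a mailbox message, a caller may prefer to check the two invariants up front. The following is a minimal sketch, assuming the invariants exactly as stated above with :permitted read as the requested permitted set (a conservative reading). ThreadSetsCheckSketch and its error atoms are hypothetical; the kernel applies further rules from capabilities(7) that this does not model.

```elixir
defmodule ThreadSetsCheckSketch do
  # Hypothetical pre-flight check. `opts` has the same three keys that
  # set_thread_sets/2 requires; `old_inheritable` is the set before the call.
  def check(opts, old_inheritable) do
    eff = MapSet.new(Keyword.fetch!(opts, :effective))
    prm = MapSet.new(Keyword.fetch!(opts, :permitted))
    inh = MapSet.new(Keyword.fetch!(opts, :inheritable))

    # Upper bound for the new inheritable set: permitted ∪ I_old.
    inh_limit = MapSet.union(prm, MapSet.new(old_inheritable))

    cond do
      not MapSet.subset?(eff, prm) -> {:error, :effective_not_in_permitted}
      not MapSet.subset?(inh, inh_limit) -> {:error, :inheritable_too_wide}
      true -> :ok
    end
  end
end
```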
"Leave unchanged" not yet supported
A future revision will accept missing keys as "leave this set
as-is" (the agent would read its own /proc/self/status to
fill in). For now, callers that want one set unchanged must
read it first via Linx.Capabilities.read(host_pid) and pass
it back through here.
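The read-then-pass-back workaround amounts to filling omitted keys from a previously read state. A small sketch of that merge step follows. KeepSetSketch and fill/2 are hypothetical, and it assumes the state exposes :effective, :permitted and :inheritable fields (only :effective and :bounding are confirmed by the examples in these docs).

```elixir
defmodule KeepSetSketch do
  @keys [:effective, :permitted, :inheritable]

  # `current` is a map or struct holding the three sets as last read;
  # `overrides` is a keyword list with only the sets the caller wants changed.
  # Returns a complete keyword list with all three required keys.
  def fill(current, overrides) do
    Enum.map(@keys, fn key ->
      {key, Keyword.get(overrides, key, Map.fetch!(current, key))}
    end)
  end
end
```

The result could then be passed as the opts argument of set_thread_sets/2. There is an unavoidable gap between the read and the write, which is harmless only while the session stays parked.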
Errors
Same shape as drop_bounding/2. Additional caller-side errors:
{:error, {:bad_thread_sets, {:missing, key}}}: one of :effective, :permitted, :inheritable was omitted.
{:error, {:bad_capability, atom}}: one of the three values contained an unknown cap atom.
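The two caller-side checks can be pictured as a small validation pass that runs before anything reaches the agent. The sketch below mirrors the documented error shapes; CapValidateSketch and its three-atom @known list are hypothetical (the real check would consult the full table in Linx.Capabilities.Constants).

```elixir
defmodule CapValidateSketch do
  # Hypothetical excerpt of the known-cap table.
  @known [:cap_net_bind_service, :cap_net_admin, :cap_sys_admin]
  @required [:effective, :permitted, :inheritable]

  def validate(opts) do
    with :ok <- check_keys(opts), do: check_caps(opts)
  end

  # Reports the first required key that is absent.
  defp check_keys(opts) do
    case Enum.reject(@required, &Keyword.has_key?(opts, &1)) do
      [] -> :ok
      [key | _] -> {:error, {:bad_thread_sets, {:missing, key}}}
    end
  end

  # Reports the first value, across all three sets, that is not a known cap.
  defp check_caps(opts) do
    all = Enum.flat_map(@required, &Enum.to_list(Keyword.fetch!(opts, &1)))

    case Enum.reject(all, &(&1 in @known)) do
      [] -> :ok
      [bad | _] -> {:error, {:bad_capability, bad}}
    end
  end
end
```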
@spec supported?() :: boolean()
Returns true iff Linux capabilities are inspectable on this
host — i.e. /proc/self/status contains a CapBnd: line.
True on every Linux ≥ 2.6.26, the release from which proc(5) documents the CapBnd: line (every kernel Linx targets). Useful as a precondition guard or in setup checks; this module's verbs don't gate on it themselves.
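For readers who want the equivalent check without the library, a minimal sketch is below. CapSupportedSketch is a hypothetical module; it takes the path as an argument only so the behaviour can be exercised off-Linux, and it treats any read failure as "not supported".

```elixir
defmodule CapSupportedSketch do
  # Hypothetical re-implementation of the documented check: true iff the
  # status file can be read and contains a line starting with "CapBnd:".
  def supported?(path \\ "/proc/self/status") do
    case File.read(path) do
      {:ok, text} -> has_capbnd?(text)
      {:error, _reason} -> false
    end
  end

  def has_capbnd?(text) do
    text
    |> String.split("\n")
    |> Enum.any?(&String.starts_with?(&1, "CapBnd:"))
  end
end
```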