Linx.Sysctl (Linx v0.1.0)

Linux kernel tunable parameters — the /proc/sys/ surface, the same knobs sysctl(8) reads and writes.

Why a separate subsystem

Sysctls are a coherent kernel concept (~1500 named scalar tunables spanning networking, VM, filesystem, IPC, and kernel-wide policy) with their own procfs surface and their own per-namespace routing rules. Reading net.ipv4.ip_forward from inside a container doesn't yield the host's value — it yields the container's network namespace's value. Wrapping the surface as its own module keeps that routing model explicit instead of scattering procfs paths through every caller.

Driving use cases:

  • Host-side, from a Nerves application or a normal release — flip a knob like net.ipv4.ip_forward programmatically.
  • Container-side, at the Linx.Process checkpoint — set kernel.hostname, enable per-netns net.* knobs, configure kernel.shm* IPC limits, before the workload execves.
  • Container-side, at runtime — same verbs against a fully running namespace via the :in option.

procfs is the API

Every sysctl is a file under /proc/sys/. Dots in the key map to slashes in the path:

net.ipv4.ip_forward  ->  /proc/sys/net/ipv4/ip_forward
kernel.hostname      ->  /proc/sys/kernel/hostname
vm.swappiness        ->  /proc/sys/vm/swappiness
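As a rough illustration of the mapping above, here is a minimal sketch. The module name KeyPathSketch is hypothetical and not part of Linx, and the sketch ignores the corner case of leaf names that themselves contain dots.

```elixir
defmodule KeyPathSketch do
  # Hypothetical helper, not Linx's implementation.
  @root "/proc/sys"

  # Split the dot-form key into segments and join them under /proc/sys.
  def to_path(key) when is_binary(key) do
    Path.join([@root | String.split(key, ".")])
  end
end
```

For example, KeyPathSketch.to_path("vm.swappiness") returns "/proc/sys/vm/swappiness".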

Reads return the file's contents (the kernel always appends a \n, which we trim). Writes accept integers, strings, and lists of integers (for space-separated tuple-shaped knobs like kernel.printk or net.ipv4.tcp_rmem).

The legacy sysctl(2) syscall has been deprecated since Linux 2.6.24 and was removed in 5.5; we don't expose it. procfs is the only API.

Primitives, not a config applier

Linx.Sysctl reads, writes, and lists knobs; it is deliberately not a sysctl.conf parser or applier. Parsing /etc/sysctl.d/*.conf, apply ordering, and reload policy belong to a consumer built on these primitives, not to Linx.

Single-shot declarative reconciliation — observe a desired %{key => value} map against the kernel, diff, and converge in one caller-driven pass — is mechanism and lives in Linx.Sysctl.Reconcile. It holds no long-lived state and owns no process; the loop that calls it on a cadence remains the consumer's.
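To make the observe/diff/converge idea concrete, here is a minimal single-pass sketch. It is not the API of Linx.Sysctl.Reconcile: the module ReconcileSketch, the function converge/3, and the result atoms are all hypothetical. The reader and writer are passed in as functions, and values are compared as strings, so the sketch only suits scalar knobs.

```elixir
defmodule ReconcileSketch do
  # Hypothetical single pass: observe each desired key, write only on mismatch.
  # read_fun  :: (key -> {:ok, binary} | {:error, term})
  # write_fun :: (key, binary -> :ok | {:error, term})
  def converge(desired, read_fun, write_fun) when is_map(desired) do
    desired
    |> Enum.sort()
    |> Enum.map(fn {key, want} ->
      want = to_string(want)

      case read_fun.(key) do
        # Already at the desired value: nothing to do.
        {:ok, ^want} ->
          {key, :unchanged}

        # Differs from the desired value: attempt one write.
        {:ok, _other} ->
          case write_fun.(key, want) do
            :ok -> {key, :changed}
            {:error, reason} -> {key, {:error, reason}}
          end

        {:error, reason} ->
          {key, {:error, reason}}
      end
    end)
  end
end
```

A caller could pass closures over Linx.Sysctl.read/2 and Linx.Sysctl.write/3 as read_fun and write_fun; the sketch keeps no state between calls, so the cadence stays with the caller.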

Per-namespace vs global

The kernel routes each read or write through the calling task's namespace context:

Subtree                                             Owning namespace
net.*                                               network
kernel.hostname, kernel.domainname                  UTS
kernel.shm*, kernel.msg*, kernel.sem, fs.mqueue.*   IPC
user.max_*_namespaces                               user
vm.*, fs.file-max, kernel.printk, most else         global (host-only)

Trying to traverse /proc/<pid>/root/proc/sys/... to "see another namespace's value" does not work — the kernel resolves the value against the reader's namespace, not the path. The :in option is the supported way to read or write a non-host value: Linx.Sysctl.Native opens the target's namespace stack (user, mount, UTS, IPC, net) and setns(2)s into all five on a throwaway pthread, then performs the file I/O, then exits. Global sysctls return the same value from any namespace regardless of :in.
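The routing table above can be restated as a small lookup. This is only a sketch: NamespaceSketch and owning_namespace/1 are hypothetical names, not part of Linx, and the rules simply transcribe the table rather than query the kernel, so anything the table files under "most else" falls through to :global.

```elixir
defmodule NamespaceSketch do
  # Hypothetical classifier that mirrors the table; the kernel is the authority.
  def owning_namespace(key) when is_binary(key) do
    cond do
      String.starts_with?(key, "net.") -> :network
      key in ["kernel.hostname", "kernel.domainname"] -> :uts
      key == "kernel.sem" -> :ipc
      String.starts_with?(key, ["kernel.shm", "kernel.msg", "fs.mqueue."]) -> :ipc
      Regex.match?(~r/^user\.max_.*_namespaces$/, key) -> :user
      true -> :global
    end
  end
end
```

A caller might use such a lookup to decide whether passing :in can change the result at all: for keys classified as :global, the value is the same from any namespace.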

The :in option

Every verb in this module accepts an :in option, mirroring Linx.Mount's shape:

  • :self (default) — the BEAM's namespaces. Pure-Elixir file I/O over /proc/sys/; no NIF, no thread.
  • {:pid, n} — the namespace stack of pid n. Joins /proc/<n>/ns/{user,mnt,uts,ipc,net} on a throwaway pthread.
  • {:path, p} — a single explicit nsfd file path (less common; primarily for testing or for callers that already hold a pinned-namespace bind mount).

:in is lifecycle-agnostic: it works equally well between Linx.Process's :ready event and proceed/1 (the checkpoint window) and against a fully running container post-proceed/1.
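For the {:pid, n} target, the five namespace files named above can be spelled out as follows. NsPathSketch is a hypothetical helper, not part of Linx, and the order in which Linx.Sysctl.Native actually opens or joins them is an assumption here.

```elixir
defmodule NsPathSketch do
  # Namespace kinds in the order the text lists them (assumed, not confirmed).
  @ns_kinds ~w(user mnt uts ipc net)

  # Build the /proc/<pid>/ns/<kind> path for each kind.
  def ns_paths(pid) when is_integer(pid) and pid > 0 do
    Enum.map(@ns_kinds, fn kind -> "/proc/#{pid}/ns/#{kind}" end)
  end
end
```

By contrast, a {:path, p} target names just one such file directly.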

Composition with Linx.Process

Same shape as Linx.Mount's :in option: with {:pid, _}, write knobs into a child's namespaces while it's parked at the checkpoint, then proceed:

{:ok, c} =
  Linx.Process.spawn(argv: ["/bin/bash"], namespaces: [:net, :uts])

receive do {:linx_process, :ready, _} -> :ok end
{:ok, host_pid} = Linx.Process.host_pid(c)

:ok = Linx.Sysctl.write("net.ipv4.ip_forward", 1, in: {:pid, host_pid})
:ok = Linx.Sysctl.write("kernel.hostname", "ct0", in: {:pid, host_pid})

:ok = Linx.Process.proceed(c)

Linx.Process has zero awareness of sysctls; the checkpoint between :ready and proceed/1 is the only coupling, exactly the way Linx.Netlink / Linx.Cgroup / Linx.Mount / Linx.User integration works.

Forward compatibility

list/0, list/1, and list/2 silently skip nodes they can't read (EACCES/EPERM/EIO): the intent is "everything visible", not "everything that exists". An errno Linx hasn't catalogued surfaces as errno: :unknown with the raw integer preserved in :code.

Types

in_target()

@type in_target() :: :self | {:pid, pos_integer()} | {:path, String.t()}

Target namespace for an operation:

  • :self (default) — the BEAM's namespaces.
  • {:pid, n} — the namespace stack of pid n.
  • {:path, p} — an explicit nsfd path.

key()

@type key() :: String.t()

A sysctl key in dot form, e.g. "net.ipv4.ip_forward" or "kernel.hostname". Maps internally to a /proc/sys/<slashed> path.

opts()

@type opts() :: [{:in, in_target()}]

Options accepted by every verb in this module.

  • :in — target namespace, default :self. See in_target/0.

value()

@type value() :: integer() | binary() | [integer()]

A value to write to a sysctl. Integers and binaries cover the vast majority of knobs; lists of integers cover the space-separated tuple shapes (kernel.printk, net.ipv4.tcp_rmem, etc.).

Functions

list()

@spec list() :: {:ok, [Linx.Sysctl.Entry.t()]} | {:error, term()}

Walks /proc/sys/ and returns every readable scalar as a list of %Linx.Sysctl.Entry{} structs, sorted by key.

Unreadable nodes (some sysctls return EACCES / EPERM for unprivileged callers, write-only knobs return EIO) are silently skipped — the returned list is "everything I could see", not "everything that exists". On a typical Linux host expect ~1500 entries.

See list/1 for the prefix-or-options variant, and list/2 for the explicit prefix-plus-options form. Walking another process's namespace stack is list(in: {:pid, n}) or list("net.ipv4", in: {:pid, n}).

Examples

iex> {:ok, all} = Linx.Sysctl.list()
iex> Enum.find(all, & &1.key == "kernel.ostype")
#Linx.Sysctl.Entry<kernel.ostype = "Linux">

Note: a few sysctl files have dots in their leaf names (interface names like eth0.10 for VLANs). For those entries the dot-form key can't be unambiguously mapped back to a single procfs path. The key is still a faithful representation of where the value came from; consumers that need to act on those entries should carry the procfs path alongside the key as a side channel.
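The walk-and-skip behaviour described for list/0 can be sketched as a recursive directory traversal. This is an illustration under assumptions, not Linx's implementation: WalkSketch is a hypothetical module, it takes the root directory as a parameter so it can be tried on any tree, and it returns {key, value} tuples rather than %Linx.Sysctl.Entry{} structs.

```elixir
defmodule WalkSketch do
  # Returns [{dot_key, trimmed_value}] sorted by key; unreadable nodes are skipped.
  def walk(root) do
    root |> do_walk([]) |> Enum.sort()
  end

  defp do_walk(dir, segments) do
    case File.ls(dir) do
      {:ok, names} ->
        Enum.flat_map(names, fn name ->
          path = Path.join(dir, name)
          segs = segments ++ [name]

          if File.dir?(path) do
            do_walk(path, segs)
          else
            case File.read(path) do
              # Joining segments with "." is where a dotted leaf name
              # (such as eth0.10) becomes ambiguous.
              {:ok, raw} -> [{Enum.join(segs, "."), String.trim_trailing(raw)}]
              # Unreadable file (EACCES, EPERM, EIO, ...): skip silently.
              {:error, _reason} -> []
            end
          end
        end)

      # Unreadable directory: skip the whole subtree.
      {:error, _reason} ->
        []
    end
  end
end
```

Calling WalkSketch.walk("/proc/sys") on a Linux host would approximate the host-side walk; it does nothing about namespaces, which is what the :in option is for.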

list(arg)

@spec list(key() | opts()) ::
  {:ok, [Linx.Sysctl.Entry.t()]}
  | {:error, Linx.Sysctl.Error.t() | {:bad_key, term()} | {:bad_in, term()}}

Either list(prefix) to walk a subtree of /proc/sys/, or list(opts) to walk all of /proc/sys/ with options.

Dispatch is by argument type: a binary is a dot-form prefix, a keyword list is an options list.

list(prefix) — subtree walk

list("net.ipv4") returns every readable scalar under /proc/sys/net/ipv4/, sorted by key. The trailing * is implicit; globs are not accepted. If the prefix names a leaf rather than a subtree (e.g. list("kernel.ostype")), the result is a single-element list containing that entry.

iex> {:ok, net} = Linx.Sysctl.list("net.ipv4")
iex> Enum.all?(net, &String.starts_with?(&1.key, "net.ipv4."))
true

iex> Linx.Sysctl.list("kernel.ostype")  # leaf, not subtree
{:ok, [#Linx.Sysctl.Entry<kernel.ostype = "Linux">]}
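The prefix rule above (a subtree matches everything beneath it, a leaf matches only itself, and no globbing) can be expressed as a predicate over dot-form keys. PrefixSketch is a hypothetical module for illustration; Linx itself presumably walks the corresponding directory rather than filtering keys.

```elixir
defmodule PrefixSketch do
  # True when key is the prefix itself (leaf case) or sits below it (subtree case).
  # Appending "." keeps "net.ipv4" from matching a sibling such as "net.ipv4x".
  def under_prefix?(key, prefix) when is_binary(key) and is_binary(prefix) do
    key == prefix or String.starts_with?(key, prefix <> ".")
  end
end
```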

list(opts) — full walk with options

list(in: {:pid, n}) walks all of /proc/sys/ in the target's namespace stack. Equivalent to list("/", in: {:pid, n}) if such a "root prefix" were allowed.

iex> Linx.Sysctl.list(in: {:pid, container_pid})
{:ok, [...]}

list(prefix, opts)

@spec list(key(), opts()) ::
  {:ok, [Linx.Sysctl.Entry.t()]}
  | {:error, Linx.Sysctl.Error.t() | {:bad_key, term()} | {:bad_in, term()}}

Walks the subtree of /proc/sys/ named by prefix with options.

Same prefix semantics as list/1 (subtree → walk, leaf → single entry); same :in option as the other verbs.

Examples

# Read every net.ipv4 knob the container sees.
iex> Linx.Sysctl.list("net.ipv4", in: {:pid, container_pid})
{:ok, [...]}

# The container's view of its own hostname (a single-leaf prefix).
iex> Linx.Sysctl.list("kernel.hostname", in: {:pid, container_pid})
{:ok, [#Linx.Sysctl.Entry<kernel.hostname = "ct0">]}

read(key, opts \\ [])

@spec read(key(), opts()) ::
  {:ok, binary()}
  | {:error, Linx.Sysctl.Error.t() | {:bad_key, term()} | {:bad_in, term()}}

Reads a sysctl as a trimmed binary.

Returns {:ok, value} where value is the file's contents with trailing whitespace stripped (the kernel always appends a \n).

Options

  • :in can be :self (default), {:pid, n}, or {:path, p}. With {:pid, _} or {:path, _}, the read is routed through the target's namespaces on a throwaway pthread; with :self it is plain file I/O.

Examples

iex> Linx.Sysctl.read("kernel.ostype")
{:ok, "Linux"}

iex> Linx.Sysctl.read("net.ipv4.ip_forward")
{:ok, "0"}

# Read the value the container sees, not the host's:
iex> Linx.Sysctl.read("net.ipv4.ip_forward", in: {:pid, container_pid})
{:ok, "1"}

Errors

  • {:error, {:bad_key, reason}} — caller-side input mistake.
  • {:error, {:bad_in, reason}} — malformed :in value.
  • {:error, %Linx.Sysctl.Error{}} — kernel-level failure. Common: :enoent (no such sysctl), :eacces (procfs denied the read), or, with in: {:pid, _}, :open_ns / :setns / :unshare / :thread from the namespace-acquisition path.

read_int(key, opts \\ [])

@spec read_int(key(), opts()) ::
  {:ok, integer()}
  | {:error,
     Linx.Sysctl.Error.t()
     | {:bad_key, term()}
     | {:bad_in, term()}
     | {:bad_value, term()}}

Reads a sysctl and parses it as a single integer.

Convenience for the common case (net.ipv4.ip_forward, vm.swappiness, every *_max / *_min knob).

Accepts the same :in option as read/2.

Examples

iex> Linx.Sysctl.read_int("net.ipv4.ip_forward")
{:ok, 0}

iex> Linx.Sysctl.read_int("kernel.hostname")  # not an integer
{:error, {:bad_value, {:not_an_integer, "fry"}}}
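The parsing step can be sketched with Integer.parse/1, accepting only input that is an integer and nothing else. IntParseSketch is a hypothetical module; the error tuple copies the shape shown in the example above, but how Linx produces it internally is not something this sketch claims to know.

```elixir
defmodule IntParseSketch do
  # Parse the raw file contents as exactly one integer.
  def parse_int(raw) when is_binary(raw) do
    trimmed = String.trim(raw)

    case Integer.parse(trimmed) do
      # The whole string was consumed: a clean integer.
      {n, ""} -> {:ok, n}
      # Trailing garbage, several tokens, or no digits at all.
      _ -> {:error, {:bad_value, {:not_an_integer, trimmed}}}
    end
  end
end
```

Requiring an empty remainder is what makes a tuple-shaped value such as "4 4 1 7" fail here instead of silently parsing as 4.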

read_ints(key, opts \\ [])

@spec read_ints(key(), opts()) ::
  {:ok, [integer()]}
  | {:error,
     Linx.Sysctl.Error.t()
     | {:bad_key, term()}
     | {:bad_in, term()}
     | {:bad_value, term()}}

Reads a sysctl and parses it as a list of integers, split on whitespace.

Convenience for the tuple-shaped knobs: kernel.printk is four ints, net.ipv4.tcp_rmem / tcp_wmem are three each.

Accepts the same :in option as read/2.

Examples

iex> Linx.Sysctl.read_ints("kernel.printk")
{:ok, [4, 4, 1, 7]}

iex> Linx.Sysctl.read_ints("net.ipv4.tcp_rmem")
{:ok, [4096, 131072, 6291456]}
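Splitting on whitespace and parsing each token might look like the sketch below. IntsParseSketch and the :not_integers reason are hypothetical; only the {:bad_value, term()} wrapper comes from the spec above.

```elixir
defmodule IntsParseSketch do
  # Parse whitespace-separated integers (spaces or tabs, any run length).
  def parse_ints(raw) when is_binary(raw) do
    tokens = String.split(raw)
    parsed = Enum.map(tokens, &Integer.parse/1)

    # Every token must parse completely, and there must be at least one token.
    if tokens != [] and Enum.all?(parsed, &match?({_, ""}, &1)) do
      {:ok, Enum.map(parsed, &elem(&1, 0))}
    else
      {:error, {:bad_value, {:not_integers, String.trim(raw)}}}
    end
  end
end
```

String.split/1 with no separator splits on any run of whitespace, so tab-separated and space-separated values are handled alike.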

supported?()

@spec supported?() :: boolean()

Returns true iff the kernel exposes a /proc/sys/ tree on this host.

Canonical check: /proc/sys/kernel/ostype exists. The knob has been present since before namespaces existed; on any Linux kernel with procfs mounted at /proc, this is true.

write(key, value, opts \\ [])

@spec write(key(), value(), opts()) ::
  :ok
  | {:error,
     Linx.Sysctl.Error.t()
     | {:bad_key, term()}
     | {:bad_in, term()}
     | {:bad_value, term()}}

Writes a value to a sysctl.

value may be:

  • an integer — rendered with Integer.to_string/1.
  • a binary — written verbatim. Must not contain \n or \0: the kernel's sysctl parser treats newlines as end-of-input and would silently truncate a multi-line string. We reject these before the write so the failure is loud.
  • a list of integers — rendered space-separated. For the tuple-shaped knobs like kernel.printk, net.ipv4.tcp_rmem, net.ipv4.tcp_wmem.

We don't append a trailing \n — the kernel accepts either form.
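The three accepted value shapes and the pre-write rejection of \n and \0 could be rendered along these lines. RenderSketch and the specific error reasons (:forbidden_byte, :not_a_list_of_integers, :unsupported_shape) are hypothetical; only the {:bad_value, term()} wrapper is taken from the spec.

```elixir
defmodule RenderSketch do
  # Integers are rendered in decimal.
  def render(value) when is_integer(value), do: {:ok, Integer.to_string(value)}

  # Binaries pass through verbatim unless they contain a newline or NUL byte.
  def render(value) when is_binary(value) do
    if String.contains?(value, ["\n", <<0>>]) do
      {:error, {:bad_value, :forbidden_byte}}
    else
      {:ok, value}
    end
  end

  # Non-empty lists of integers are joined with single spaces.
  def render([_ | _] = values) do
    if Enum.all?(values, &is_integer/1) do
      {:ok, Enum.map_join(values, " ", &Integer.to_string/1)}
    else
      {:error, {:bad_value, :not_a_list_of_integers}}
    end
  end

  # Anything else (including the empty list) is rejected.
  def render(_other), do: {:error, {:bad_value, :unsupported_shape}}
end
```

Note that no trailing newline is added to the rendered string, matching the behaviour described above.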

Options

  • :in can be :self (default), {:pid, n}, or {:path, p}. With {:pid, _}, the write lands in the target's namespace stack via the same setns dance as read/2.

Examples

iex> Linx.Sysctl.write("net.ipv4.ip_forward", 1)
:ok

iex> Linx.Sysctl.write("kernel.printk", [4, 4, 1, 7])
:ok

# Set the container's hostname without touching the host's.
iex> Linx.Sysctl.write("kernel.hostname", "ct0", in: {:pid, container_pid})
:ok

Errors

  • {:error, {:bad_key, reason}} — malformed key.
  • {:error, {:bad_value, reason}} — bad value shape or content.
  • {:error, {:bad_in, reason}} — malformed :in value.
  • {:error, %Linx.Sysctl.Error{}} — kernel-level failure. Common: :eacces / :eperm (need root), :enoent (no such sysctl), :einval (value out of range / wrong shape).