Linx.Netfilter (Linx v0.1.0)

Linux netfilter primitives — modern firewall (nf_tables) via the NETLINK_NETFILTER netlink protocol family, plus live ruleset monitoring and packet-event capture (NFLOG).

Why a separate subsystem

Netfilter is a coherent kernel concept (firewall + connection tracking + packet event streams) with its own netlink protocol family (NETLINK_NETFILTER = 12) and a sprawling but consistent surface. Wrapping it as its own concept module — peer to Linx.Process, Linx.Cgroup, Linx.Mount, Linx.User, Linx.Capabilities, Linx.Seccomp, Linx.Sysctl — keeps the firewall mental model explicit. The underlying transport, Linx.Netlink.Nfnl, mirrors Linx.Netlink.Rtnl's shape.

Value, not handle

%Linx.Netfilter.Ruleset{} is plain data: tables containing chains containing ordered rules, plus sets/maps/vmaps and named objects. Pure Elixir values, freely composable and inspectable. Four verbs:

  • build — construct via pipeline DSL or ~NFT sigil.
  • push/2 — write to the kernel atomically (:replace rebuilds, :reconcile computes the minimal diff).
  • pull/1..2 — read kernel state into a ruleset value.
  • diff/2 — compute the patch between two rulesets.

Kernel state lives in the kernel; the Elixir value is the Elixir value. Mirrors %Linx.Seccomp.Filter{} scaled to a larger surface.

Transactions are mandatory

Every mutation goes through a NFNL_MSG_BATCH_BEGIN / NFNL_MSG_BATCH_END envelope; the kernel applies the whole batch atomically or rejects it whole. push/2 is the only mutator, batch-shaped from the outside in.

Modes:

  • :replace (default) — tear down and rebuild the named tables. Simple, brief disruption.
  • :reconcile — compute the minimal patch between current kernel state and the desired Ruleset, emit as one batch. LiveView-of-firewalls; no service interruption when only adding/removing rules at the margins.

Optimistic concurrency via NFTA_BATCH_GENID

:reconcile mode threads the kernel's generation counter through the batch: "I computed this against generation N; reject if N has moved". The kernel returns ERESTART on mismatch — push/2 retries with bounded attempts, surfacing {:error, %Error{errno: :erestart, ruleset_gen: gen}} on exhaustion. Lets Linx cooperate cleanly with nft CLI / firewalld / any other writer in the same netns.
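The bounded retry described above can be pictured as a small loop. This is a minimal sketch of the pattern only, not Linx's implementation: the module name GenRetry, the function with_retry/2, and the default of three attempts are all hypothetical, and the error is matched as any map carrying errno: :erestart so the sketch runs without Linx or a kernel.

```elixir
defmodule GenRetry do
  @moduledoc false

  # Hypothetical helper. `attempt_fun` stands in for "compute the patch
  # against the current generation and send the batch"; it is expected to
  # return :ok or {:error, map_with_errno}.
  def with_retry(attempt_fun, attempts \\ 3)
      when is_function(attempt_fun, 0) and is_integer(attempts) and attempts > 0 do
    case attempt_fun.() do
      # The generation moved and attempts remain: recompute and resend.
      {:error, %{errno: :erestart}} when attempts > 1 ->
        with_retry(attempt_fun, attempts - 1)

      # Success, an unrelated error, or :erestart on the final attempt:
      # hand the result back unchanged.
      other ->
        other
    end
  end
end
```

On exhaustion the caller sees the last :erestart error unchanged, which matches the shape of the {:error, %Error{errno: :erestart, ...}} result described above.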

Owner flag is the default

create_table/2 sets NFT_TABLE_F_OWNER by default: the table is destroyed when the creating netlink socket closes. The supervisor that opens the Nfnl socket owns the firewall; if it dies, rules vanish. No other firewall management tool exposes this naturally.

Opt out with persist: true (uses NFT_TABLE_F_PERSIST, 6.9+) for policies that should survive the BEAM. On older kernels the table is created with no flags and survives socket close until it is explicitly deleted.

Per-namespace isolation

Each netns has fully independent nftables state — own tables, own generation counter, own commit mutex, own multicast group. Linx.Netlink.Nfnl.open({:pid, child_pid}) opens the socket inside that netns for its whole life; reads/writes through that socket land in the child's nftables instance. Same value type, same verbs.

Authoring surfaces: peers, not layers

Two authoring surfaces produce the same %Ruleset{}:

  • Pipeline DSL: Ruleset.new() |> Ruleset.add_table(...) |> Table.add_chain(...) |> Chain.add_rule(...), for runtime-shaped rulesets (interfaces discovered at boot, IPs from config).
  • ~NFT sigil: ~NFT"table inet myapp { chain ... }", for compile-time-authored rulesets with safe Elixir interpolation and lossless round-trip to nftables.conf files. Modelled on Phoenix LiveView's HEEx.

Both call the same validator-setter functions; both produce the same value.

The setters use add_* (add_table / add_chain / add_rule), not the create_* of Linx.Cgroup or Linx.Netlink.Rtnl, deliberately: add_* inserts into a value, while create materialises a kernel object — different acts, different verbs.

Composition with Linx.Process

Same shape as every other Linx subsystem: configure the child's network and firewall at the checkpoint between :ready and proceed/1, then release the workload with everything in force:

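# Spawn the workload in its own network namespace and wait for the :ready checkpoint.
{:ok, c} = Linx.Process.spawn(argv: [...], namespaces: [:net])
receive do
  {:linx_process, :ready, _} -> :ok
end
{:ok, host_pid} = Linx.Process.host_pid(c)

# Open the nfnetlink socket inside the child's netns and install its firewall.
{:ok, ct_nfnl} = Linx.Netlink.Nfnl.open({:pid, host_pid})
:ok = Linx.Netfilter.push(ct_nfnl, container_ruleset())

# Release the workload with the ruleset already in force.
:ok = Linx.Process.proceed(c)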

Linx.Process has zero awareness of netfilter; the checkpoint is the only coupling, exactly the way Linx.Sysctl / Linx.Mount / every other subsystem composes.

See docs/netfilter/DESIGN.md for design work intentionally deferred.

Functions

create_table(sock, name, opts \\ [])

@spec create_table(Linx.Netlink.Socket.t(), String.t(), keyword()) ::
  {:ok, Linx.Netfilter.Ruleset.t()}
  | {:error, Linx.Netfilter.Error.t() | term()}

Creates a new table in the kernel's nftables instance.

Options

  • :family is one of :ip | :ip6 | :inet | :arp | :bridge | :netdev. Default: :inet (the firewall sweet spot, since one table covers both IPv4 and IPv6).

  • :persist, when true, disables the owner flag, leaving the table behind when the socket closes. Default false (table auto-destroys with the socket; see Owner flag is the default in the moduledoc).

Returns {:ok, %Ruleset{}} — the ruleset has just this one table, ready for chains / rules to be added with the Linx.Netfilter.Ruleset pipeline DSL and then pushed back with push/2.

Wire-level failures come back as {:error, %Linx.Netfilter.Error{}} with the operation set to :create_table and the kernel's errno / extended-ack message attached. EEXIST means the table was already present; for a "create-or-fetch" pattern, call pull/2 with a {family, name} scope first.
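A "create-or-fetch" helper along the lines mentioned above might look like the following. It is a sketch under stated assumptions: the module TableEnsure and the function fetch_or_create/4 are hypothetical, the `nf` parameter exists only so the sketch can be exercised against a stand-in module, and the scoped pull is assumed to report a missing table as errno: :enoent, as documented for pull/3.

```elixir
defmodule TableEnsure do
  @moduledoc false

  # Hypothetical helper, not part of Linx. `nf` defaults to Linx.Netfilter.
  def fetch_or_create(sock, name, opts \\ [], nf \\ Linx.Netfilter) do
    # create_table/3 defaults :family to :inet, so the scoped pull does too.
    family = Keyword.get(opts, :family, :inet)

    case nf.pull(sock, {family, name}) do
      # The table already exists: return it as pulled from the kernel.
      {:ok, ruleset} -> {:ok, ruleset}
      # The table is missing: create it with the caller's options.
      {:error, %{errno: :enoent}} -> nf.create_table(sock, name, opts)
      # Any other failure (for example :eperm) is passed through untouched.
      {:error, _} = err -> err
    end
  end
end
```

Another writer can still create the table between the pull and the create, so a caller that needs to be robust should treat an EEXIST error from the second step as a cue to pull again.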

diff(from, to)

Computes the minimum-mutation %Linx.Netfilter.Patch{} between two Rulesets — the operations that turn from into to.

Identity rules:

  • Tables / chains / sets / maps — name (within the relevant scope: tables within family, the rest within their table).
  • Rules within a chain — :tag when set, positional index otherwise. Mixed-tag chains fall back to a full rebuild.
  • Set elements — the element value itself.

Rule attribute changes use NLM_F_REPLACE over the kernel-assigned handle carried by from's rule (so you must diff against a Ruleset pulled from the kernel, not against a freshly-built one — otherwise handles are nil).

Patches are topologically sorted: deletes before creates of their dependencies (see Linx.Netfilter.Patch).

See Linx.Netfilter.Diff for the underlying implementation.
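Since rule replacements rely on handles carried by the from side, a dry run usually starts from a pulled table. The sketch below shows that order of operations. The module RulesetPreview and its preview/4 function are hypothetical, the `nf` parameter exists only so the sketch can run against a stand-in module, and it assumes diff/2 returns the patch value directly rather than an {:ok, patch} tuple.

```elixir
defmodule RulesetPreview do
  @moduledoc false

  # Hypothetical helper, not part of Linx. `nf` defaults to Linx.Netfilter.
  # Assumption: diff/2 returns the patch itself, not {:ok, patch}.
  def preview(sock, {_family, _name} = scope, desired, nf \\ Linx.Netfilter) do
    # Pull the live table so the `from` side carries kernel-assigned handles.
    with {:ok, current} <- nf.pull(sock, scope) do
      {:ok, nf.diff(current, desired)}
    end
  end
end
```

A pull failure short-circuits the `with` and is returned unchanged, so the caller never diffs against a missing or partial snapshot.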

dry_run(from, to)

Alias for diff/2 — return the patch without sending it. The name reads better at call sites where the intent is "show me what would change".

log_listen(owner_pid \\ self(), opts \\ [])

@spec log_listen(
  pid(),
  keyword()
) :: {:ok, pid()} | {:error, term()}

Opens an NFLOG listener bound to :group. The owner receives {:linx_netfilter, :log, %Linx.Netfilter.Log.Event{}} per logged packet.

Required option:

  • :group — NFLOG group (1..65535) the rule's Linx.Netfilter.Expr.log/1 directs packets to. Linx convention: use 5000 if you don't care which group.

Optional:

  • :netns — namespace; default :host.
  • :copy_mode is one of :none | :meta | :packet | {:packet, snaplen}. Default :meta (header info only, no payload).

  • :qthresh — kernel-side queue threshold; default 1.
  • :timeout_ms — kernel-side batching timeout; default 0 (no time-based batching).
  • :flags is a list drawn from [:seq, :seq_global, :conntrack].
  • :families — protocol families to bind; default [:ipv4, :ipv6].
  • :rcvbuf is the SO_RCVBUF size in bytes; default 4 MiB.

Returns {:ok, listener_pid}. Close with unlog_listen/1.

See Linx.Netfilter.Log for the GenServer's full surface and Linx.Netfilter.Log.Event for the packet-event shape.
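On the owner side, consuming the listener amounts to a selective receive on the message shape given above. The following is a minimal sketch: the module LogCollector and its collect/2 function are hypothetical, and events are treated as opaque values because the fields of Linx.Netfilter.Log.Event are not described here.

```elixir
defmodule LogCollector do
  @moduledoc false

  # Hypothetical helper: gathers up to `max` NFLOG events from the calling
  # process's mailbox, giving up once `idle_ms` passes with no new event.
  def collect(max, idle_ms \\ 100) when is_integer(max) and max >= 0 do
    do_collect(max, idle_ms, [])
  end

  defp do_collect(0, _idle_ms, acc), do: Enum.reverse(acc)

  defp do_collect(n, idle_ms, acc) do
    receive do
      # Only the documented log tuple is consumed; other messages stay queued.
      {:linx_netfilter, :log, event} -> do_collect(n - 1, idle_ms, [event | acc])
    after
      idle_ms -> Enum.reverse(acc)
    end
  end
end

# Intended use, with the calling process as the listener's owner:
#
#   {:ok, listener} = Linx.Netfilter.log_listen(self(), group: 5000)
#   events = LogCollector.collect(10)
#   :ok = Linx.Netfilter.unlog_listen(listener)
```

A long-running consumer would more likely handle the same tuple in a GenServer's handle_info/2; the blocking loop is just the shortest way to show the message shape.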

pull(sock, opts_or_scope \\ [])

@spec pull(Linx.Netlink.Socket.t(), keyword() | {atom(), String.t()}) ::
  {:ok, Linx.Netfilter.Ruleset.t()}
  | {:error, Linx.Netfilter.Error.t() | term()}

Pulls the kernel's nftables state into a Ruleset value.

Called with only the socket, it dumps the entire netns: every table, every chain, every rule the caller can see. Pass a {family, name} tuple to scope the dump to one table (or use pull/3 to combine a scope with options).

Options (unscoped form):

  • :subscribe_first — pid of a Linx.Netfilter.Monitor to handshake against. Captures the current gen via GETGEN and tells the monitor to drop events at or below it. Subsequent multicast events with gen_id > captured are guaranteed not to be in the returned snapshot (snapshot+tail pattern).

Implementation: three sequential dumps (GETTABLE, GETCHAIN, GETRULE) plus per-set GETSETELEM, then Decoder.from_msgs/5 assembles them. Dumps are not atomic across types — for full consistency under churn, combine with :subscribe_first and the Monitor.
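The snapshot+tail handshake can be sketched as "subscribe, then pull with the monitor's pid". The module SnapshotTail and its start/3 function are hypothetical, and the `nf` parameter exists only so the sketch can run against a stand-in module; the calls themselves (subscribe/2, pull/2 with :subscribe_first, unsubscribe/1) are the ones documented on this page.

```elixir
defmodule SnapshotTail do
  @moduledoc false

  # Hypothetical helper, not part of Linx. `nf` defaults to Linx.Netfilter.
  def start(sock, owner \\ self(), nf \\ Linx.Netfilter) do
    # Subscribe first so no committed change can fall between snapshot and tail.
    with {:ok, monitor} <- nf.subscribe(owner, []) do
      case nf.pull(sock, subscribe_first: monitor) do
        {:ok, snapshot} ->
          {:ok, snapshot, monitor}

        {:error, _} = err ->
          # Do not leave a monitor running when there is no snapshot to tail.
          :ok = nf.unsubscribe(monitor)
          err
      end
    end
  end
end
```

After a successful start, the owner applies incoming {:linx_netfilter, :event, ...} messages on top of the returned snapshot.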

pull(sock, arg, opts)

@spec pull(Linx.Netlink.Socket.t(), {atom(), String.t()}, keyword()) ::
  {:ok, Linx.Netfilter.Ruleset.t()}
  | {:error, Linx.Netfilter.Error.t() | term()}

Scoped pull — fetches one table by (family, name) plus its chains, rules, and sets.

Accepts the same options as the unscoped pull/2 (currently :subscribe_first).

Returns {:ok, %Ruleset{}} containing just that table, or {:error, %Linx.Netfilter.Error{errno: :enoent}} if the table doesn't exist.

push(sock, ruleset, opts \\ [])

Pushes a Ruleset to the kernel atomically as one batched transaction.

Modes:

  • :replace (default) — for each table in ruleset, the kernel sees DESTROYTABLE (silent-if-missing, 6.3+) then NEWTABLE plus all its chains and rules. Other tables in the netns are untouched.
  • :reconcile — minimal-diff push with NFTA_BATCH_GENID CAS for cooperative concurrency.

Returns :ok on success, or {:error, %Linx.Netfilter.Error{}} carrying the first inner-message rejection (with :batch_seq pointing at the offending message position).

subscribe(owner_pid \\ self(), opts \\ [])

@spec subscribe(
  pid(),
  keyword()
) :: {:ok, pid()} | {:error, term()}

Subscribes owner_pid to multicast nfnetlink events for ruleset changes in the current netns.

Returns {:ok, monitor_pid}. The owner then receives:

  • {:linx_netfilter, :event, %Linx.Netfilter.Event{}} per committed change (one :new_gen followed by one event per mutated entity).
  • {:linx_netfilter, :resync_needed} when the monitor socket overflows (ENOBUFS) — the owner should re-pull state.

Options:

  • :netns — namespace to monitor. Defaults to :host.
  • :since_gen — initial floor; events at or below this gen are dropped. Use in tandem with pull/1..2's :subscribe_first for snapshot+tail.
  • :rcvbuf — multicast socket receive buffer size in bytes; default 4 MiB.

See Linx.Netfilter.Monitor for the GenServer's full surface.
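The two message shapes above map naturally onto a small state-folding function. This is a sketch only: the module RulesetWatcher, its handle/2 function, and the %{events: ..., stale?: ...} state shape are hypothetical, and events are kept opaque because the fields of Linx.Netfilter.Event are not described here.

```elixir
defmodule RulesetWatcher do
  @moduledoc false

  # Hypothetical state shape: %{events: [newest | older], stale?: boolean}.

  # A committed change: remember it, newest first.
  def handle({:linx_netfilter, :event, event}, state) do
    %{state | events: [event | state.events]}
  end

  # The monitor socket overflowed: buffered events can no longer be trusted,
  # so drop them and flag that the owner should re-pull kernel state.
  def handle({:linx_netfilter, :resync_needed}, state) do
    %{state | events: [], stale?: true}
  end

  # Anything else is not ours to interpret.
  def handle(_other, state), do: state
end
```

In a GenServer this would sit behind handle_info/2, with the stale? flag triggering a fresh pull before any further events are applied.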

supported?()

@spec supported?() :: boolean()

Returns true iff the kernel supports nfnetlink (i.e., a NETLINK_NETFILTER socket can be opened in the current netns).

Opening the socket verifies that the kernel has nfnetlink support (CONFIG_NETFILTER_NETLINK enabled, universal in modern Linux). Every real operation against the socket (GETGEN, mutations) requires CAP_NET_ADMIN, but the socket open itself is unprivileged. So this probe answers "would Linx.Netfilter work if I had the right capabilities", not "do I have the right capabilities"; the latter surfaces as an :eperm error from the actual verb call when the time comes.

Returns false if the kernel module is missing or the BEAM process can't allocate a socket. Doesn't distinguish between those.

unlog_listen(listener)

@spec unlog_listen(pid()) :: :ok

Stops a Log listener returned by log_listen/2. The kernel-side group binding is dropped before the socket is closed.

unsubscribe(monitor)

@spec unsubscribe(pid()) :: :ok

Unsubscribes by stopping the Monitor returned from subscribe/2.