Linux netfilter primitives — modern firewall (nf_tables) via the
NETLINK_NETFILTER netlink protocol family, plus live ruleset
monitoring and packet-event capture (NFLOG).
Why a separate subsystem
Netfilter is a coherent kernel concept (firewall + connection
tracking + packet event streams) with its own netlink protocol
family (NETLINK_NETFILTER = 12) and a sprawling but consistent
surface. Wrapping it as its own concept module — peer to
Linx.Process, Linx.Cgroup, Linx.Mount, Linx.User,
Linx.Capabilities, Linx.Seccomp, Linx.Sysctl — keeps the
firewall mental model explicit. The underlying transport,
Linx.Netlink.Nfnl, mirrors Linx.Netlink.Rtnl's shape.
Value, not handle
%Linx.Netfilter.Ruleset{} is plain data: tables containing chains
containing ordered rules, plus sets/maps/vmaps and named objects.
Pure Elixir values, freely composable and inspectable. Four verbs:
- build: construct via pipeline DSL or ~NFT sigil.
- push/2: write to the kernel atomically (:replace rebuilds, :reconcile computes the minimal diff).
- pull/1..2: read kernel state into a ruleset value.
- diff/2: compute the patch between two rulesets.
Kernel state lives in the kernel; the Elixir value is the Elixir
value. Mirrors %Linx.Seccomp.Filter{} scaled to a larger surface.
Transactions are mandatory
Every mutation goes through a NFNL_MSG_BATCH_BEGIN /
NFNL_MSG_BATCH_END envelope; the kernel applies the whole batch
atomically or rejects it whole. push/2 is the only mutator,
batch-shaped from the outside in.
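As a rough illustration of the envelope described above, the sketch below frames a list of already-encoded inner messages between a begin and an end marker. It is not Linx's encoder: the module name, the `wrap/2` and `marker/2` helpers, and the sequence-numbering policy are assumptions made for illustration. The numeric constants follow the kernel's nfnetlink uapi headers (`NFNL_MSG_BATCH_BEGIN = 0x10`, `NFNL_MSG_BATCH_END = 0x11`, `NLM_F_REQUEST = 1`, `NFNL_SUBSYS_NFTABLES = 10`).

```elixir
defmodule BatchEnvelopeSketch do
  # Hypothetical helper, not part of Linx: frames inner nfnetlink messages
  # between the batch begin/end markers the kernel expects.

  @nfnl_msg_batch_begin 0x10
  @nfnl_msg_batch_end 0x11
  @nlm_f_request 0x01
  @nfnl_subsys_nftables 10

  # `inner_msgs` is a list of already-encoded netlink messages (binaries).
  # Assumption: the begin marker takes `first_seq`, each inner message the
  # next number, and the end marker the one after the last inner message.
  def wrap(inner_msgs, first_seq) when is_list(inner_msgs) do
    last_seq = first_seq + length(inner_msgs) + 1

    [
      marker(@nfnl_msg_batch_begin, first_seq),
      inner_msgs,
      marker(@nfnl_msg_batch_end, last_seq)
    ]
  end

  # A marker is a 16-byte nlmsghdr (host byte order) followed by a 4-byte
  # nfgenmsg: family AF_UNSPEC (0), version 0, res_id = subsystem id in
  # network byte order.
  def marker(type, seq) do
    flags = @nlm_f_request
    subsys = @nfnl_subsys_nftables

    <<20::native-32, type::native-16, flags::native-16, seq::native-32, 0::native-32,
      0::8, 0::8, subsys::big-16>>
  end
end
```

`wrap/2` returns iodata, so it can be handed to a socket write as-is or flattened with `IO.iodata_to_binary/1` for inspection.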
Modes:
- :replace (default): tear down and rebuild the named tables. Simple, brief disruption.
- :reconcile: compute the minimal patch between current kernel state and the desired Ruleset, and emit it as one batch. LiveView-of-firewalls; no service interruption when only adding/removing rules at the margins.
Optimistic concurrency via NFTA_BATCH_GENID
:reconcile mode threads the kernel's generation counter through
the batch: "I computed this against generation N; reject if N has
moved". The kernel returns ERESTART on mismatch — push/2
retries with bounded attempts, surfacing
{:error, %Error{errno: :erestart, ruleset_gen: gen}} on
exhaustion. Lets Linx cooperate cleanly with nft CLI / firewalld /
any other writer in the same netns.
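The bounded retry described above can be sketched in a few lines of plain Elixir. This is not the code behind push/2: the module name and the zero-arity `attempt` callback are hypothetical stand-ins for "re-read the generation, recompute the patch, submit one batch", and the error shape is simplified to a bare atom.

```elixir
defmodule GenRetrySketch do
  # Hypothetical illustration of optimistic-concurrency retry; not Linx code.
  #
  # `attempt` is a zero-arity function that performs one full try and returns
  # :ok, {:error, :erestart} (generation moved), or {:error, other}.
  def run(attempt, max_attempts)
      when is_function(attempt, 0) and is_integer(max_attempts) and max_attempts > 0 do
    do_run(attempt, max_attempts, 1)
  end

  defp do_run(attempt, max, n) do
    case attempt.() do
      # Committed: report how many tries it took.
      :ok -> {:ok, n}
      # Someone else committed first; try again while budget remains.
      {:error, :erestart} when n < max -> do_run(attempt, max, n + 1)
      # Budget exhausted: surface the conflict to the caller.
      {:error, :erestart} -> {:error, {:erestart, n}}
      # Any other failure is not retried.
      {:error, _reason} = error -> error
    end
  end
end
```

Only the generation conflict is retried; every other error is returned immediately, so a malformed ruleset does not get resubmitted in a loop.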
Owner flag is the default
create_table/2 sets NFT_TABLE_F_OWNER by default: the table is
destroyed when the creating netlink socket closes. The supervisor
that opens the Nfnl socket owns the firewall; if it dies, rules
vanish. No other firewall management tool exposes this naturally.
Opt out with persist: true (uses NFT_TABLE_F_PERSIST, 6.9+) for
policies that should survive the BEAM. On older kernels, persist: true
falls back to setting no flags, so the table survives socket close
until it is explicitly deleted.
Per-namespace isolation
Each netns has fully independent nftables state — own tables, own
generation counter, own commit mutex, own multicast group.
Linx.Netlink.Nfnl.open({:pid, child_pid}) opens the socket inside
that netns for its whole life; reads/writes through that socket
land in the child's nftables instance. Same value type, same
verbs.
Authoring surfaces: peers, not layers
Two authoring surfaces produce the same %Ruleset{}:
- Pipeline DSL: Ruleset.new() |> Ruleset.add_table(...) |> Table.add_chain(...) |> Chain.add_rule(...), for runtime-shaped rulesets (interfaces discovered at boot, IPs from config).
- ~NFT sigil: ~NFT"table inet myapp { chain ... }", for compile-time-authored rulesets with safe Elixir interpolation and lossless round-trip to nftables.conf files. Modelled on Phoenix LiveView's HEEx.
Both call the same validator-setter functions; both produce the same value.
The setters use add_* (add_table / add_chain / add_rule), not
the create_* of Linx.Cgroup or Linx.Netlink.Rtnl, deliberately:
add_* inserts into a value, while create materialises a kernel
object — different acts, different verbs.
Composition with Linx.Process
Same shape as every other Linx subsystem: configure the child's
network and firewall at the checkpoint between :ready and
proceed/1, then release the workload with everything in force:
    {:ok, c} = Linx.Process.spawn(argv: [...], namespaces: [:net])

    # Wait for the child to reach the :ready checkpoint.
    receive do
      {:linx_process, :ready, _} -> :ok
    end

    # Open an nfnetlink socket inside the child's netns and install its firewall.
    {:ok, host_pid} = Linx.Process.host_pid(c)
    {:ok, ct_nfnl} = Linx.Netlink.Nfnl.open({:pid, host_pid})
    :ok = Linx.Netfilter.push(ct_nfnl, container_ruleset())

    # Release the workload with the ruleset already in force.
    :ok = Linx.Process.proceed(c)

Linx.Process has zero awareness of netfilter; the checkpoint is
the only coupling, exactly the way Linx.Sysctl / Linx.Mount /
every other subsystem composes.
See docs/netfilter/DESIGN.md for design work intentionally deferred.
Functions
@spec create_table(Linx.Netlink.Socket.t(), String.t(), keyword()) :: {:ok, Linx.Netfilter.Ruleset.t()} | {:error, Linx.Netfilter.Error.t() | term()}
Creates a new table in the kernel's nftables instance.
Options
- :family: :ip | :ip6 | :inet | :arp | :bridge | :netdev. Default: :inet (the firewall sweet spot: one table covers both IPv4 and IPv6).
- :persist: true to disable the owner flag, leaving the table behind when the socket closes. Default: false (table auto-destroys with the socket; see Owner flag is the default in the moduledoc).
Returns {:ok, %Ruleset{}} — the ruleset has just this one
table, ready for chains / rules to be added with the
Linx.Netfilter.Ruleset pipeline DSL and then pushed back with
push/2.
Wire-level failures come back as {:error, %Linx.Netfilter.Error{}}
with the operation set to :create_table and the kernel's
errno / extended-ack message attached. EEXIST means the table
was already present (pass through Ruleset.pull/2 first if you
want a "create-or-fetch" pattern).
@spec diff(Linx.Netfilter.Ruleset.t(), Linx.Netfilter.Ruleset.t()) :: Linx.Netfilter.Patch.t()
Computes the minimum-mutation %Linx.Netfilter.Patch{} between
two Rulesets — the operations that turn from into to.
Identity rules:
- Tables / chains / sets / maps: name (within the relevant scope: tables within family, the rest within their table).
- Rules within a chain: :tag when set, positional index otherwise. Mixed-tag chains fall back to a full rebuild.
- Set elements: the element value itself.
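The rule-identity policy above can be illustrated with a small, self-contained sketch. It is not Linx.Netfilter.Diff: rules are modelled here as plain maps with an optional `:tag` key, the module and operation names are hypothetical, and ordering and kernel handles are ignored entirely.

```elixir
defmodule RuleIdentitySketch do
  # Hypothetical illustration of keyed rule diffing; not the Linx implementation.
  # A rule is a plain map such as %{tag: :ssh, expr: "tcp dport 22 accept"}.

  def diff(from, to) when is_list(from) and is_list(to) do
    case kind(from ++ to) do
      # Some rules tagged, some not: identities are ambiguous, so rebuild.
      :mixed -> [{:rebuild, to}]
      kind -> keyed_diff(keyed(from, kind), keyed(to, kind))
    end
  end

  defp tagged?(rule), do: Map.get(rule, :tag) != nil

  defp kind(rules) do
    cond do
      Enum.all?(rules, &tagged?/1) -> :tagged
      not Enum.any?(rules, &tagged?/1) -> :positional
      true -> :mixed
    end
  end

  # Identity is the tag when every rule has one, the list index otherwise.
  defp keyed(rules, :tagged), do: Map.new(rules, fn rule -> {rule.tag, rule} end)

  defp keyed(rules, :positional) do
    rules |> Enum.with_index() |> Map.new(fn {rule, index} -> {index, rule} end)
  end

  defp keyed_diff(from, to) do
    deletes = for {key, _rule} <- from, not Map.has_key?(to, key), do: {:delete, key}
    replaces = for {key, rule} <- to, Map.has_key?(from, key), from[key] != rule, do: {:replace, key, rule}
    adds = for {key, rule} <- to, not Map.has_key?(from, key), do: {:add, key, rule}

    # Deletes first, then in-place replacements, then additions.
    Enum.sort(deletes) ++ Enum.sort(replaces) ++ Enum.sort(adds)
  end
end
```

With tagged rules, changing the expression of `:ssh` yields a single `{:replace, :ssh, ...}` operation rather than a delete/add pair, which is the point of giving rules a stable identity.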
Rule attribute changes use NLM_F_REPLACE over the
kernel-assigned handle carried by from's rule (so you must
diff against a Ruleset pulled from the kernel, not against a
freshly-built one — otherwise handles are nil).
Patches are topologically sorted: deletes before creates of
their dependencies (see Linx.Netfilter.Patch).
See Linx.Netfilter.Diff for the underlying implementation.
@spec dry_run(Linx.Netfilter.Ruleset.t(), Linx.Netfilter.Ruleset.t()) :: Linx.Netfilter.Patch.t()
Alias for diff/2 — return the patch without sending it. The
name reads better at call sites where the intent is "show me
what would change".
log_listen/2 opens an NFLOG listener bound to :group. The owner receives
{:linx_netfilter, :log, %Linx.Netfilter.Log.Event{}} per
logged packet.
Required option:
- :group: NFLOG group (1..65535) the rule's Linx.Netfilter.Expr.log/1 directs packets to. Linx convention: use 5000 if you don't care which group.
Optional:
- :netns: namespace; default :host.
- :copy_mode: :none | :meta | :packet | {:packet, snaplen}. Default :meta (header info only, no payload).
- :qthresh: kernel-side queue threshold; default 1.
- :timeout_ms: kernel-side batching timeout; default 0 (no time-based batching).
- :flags: [:seq, :seq_global, :conntrack].
- :families: protocol families to bind; default [:ipv4, :ipv6].
- :rcvbuf: SO_RCVBUF bytes; default 4 MiB.
Returns {:ok, listener_pid}. Close with unlog_listen/1.
See Linx.Netfilter.Log for the GenServer's full surface and
Linx.Netfilter.Log.Event for the packet-event shape.
@spec pull(Linx.Netlink.Socket.t(), keyword() | {atom(), String.t()}) :: {:ok, Linx.Netfilter.Ruleset.t()} | {:error, Linx.Netfilter.Error.t() | term()}
Pulls the kernel's nftables state into a Ruleset value.
No-arg form dumps the entire netns — every table, every chain,
every rule the caller can see. Pass a {family, name} tuple to
scope the dump to one table (or pull/3 with options).
Options (no-arg form):
- :subscribe_first: pid of a Linx.Netfilter.Monitor to handshake against. Captures the current gen via GETGEN and tells the monitor to drop events at or below it. Subsequent multicast events with gen_id > captured are guaranteed not to be in the returned snapshot (snapshot+tail pattern).
Implementation: three sequential dumps (GETTABLE, GETCHAIN,
GETRULE) plus per-set GETSETELEM, then Decoder.from_msgs/5
assembles them. Dumps are not atomic across types — for full
consistency under churn, combine with :subscribe_first and the
Monitor.
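The snapshot+tail rule reduces to one comparison on the generation counter. The sketch below shows it in isolation; events are modelled as `{gen_id, change}` tuples and the module name is hypothetical, whereas the real monitor delivers %Linx.Netfilter.Event{} structs and does the filtering itself.

```elixir
defmodule SnapshotTailSketch do
  # Hypothetical illustration of the snapshot+tail filter; not Linx code.
  #
  # `snapshot_gen` is the generation captured just before the dump started.
  # Events at or below it are already reflected in the snapshot and are
  # dropped; anything newer belongs to the tail and must be applied on top.
  def tail(events, snapshot_gen) when is_list(events) and is_integer(snapshot_gen) do
    Enum.filter(events, fn {gen_id, _change} -> gen_id > snapshot_gen end)
  end
end
```

For example, with a snapshot taken at generation 42, an event stream carrying generations 41, 42 and 43 leaves only the generation-43 event to replay.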
@spec pull(Linx.Netlink.Socket.t(), {atom(), String.t()}, keyword()) :: {:ok, Linx.Netfilter.Ruleset.t()} | {:error, Linx.Netfilter.Error.t() | term()}
Scoped pull — fetches one table by (family, name) plus its
chains, rules, and sets.
Accepts the same options as the no-arg pull/2 (currently
:subscribe_first).
Returns {:ok, %Ruleset{}} containing just that table, or
{:error, %Linx.Netfilter.Error{errno: :enoent}} if the table
doesn't exist.
@spec push(Linx.Netlink.Socket.t(), Linx.Netfilter.Ruleset.t(), keyword()) :: :ok | {:error, Linx.Netfilter.Error.t() | term()}
Pushes a Ruleset to the kernel atomically as one batched transaction.
Modes:
- :replace (default): for each table in ruleset, the kernel sees DESTROYTABLE (silent-if-missing, 6.3+) then NEWTABLE plus all its chains and rules. Other tables in the netns are untouched.
- :reconcile: minimal-diff push with NFTA_BATCH_GENID CAS for cooperative concurrency.
Returns :ok on success, or {:error, %Linx.Netfilter.Error{}}
carrying the first inner-message rejection (with :batch_seq
pointing at the offending message position).
subscribe/2 subscribes owner_pid to multicast nfnetlink events for ruleset
changes in the current netns.
Returns {:ok, monitor_pid}. The owner then receives:
- {:linx_netfilter, :event, %Linx.Netfilter.Event{}} per committed change (one :new_gen followed by one event per mutated entity).
- {:linx_netfilter, :resync_needed} when the monitor socket overflows (ENOBUFS); the owner should re-pull state.
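An owner process might fold these two message shapes into its state along the following lines. This is a sketch under assumptions: the module name and the state map are hypothetical, and discarding buffered events on :resync_needed is one possible policy (the buffered events may have gaps once the socket has overflowed), not something Linx prescribes.

```elixir
defmodule MonitorOwnerSketch do
  # Hypothetical owner-side handling of monitor messages; not Linx code.
  #
  # Drains whatever monitor messages are already in the mailbox and returns
  # the accumulated state without blocking.
  def drain(state \\ %{events: [], needs_pull: false}) do
    receive do
      {:linx_netfilter, :event, event} ->
        # Buffer events newest-first; order is restored on exit.
        drain(%{state | events: [event | state.events]})

      {:linx_netfilter, :resync_needed} ->
        # The stream has a gap: drop what was buffered and flag a re-pull.
        drain(%{state | events: [], needs_pull: true})
    after
      0 -> %{state | events: Enum.reverse(state.events)}
    end
  end
end
```

In a GenServer the same two patterns would sit in `handle_info/2` clauses instead of a `receive` loop.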
Options:
- :netns: namespace to monitor. Defaults to :host.
- :since_gen: initial floor; events at or below this gen are dropped. Use in tandem with pull/1..2's :subscribe_first for snapshot+tail.
- :rcvbuf: multicast socket receive buffer size in bytes; default 4 MiB.
See Linx.Netfilter.Monitor for the GenServer's full surface.
@spec supported?() :: boolean()
Returns true iff the kernel supports nfnetlink (i.e., a
NETLINK_NETFILTER socket can be opened in the current netns).
Opening the socket verifies the kernel was built with
CONFIG_NETFILTER_NETLINK=y (universal in modern Linux) — every
real operation against it (GETGEN, mutations) requires
CAP_NET_ADMIN, but the socket open itself is unprivileged. So
this probe answers "would Linx.Netfilter work if I had the right
capabilities", not "do I have the right capabilities" — the latter
surfaces as a :eperm error from the actual verb call when the
time comes.
Returns false if the kernel module is missing or the BEAM
process can't allocate a socket. Doesn't distinguish between
those.
@spec unlog_listen(pid()) :: :ok
Stops a Log listener returned by log_listen/2. The kernel-side
group binding is dropped before the socket is closed.
@spec unsubscribe(pid()) :: :ok
Unsubscribes by stopping the Monitor returned from subscribe/2.