nquic_socket (nquic v1.0.0)

UDP socket abstraction built on the OTP socket module.

Provides a high-level interface for UDP socket operations optimized for QUIC. Uses completion-based async I/O for efficient packet reception.

Example Usage

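%% Open a UDP socket on port 4433 with default options.
{ok, Socket} = nquic_socket:open(4433, #{}),

%% Arm the async receive; a select message arrives once data is readable.
{select, SelectInfo} = nquic_socket:recv_start(Socket),

%% In the process that owns the socket (a gen_server here):
handle_info({'$socket', Socket, select, _Info}, State) ->
    case nquic_socket:recv_now(Socket) of
        {ok, {_Source, _Data}} ->
            %% Handle the datagram here.
            {noreply, State};
        {select, NewSelectInfo} ->
            %% Nothing ready yet; wait for the next select message.
            {noreply, State#state{select_info = NewSelectInfo}};
        {error, Reason} ->
            {stop, Reason, State}
    end.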

Types

capabilities()

-type capabilities() :: #{gso := boolean(), gro := boolean()}.

ecn_mark()

-type ecn_mark() :: not_ect | ect0 | ect1 | ce.

open_opts()

-type open_opts() ::
          #{port => inet:port_number(),
            ip => inet:ip_address() | any,
            recbuf => pos_integer(),
            sndbuf => pos_integer(),
            reuseaddr => boolean(),
            reuseport => boolean(),
            ipv6_v6only => boolean(),
            ecn => boolean(),
            gso => boolean() | pos_integer(),
            gro => boolean()}.

select_info()

-type select_info() :: socket:select_info().

sockaddr()

-type sockaddr() :: socket:sockaddr_in() | socket:sockaddr_in6().

t()

-type t() :: socket:socket().

Functions

capabilities()

-spec capabilities() -> capabilities().

Probe the running kernel for UDP_SEGMENT (GSO) and UDP_GRO support. Result is cached in persistent_term; the probe runs at most once per node. On non-Linux platforms (or older kernels missing one or both features) the corresponding capability comes back false. Callers can treat the result as opaque and pass gso => true / gro => true only when the matching capability is set; the open path silently no-ops if the kernel rejects the setsockopt.
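
The gating described above can be sketched as a pure function. This is a minimal illustration, not part of nquic: the module name offload_opts_sketch and the function gate/2 are hypothetical, and the maps only mirror the shapes of capabilities() and open_opts().

```erlang
-module(offload_opts_sketch).
-export([gate/2]).

%% Keep a gso/gro request only when the probed capability is true.
%% Caps mirrors capabilities(); Opts mirrors open_opts().
-spec gate(#{gso := boolean(), gro := boolean()}, map()) -> map().
gate(#{gso := Gso, gro := Gro}, Opts) ->
    Opts1 = case Gso of
                true -> Opts;
                false -> maps:remove(gso, Opts)
            end,
    case Gro of
        true -> Opts1;
        false -> maps:remove(gro, Opts1)
    end.
```

A caller would presumably pass nquic_socket:capabilities() as the first argument and hand the result to open/2.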

close(Socket)

-spec close(t()) -> ok | {error, nquic_error:any_reason()}.

Close the socket.

controlling_process(Socket, Pid)

-spec controlling_process(t(), pid()) -> ok | {error, nquic_error:any_reason()}.

Transfer socket ownership to another process.

get_ecn_from_cmsg/1

-spec get_ecn_from_cmsg(list() | undefined) -> not_ect | ect0 | ect1 | ce.

Extract the ECN codepoint from recvmsg control messages. Returns not_ect (0), ect1 (1), ect0 (2), or ce (3).
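
As an illustration of the codepoint mapping, the sketch below decodes the two low-order bits of a TOS / traffic-class byte. The module ecn_decode_sketch and the function from_tos/1 are hypothetical names; how nquic extracts the byte from the control message is not shown here.

```erlang
-module(ecn_decode_sketch).
-export([from_tos/1]).

%% The ECN field is the two least-significant bits of the TOS byte;
%% the upper six bits carry the DSCP and are ignored here.
-spec from_tos(0..255) -> not_ect | ect1 | ect0 | ce.
from_tos(Tos) when is_integer(Tos), Tos >= 0, Tos =< 255 ->
    case Tos band 2#11 of
        0 -> not_ect;
        1 -> ect1;
        2 -> ect0;
        3 -> ce
    end.
```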

get_gso_size_from_cmsg/1

-spec get_gso_size_from_cmsg(list() | undefined) -> undefined | pos_integer().

Extract the GRO segment size from a recvmsg control-message list. Returns the segment size in bytes when the kernel coalesced the recv, undefined otherwise. The 16-bit segment-size value sits in the lower two bytes of the cmsg payload, which the kernel pads to a 4-byte multiple, so the trailing bytes are matched as a wildcard.
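
A rough sketch of how a caller might use the segment size to split a coalesced buffer. The module gro_split_sketch and both functions are hypothetical; the cmsg shape (level udp, type 104) follows the description under set_gro/2, and the real extraction logic in nquic may differ.

```erlang
-module(gro_split_sketch).
-export([segment_size/1, split/2]).

%% Scan a control-message list for the UDP_GRO entry (type 104 on Linux).
%% Only the first 16 bits are read; trailing padding is ignored.
segment_size([#{level := udp, type := 104,
                data := <<Size:16/native, _/binary>>} | _]) when Size > 0 ->
    Size;
segment_size([_ | Rest]) ->
    segment_size(Rest);
segment_size(_) ->
    undefined.

%% Split a received buffer into per-packet chunks of Size bytes.
%% The final chunk may be shorter; undefined means "not coalesced".
split(Buf, undefined) ->
    [Buf];
split(Buf, Size) when is_integer(Size), Size > 0 ->
    do_split(Buf, Size).

do_split(<<>>, _Size) ->
    [];
do_split(Buf, Size) when byte_size(Buf) =< Size ->
    [Buf];
do_split(Buf, Size) ->
    <<Seg:Size/binary, Rest/binary>> = Buf,
    [Seg | do_split(Rest, Size)].
```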

make_sockaddr/2

-spec make_sockaddr(inet:ip_address(), inet:port_number()) -> sockaddr().

Create a sockaddr from IP and port.
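
For orientation, a sockaddr in the OTP socket module is a map keyed by family, addr, and port. The sketch below builds one from an IP tuple and converts it back; sockaddr_sketch, make/2, and to_tuple/1 are hypothetical names, and zeroing flowinfo and scope_id for IPv6 is an assumption rather than documented nquic behaviour.

```erlang
-module(sockaddr_sketch).
-export([make/2, to_tuple/1]).

%% A 4-tuple is an IPv4 address, an 8-tuple an IPv6 address.
make({_, _, _, _} = IP, Port) ->
    #{family => inet, addr => IP, port => Port};
make({_, _, _, _, _, _, _, _} = IP, Port) ->
    #{family => inet6, addr => IP, port => Port,
      flowinfo => 0, scope_id => 0}.

%% Reverse direction: project the sockaddr map back to {IP, Port}.
to_tuple(#{addr := IP, port := Port}) ->
    {IP, Port}.
```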

open(Opts)

-spec open(open_opts()) -> {ok, t()} | {error, nquic_error:any_reason()}.

Open a UDP socket on an ephemeral port.

open(Port, Opts)

-spec open(inet:port_number(), open_opts()) -> {ok, t()} | {error, nquic_error:any_reason()}.

Open a UDP socket on a specific port.

open_connected(ListenerPort, Peer)

-spec open_connected(inet:port_number(), sockaddr()) -> {ok, t()} | {error, nquic_error:any_reason()}.

Open a connected UDP socket bound to the same port as the listener. Creates a new UDP socket on ListenerPort, then calls socket:connect/2 to connect it to Peer. The kernel then routes datagrams from Peer to this socket, which takes priority over the unconnected listener socket. This enables direct recv on the connection owner process, bypassing the receiver dispatch. GRO is left off here; callers that expect bursty replies enable it adaptively via set_gro/2 once the reply pattern warrants it (a coalesced datagram is then split using the size returned by get_gso_size_from_cmsg/1).

open_ephemeral/2

-spec open_ephemeral(sockaddr(), open_opts()) -> {ok, t()} | {error, nquic_error:any_reason()}.

Open an ephemeral connected UDP socket for server-side per-connection FDs. Binds to a kernel-chosen local port (no SO_REUSEPORT) on the same family as the peer, then connect(2)s the socket to Peer. The kernel then delivers any datagram whose 4-tuple matches (peer addr/port, local addr/port) directly to this socket, bypassing the listener's SO_REUSEPORT group entirely. Used post-handshake to migrate a server connection off the shared listener FD onto its own 4-tuple (RFC 9000 §9). Opts lets the caller inherit socket-level features (ECN, GSO, GRO, recbuf, sndbuf) from the listener configuration; the function forces reuseaddr => false and reuseport => false so the new socket owns its 4-tuple exclusively.
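
The option handling can be pictured as a small map transformation. This is a sketch under assumptions: ephemeral_opts_sketch and derive/1 are hypothetical, and dropping the port key so the kernel picks the local port is a guess at how the ephemeral bind is requested.

```erlang
-module(ephemeral_opts_sketch).
-export([derive/1]).

%% Inherit the listener's feature options but force exclusive ownership.
-spec derive(map()) -> map().
derive(ListenerOpts) ->
    %% Assumption: without a port key the bind uses a kernel-chosen port.
    Inherited = maps:remove(port, ListenerOpts),
    Inherited#{reuseaddr => false, reuseport => false}.
```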

port(Socket)

-spec port(t()) -> {ok, inet:port_number()} | {error, nquic_error:any_reason()}.

Get the local port number of the socket.

rebind(OldSocket, NewAddr)

-spec rebind(t(), sockaddr()) -> {ok, t()} | {error, nquic_error:any_reason()}.

Rebind a socket to a new local address for connection migration. Opens a new socket on the new address, closes the old one, returns the new socket.
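
The open-then-close sequence might look like the following when written directly against the OTP socket module. It is only a sketch: rebind_sketch is a hypothetical module, it applies none of the nquic socket options, and opening the new socket before closing the old one is an assumed ordering.

```erlang
-module(rebind_sketch).
-export([rebind/2]).

%% Open and bind a fresh UDP socket on NewAddr, then retire the old one.
%% On failure the old socket is left untouched.
rebind(Old, #{family := Family} = NewAddr) ->
    case socket:open(Family, dgram, udp) of
        {ok, New} ->
            case socket:bind(New, NewAddr) of
                ok ->
                    _ = socket:close(Old),
                    {ok, New};
                {error, _} = BindError ->
                    _ = socket:close(New),
                    BindError
            end;
        {error, _} = OpenError ->
            OpenError
    end.
```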

recv_cancel(Socket, SelectInfo)

-spec recv_cancel(t(), select_info()) -> ok | {error, nquic_error:any_reason()}.

Cancel a pending async receive.

recv_msg_now(Socket)

-spec recv_msg_now(t()) ->
                      {ok, {sockaddr(), binary(), list()}} |
                      {select, select_info()} |
                      {error, nquic_error:any_reason()}.

Receive data with ancillary data without blocking. Same as recv_msg_start/1.

recv_msg_start(Socket)

-spec recv_msg_start(t()) ->
                        {ok, {sockaddr(), binary(), list()}} |
                        {select, select_info()} |
                        {error, nquic_error:any_reason()}.

Start async receive with ancillary data (for ECN marks). Uses socket:recvmsg to receive control messages alongside the packet data. When IP_RECVTOS is set (via set_ecn/2), the control messages include the TOS byte. Use get_ecn_from_cmsg/1 to extract the ECN codepoint.

recv_now(Socket)

-spec recv_now(t()) ->
                  {ok, {sockaddr(), binary()}} |
                  {select, select_info()} |
                  {select_read, {select_info(), {sockaddr(), binary()}}} |
                  {completion, socket:completion_info()} |
                  {error, nquic_error:any_reason()}.

Receive data without blocking. Call this after receiving a select message. Returns:

  • {ok, {Source, Data}} - Packet received
  • {select, SelectInfo} - No data ready, wait for next select message
  • {error, Reason} - Error occurred

recv_start(Socket)

-spec recv_start(t()) ->
                    {ok, {sockaddr(), binary()}} |
                    {select, select_info()} |
                    {select_read, {select_info(), {sockaddr(), binary()}}} |
                    {completion, socket:completion_info()} |
                    {error, nquic_error:any_reason()}.

Start async receive. Returns {select, Info} when waiting for data. After calling this, the process will receive a message of the form: {'$socket', Socket, select, SelectInfo} when data is available. Then call recv_now/1 to get the actual data.
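
To make the select flow concrete, here is a self-contained loopback sketch that uses the OTP socket module directly rather than nquic_socket. The module select_recv_sketch is hypothetical, and it assumes a Unix-like platform where socket:recvfrom/3 with nowait reports {select, ...} rather than {completion, ...}.

```erlang
-module(select_recv_sketch).
-export([demo/0]).

demo() ->
    Loopback = #{family => inet, addr => {127, 0, 0, 1}, port => 0},
    {ok, Rx} = socket:open(inet, dgram, udp),
    ok = socket:bind(Rx, Loopback),
    {ok, RxAddr} = socket:sockname(Rx),
    %% Nothing is queued yet, so the non-blocking receive registers
    %% interest and returns a select handle instead of data.
    {select, {select_info, _Tag, Handle}} = socket:recvfrom(Rx, 0, nowait),
    {ok, Tx} = socket:open(inet, dgram, udp),
    ok = socket:sendto(Tx, <<"ping">>, RxAddr),
    Result =
        receive
            {'$socket', Rx, select, Handle} ->
                %% The socket is readable: retry the receive.
                case socket:recvfrom(Rx, 0, nowait) of
                    {ok, {_Source, Data}} -> {ok, Data};
                    Other -> Other
                end
        after 1000 ->
            {error, timeout}
        end,
    _ = socket:close(Tx),
    _ = socket:close(Rx),
    Result.
```

In nquic_socket terms, the first recvfrom corresponds to recv_start/1 and the retry inside the receive clause to recv_now/1.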

send(Socket, Dest, Data)

-spec send(t(), sockaddr(), iodata()) -> ok | {error, nquic_error:any_reason()}.

Send data to a destination address.

send_connected(Socket, Data)

-spec send_connected(t(), iodata()) -> ok | {error, nquic_error:any_reason()}.

Send data on a connected socket (no destination needed).

send_connected_with_ecn(Socket, Data, ECN)

-spec send_connected_with_ecn(t(), iodata(), ecn_mark()) -> ok | {error, nquic_error:any_reason()}.

Send data on a connected socket with a specific ECN codepoint. Uses sendmsg with an IP_TOS / IPV6_TCLASS cmsg. Slower than send_connected/2 because of the extra cmsg processing, so the hot path relies on socket-level TOS pre-stamping (see set_ecn/2) and only falls back to this primitive when per-packet control is required. Future work (GSO batching, pacing) is the expected caller. iolist_to_iovec/1 flattens the iolist into a list of binaries without materialising a single concatenated binary, preserving the zero-copy property of the encrypt path.
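
The flattening step can be tried in isolation with erlang:iolist_to_iovec/1. The wrapper module iovec_sketch is hypothetical and only checks that the result is a list of binaries carrying the same bytes.

```erlang
-module(iovec_sketch).
-export([flatten/1]).

%% Turn nested iodata into a flat list of binaries (an iovec).
%% No single concatenated binary is built for the whole payload.
-spec flatten(iodata()) -> [binary()].
flatten(IoData) ->
    Iovec = erlang:iolist_to_iovec(IoData),
    true = lists:all(fun is_binary/1, Iovec),
    Iovec.
```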

send_with_ecn(Socket, Dest, Data, ECN)

-spec send_with_ecn(t(), sockaddr(), iodata(), ecn_mark()) -> ok | {error, nquic_error:any_reason()}.

Send data with a specific ECN codepoint using sendmsg. See send_connected_with_ecn/3.

set_ecn/2

-spec set_ecn(t(), boolean()) -> ok | {error, nquic_error:any_reason()}.

Enable ECN on a socket. Configures both directions:

  • Inbound: IP_RECVTOS / IPV6_RECVTCLASS so recvmsg returns the TOS / traffic-class byte in ancillary data. The receiver decodes it via get_ecn_from_cmsg/1 and feeds the per-packet ECN counts into the protocol layer.
  • Outbound: IP_TOS = 2 / IPV6_TCLASS = 2 so the kernel stamps every outgoing datagram as ECT(0) without per-packet sendmsg overhead. Validation failure flips this back to 0 via set_egress_ecn/2.

Errors from setopt are tolerated per-option when the family is not present (a v4-only socket cannot accept ipv6/* and vice versa).
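
The outbound value 2 is simply the ECT(0) codepoint in the low two bits of the TOS byte. A small sketch of that encoding follows; ecn_encode_sketch, to_tos/1, and restamp/2 are hypothetical helpers, not nquic functions.

```erlang
-module(ecn_encode_sketch).
-export([to_tos/1, restamp/2]).

%% ECN codepoints as they appear in the two low bits of the TOS byte.
-spec to_tos(not_ect | ect1 | ect0 | ce) -> 0..3.
to_tos(not_ect) -> 0;
to_tos(ect1) -> 1;
to_tos(ect0) -> 2;
to_tos(ce) -> 3.

%% Replace the ECN bits of an existing TOS byte, keeping the DSCP bits.
-spec restamp(0..255, not_ect | ect1 | ect0 | ce) -> 0..255.
restamp(Tos, Mark) ->
    (Tos band 16#FC) bor to_tos(Mark).
```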

set_egress_ecn(Socket, Mark)

-spec set_egress_ecn(t(), ecn_mark()) -> ok.

Flip the socket-level egress ECN mark. Use after an ECN validation failure (RFC 9000 §13.4.2.1) to stop emitting ECT-marked packets on this path. Best-effort: errors from the non-matching family are ignored.

set_gro/2

-spec set_gro(t(), boolean()) -> ok.

Enable UDP_GRO on a socket. Once GRO is on, the kernel coalesces consecutive equal-size datagrams of the same flow into a single buffer; socket:recvmsg/5 then returns a control message of the form #{level => udp, type => 104, data => <<Size:16/native, _/binary>>} that the caller must use to split the buffer back into per-packet chunks. See get_gso_size_from_cmsg/1.

set_gso_size(Socket, Size)

-spec set_gso_size(t(), non_neg_integer()) -> ok.

Configure sticky UDP_SEGMENT (GSO) on a socket. After this call, any socket:send/sendto whose payload exceeds Size will be split by the kernel into segments of Size bytes each (the final segment may be shorter). Size = 0 disables segmentation. Returns ok even when the kernel rejects the option, mirroring set_ecn/2's best-effort policy: callers who care must check capabilities/0 first. Pair with set_gro/2 on the peer's receive socket: without GRO, the coalesced segments arrive as individual datagrams that overrun the UDP receive buffer on loopback / fast paths, causing 2-3x retransmissions and a net throughput regression versus the un-offloaded send path.
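
The segmentation rule amounts to simple arithmetic: a payload longer than Size yields full segments of Size bytes plus one shorter remainder. The sketch below models only that arithmetic; gso_segments_sketch and segment_lengths/2 are hypothetical, and no kernel call is involved.

```erlang
-module(gso_segments_sketch).
-export([segment_lengths/2]).

%% Lengths of the datagrams produced for one send of PayloadLen bytes.
%% Size = 0 models "segmentation disabled": the payload goes out whole.
-spec segment_lengths(non_neg_integer(), non_neg_integer()) ->
          [non_neg_integer()].
segment_lengths(PayloadLen, 0) when is_integer(PayloadLen), PayloadLen >= 0 ->
    [PayloadLen];
segment_lengths(PayloadLen, Size)
  when is_integer(PayloadLen), PayloadLen >= 0,
       is_integer(Size), Size > 0, PayloadLen =< Size ->
    [PayloadLen];
segment_lengths(PayloadLen, Size)
  when is_integer(PayloadLen), is_integer(Size), Size > 0 ->
    [Size | segment_lengths(PayloadLen - Size, Size)].
```

For example, a 4000-byte payload with Size = 1200 would go out as three 1200-byte segments followed by one 400-byte segment.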

sockaddr_to_tuple/1

-spec sockaddr_to_tuple(sockaddr()) -> {inet:ip_address(), inet:port_number()}.

Convert a sockaddr to {IP, Port} tuple for compatibility.

sockname(Socket)

-spec sockname(t()) -> {ok, sockaddr()} | {error, nquic_error:any_reason()}.

Get the local address of the socket.