DynamicSrv.Epmd (libcluster_dynamic_srv v1.0.0)
View SourceThis module was inspired heavily by Caravan
Custom EPMD replacement used to run Erlang distribution with dynamic ports and without a local epmd daemon. If you run this with a service mesh like Consul, coupled with a mutual TLS (Consul Connect), you can use this module to discover peers via DNS SRV records instead of a well‑known static port published through epmd and have service to service TLS encryption.
This module is passed to the Erlang VM through the -epmd_module flag (or the :kernel, :epmd_module application environment) and implements just enough of the epmd client contract that the distribution layer (net_kernel / erl_distribution) can:
- announce the local node's listening port (picked externally and exported via ERL_DIST_PORT), and
- resolve remote nodes to {ip, port, version} tuples by consulting DNS, specifically SRV records to obtain the port.
High level behavior
- Registration: register_node/2,3 does NOT talk to a daemon. It simply returns a pseudo "creation" (1..3) as expected by the runtime.
- Outbound connections: address_please/3 converts an incoming (Name, Host)
pair into "Name.Host", performs DNS lookups:
- A/AAAA (currently :inet.getaddr with :inet, i.e. IPv4 only)
- SRV (via :inet_res.lookup) to obtain the distribution port and returns the distribution version (hard‑coded 5).
- Local listen port: listen_port_please/2 fetches the port from ERL_DIST_PORT (except for special ephemeral "rpc-" / "rem-" prefixed helper nodes, which are answered with port 0).
- names/1 is intentionally unsupported and returns {:error, :address} because there is no central registry.
Node naming convention
The real Erlang node name is still of the form name@host. For matching purposes we map the current node (name@host) to "name.host" and compare it with the requested "Name.Host" (including optional prefixes): (rpc|rem)-<base>-<...>.service.consul This allows short‑lived, prefixed nodes (e.g. rpc-* for remote procedure helper processes) to co‑exist without requiring dedicated ports.
Environment contract
ERL_DIST_PORT MUST be set before the VM starts distribution. Failure to do so raises at runtime when local_dist_port/0 is invoked.
DNS / Consul expectations
For a target like: mynode.myservice.service.consul
- A (or CNAME) record resolves to the host IP.
- SRV record (<my-service>.service.<consul-domain> - however you configure Consul) must return the distribution port you want peers to dial. (This code presently just takes the first returned SRV entry.) Adjust or wrap get_remote_ip_and_port/1 if you need prioritization or IPv6.
Distribution version
Kept at 5 (unchanged since OTP R6). Change only if upstream protocol conventions evolve.
Limitations / Caveats
- No IPv6: :inet.getaddr(..., :inet) restricts lookups to IPv4.
- No fallback / retries: DNS lookups are performed once; transient failures will bubble up as distribution connection errors.
- Assumes SRV availability: If no SRV record exists the pattern match will fail. Wrap lookup logic if you need graceful degradation.
- names/1 unsupported: Tools expecting epmd name listings will not work.
- Single ERL_DIST_PORT: You are responsible for ensuring uniqueness (e.g. by provisioning ports or injecting them via orchestration).
Configuration example (Elixir)
Before launching the VM:
NOTE: The Elixir. prefix is required for specifying the EPMD module.
export ELIXIR_ERL_OPTIONS="-start_epmd false -epmd_module Elixir.DynamicSrv.Epmd"
export ERL_DIST_PORT=<some-port>
export RELEASE_DISTRIBUTION=name - longnames are required
export RELEASE_NODE="node_a@<myservice>.service.<consul domain>"
If you want to see a Nomad/Consul example using Consul Connect for mutual TLS see examples/app.nomad
Troubleshooting
- "ERL_DIST_PORT is not set": Ensure the environment variable is exported in the same shell / container context.
- Cannot resolve remote node: Verify DNS (dig A / SRV records). Make sure the node name you pass maps to the expected SRV record.
- Connection refused: Confirm the remote BEAM VM is listening on the port published by the SRV record and that network policies allow traffic.
- Hanging node connections: Use :inet_res.getbyname / :inet_res.lookup in a shell to confirm that the BEAM's resolver can see the records inside the runtime environment.
- If you are using Consul Connect, ensure that you have a permissive Intention in Consul for the service.
Summary
Functions
See: https://www.erlang.org/doc/apps/kernel/erl_epmd.html#address_please/3
This is using optimized version of this function that also returns the port and version. This will ensure that we don't need to also call port_please/3.
See: https://www.erlang.org/doc/apps/kernel/erl_epmd.html#names/1
We are not implementing this because we are not running epmd.
See: https://www.erlang.org/doc/apps/kernel/erl_epmd.html#register_node/3
As of Erlang/OTP 19.1, register_node/3 is used instead of register_node/2, passing along the address family, 'inet_tcp' or 'inet6_tcp'. This makes no difference for our purposes.
erl_distribution wants us to start a worker process. We don't need one, though.
Returns :ignore