How to load-balance outbound requests


The balance layer spreads your outbound calls across a pool of endpoints, leans away from a slow replica, and stops sending to a dead one until it recovers. You need it when the service you call runs as several replicas and you want the client to do the balancing, with no separate proxy in front.

Add a balance layer

Instead of a base_url, give the client a pool of endpoints and pass only the path to each call. The layer picks an endpoint per request and supplies the host.

Client = livery_client:new(#{
    stack => [
        livery_client:retry(#{max => 3}),
        livery_client:balance(#{
            name      => users,
            endpoints => [
                <<"http://10.0.0.1:8080">>,
                <<"http://10.0.0.2:8080">>,
                <<"http://10.0.0.3:8080">>
            ]
        })
    ]
}),
{ok, Resp} = livery_client:get(Client, <<"/users/42">>),
200 = livery_client:status(Resp).

By default the layer uses power-of-two-choices: it samples two endpoints and sends to the one with fewer in-flight requests, which resists piling onto a slow node. Pass policy => round_robin for plain rotation.
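To make the selection rule concrete, here is a minimal standalone sketch of power-of-two-choices. It is not the layer's own code: the module name p2c_sketch, the pick/1 function, and the representation of the pool as a list of {Endpoint, InFlight} pairs are all assumptions made for illustration.

```erlang
-module(p2c_sketch).
-export([pick/1]).

%% Hypothetical illustration, not part of livery_client.
%% Pool is a non-empty list of {Endpoint, InFlight} pairs.
%% Two positions are sampled uniformly at random (with replacement,
%% so both samples may land on the same endpoint), and the endpoint
%% with fewer in-flight requests is returned. The first sample wins ties.
-spec pick([{binary(), non_neg_integer()}, ...]) -> binary().
pick(Pool) ->
    N = length(Pool),
    {EpA, LoadA} = lists:nth(rand:uniform(N), Pool),
    {EpB, LoadB} = lists:nth(rand:uniform(N), Pool),
    case LoadA =< LoadB of
        true  -> EpA;
        false -> EpB
    end.
```

Because each pick compares only two candidates, a slow endpoint with a growing in-flight count loses most comparisons it takes part in, while the cost per request stays constant regardless of pool size.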

Tune ejection and recovery

The balancer watches outcomes. An endpoint that fails eject_after times in a row (default 5) is ejected from the pool for eject_for milliseconds (default 10000). A failure is any {error, _} result or, by default, any response with status >= 500, so a replica answering 503 counts as unhealthy even though the call technically returned. Override which statuses count as failures with fail_status:

livery_client:balance(#{
    name        => users,
    endpoints   => Endpoints,
    eject_after => 3,
    eject_for   => 5000,
    fail_status => [500, 502, 503, 504]
}).
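The counting rule behind these options can be sketched as a small state machine. This is an illustration under stated assumptions rather than the balancer's implementation: the module eject_sketch, its functions, the {ok, Status} outcome shape, and the explicit Now argument (a millisecond timestamp supplied by the caller) are hypothetical.

```erlang
-module(eject_sketch).
-export([new/0, is_failure/2, record/4]).

%% Hypothetical illustration, not part of livery_client.
%% Per-endpoint state: consecutive failures and the time (ms) until
%% which the endpoint is ejected, or undefined when it is in the pool.
new() ->
    #{fails => 0, ejected_until => undefined}.

%% Classify an outcome. The atom 'default' stands for "status >= 500";
%% a list plays the role of the fail_status option.
is_failure({error, _}, _FailStatus) ->
    true;
is_failure({ok, Status}, default) ->
    Status >= 500;
is_failure({ok, Status}, FailStatus) when is_list(FailStatus) ->
    lists:member(Status, FailStatus).

%% record(Failed, Now, Opts, State) -> NewState
%% A success resets the streak. A failure extends it, and once the
%% streak reaches eject_after the endpoint is ejected for eject_for ms.
record(false, _Now, _Opts, State) ->
    State#{fails := 0, ejected_until := undefined};
record(true, Now, #{eject_after := After, eject_for := For},
       State = #{fails := Fails}) ->
    case Fails + 1 of
        N when N >= After ->
            State#{fails := 0, ejected_until := Now + For};
        N ->
            State#{fails := N}
    end.
```

With eject_after => 3 and eject_for => 5000, the third consecutive failure recorded at time 1000 yields ejected_until => 6000, while a single success in between would have reset the streak to zero.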

Recovery is lazy and safe: once the cooldown passes, the next request is leased as a single probe (an atomic compare-and-swap means only one caller probes, even under load). If the probe succeeds, the endpoint rejoins the pool; if it fails, the endpoint stays out for another cooldown. Place retry above balance in the stack, as in the first example, and a failed probe is retried on a healthy endpoint without the caller noticing.
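The single-probe lease can be pictured with OTP's atomics module. The sketch below is one possible shape for it, not the layer's actual code: the module probe_sketch, its state encoding, and the explicit Now and For arguments (milliseconds) are assumptions. Only atomics:new/2, atomics:get/2, atomics:put/3, and atomics:compare_exchange/4 are real OTP calls.

```erlang
-module(probe_sketch).
-export([new/0, eject/3, try_lease/2, finish_probe/4]).

%% Hypothetical illustration, not part of livery_client.
%% Slot 1 holds the endpoint state, slot 2 the cooldown deadline (ms).
-define(HEALTHY, 0).
-define(EJECTED, 1).
-define(PROBING, 2).

new() ->
    atomics:new(2, []).   %% both slots start at 0, i.e. HEALTHY

%% Take the endpoint out of the pool until Now + For.
eject(Ref, Now, For) ->
    ok = atomics:put(Ref, 2, Now + For),
    atomics:put(Ref, 1, ?EJECTED).

%% Decide what a caller may do with this endpoint at time Now.
%% Returns healthy | ejected | probe | busy.
try_lease(Ref, Now) ->
    case atomics:get(Ref, 1) of
        ?HEALTHY -> healthy;
        ?PROBING -> busy;
        ?EJECTED ->
            case Now >= atomics:get(Ref, 2) of
                false -> ejected;
                true ->
                    %% Only one caller can swap EJECTED -> PROBING;
                    %% every other caller sees the swap fail.
                    case atomics:compare_exchange(Ref, 1, ?EJECTED, ?PROBING) of
                        ok     -> probe;
                        _Other -> busy
                    end
            end
    end.

%% The probing caller reports the result: success rejoins the pool,
%% failure starts another cooldown.
finish_probe(Ref, ok, _Now, _For) ->
    atomics:put(Ref, 1, ?HEALTHY);
finish_probe(Ref, failed, Now, For) ->
    eject(Ref, Now, For).
```

No timer or background process is involved: the state only changes when a request arrives after the deadline, which is what makes the recovery lazy.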

If every endpoint is ejected, a call returns {error, no_endpoint}.

Change the pool at runtime

When a deploy adds a replica, or a node drains, adjust the live pool without rebuilding the client:

ok = livery_client:add_endpoint(users, <<"http://10.0.0.4:8080">>),
ok = livery_client:remove_endpoint(users, <<"http://10.0.0.1:8080">>).

The pool is identified by its name, so every client built with the same name shares it. The endpoints list seeds the pool once, on first use; after that your add/remove calls are authoritative and a later request will not bring a removed endpoint back.

Resolve endpoints from discovery

Instead of a fixed list, the endpoints option can be a {Module, Arg} pair naming a livery_client_discover provider. The shipped provider is static; a custom one can resolve endpoints from DNS or a registry:

livery_client:balance(#{name => users, endpoints => {my_discovery, prod}}).

See also