socket_drano v0.3.0 SocketDrano

Process to gracefully drain Phoenix Socket connections at shutdown.

Plug.Cowboy.Drainer is able to handle draining of connections as they complete during a shutdown. Websockets, however, are long-lived connections which may not complete before timeout periods are reached, especially if they have a heartbeat to keep alive. This library handles both in a single dep to keep things simple.

To gracefully shed Phoenix Sockets, it's necessary to close them explicitly. This is useful when a container receives a SIGTERM during a deployment or scale-down and you want to avoid a thundering herd on your other containers.

This module provides a process that, during shutdown, initiates the closing of open Phoenix sockets. When the client receives the disconnect message, it will attempt to reconnect, so this is most effective when your load balancer also removes the container from the available pool.

On start_draining, SocketDrano spawns non-blocking processes to disconnect the monitored socket connections, then exits promptly so the rest of the shutdown can proceed.

Note: This library does not solve the issue of rebalancing of sockets. That's a tougher issue and highly dependent on your load-balancing strategy and infra.

Socket connections are discovered via telemetry events and monitored for socket closing.

Important: This library currently relies on an undocumented internal function in Phoenix to close local sockets. I may attempt to get this functionality explicitly exposed in Phoenix APIs. If that internal function changes, draining could stop working, but it should not otherwise break your application.

Usage

If you run into issues in your test or development environment, you can set shutdown_delay to a low value, such as 0, in non-production environments.
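One way to vary shutdown_delay per environment is to read it from application config and pass it in the child spec options. The sketch below is only an illustration: the :my_app application, the :socket_drano_shutdown_delay key, and the MyApp.DrainConfig module are hypothetical names, not part of SocketDrano. Only the :refs and :shutdown_delay options come from this documentation.

```elixir
defmodule MyApp.DrainConfig do
  # Hypothetical helper module; not part of SocketDrano.
  #
  # Assumes config/test.exs and config/dev.exs contain something like:
  #
  #     config :my_app, socket_drano_shutdown_delay: 0
  #
  # The config key name is an assumption made for this sketch.

  # Documented default for :shutdown_delay, in milliseconds.
  @default_delay 5_000

  # Builds the options to hand to {SocketDrano, opts} in the supervision tree.
  def child_opts do
    delay = Application.get_env(:my_app, :socket_drano_shutdown_delay, @default_delay)
    [refs: [MyApp.Endpoint.HTTP], shutdown_delay: delay]
  end
end
```

The supervisor child would then be written as {SocketDrano, MyApp.DrainConfig.child_opts()}, so production keeps the default delay while test and development drain immediately.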

Options

The following options can be given to the child spec:

  • :refs - A list of refs to drain. :all is also supported and will drain all cowboy listeners, including those started by means other than Plug.Cowboy. Required

  • :shutdown_delay - How long to wait for connections to drain. This number should not exceed the max time before a sigkill is sent by your container orchestration settings. Defaults to 5000ms.

  • :drain_check_interval - How frequently ranch should check for all connections to have drained.

  • :strategy - Strategy to drain the sockets. The percentage and time should resolve to 100% of connections being drained in a time less than the shutdown time. Defaults to {:percentage, 25, 100}.

Examples

# In your application
def start(_type, _args) do
  children = [
    MyApp.Endpoint,
    {SocketDrano, refs: [MyApp.Endpoint.HTTP]}
  ]

  opts = [strategy: :one_for_one, name: MyApp.Supervisor]
  Supervisor.start_link(children, opts)
end

Strategies

Only a percentage-based strategy is currently supported. Disconnect batches are made according to the percentage size provided. Each batch is processed at the given interval. Within a batch, each individual socket disconnect is processed in its own process with an added jitter of 1-100ms.
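To check that a strategy fits inside shutdown_delay, the batch arithmetic can be sketched as follows. This is a rough estimate under stated assumptions: the second tuple element is taken to be the batch percentage, the third the interval between batches in milliseconds, and each batch is assumed to wait one full interval. DrainEstimate is a hypothetical helper, not part of SocketDrano.

```elixir
defmodule DrainEstimate do
  # Hypothetical helper; not part of SocketDrano.

  # Number of batches needed to cover 100% of connections.
  def batches(percentage) when percentage > 0 and percentage <= 100 do
    ceil(100 / percentage)
  end

  # Rough estimate of total drain time in milliseconds: one interval per
  # batch, plus the maximum per-socket jitter (100ms per the text above).
  def worst_case_ms(percentage, interval_ms, max_jitter_ms \\ 100) do
    batches(percentage) * interval_ms + max_jitter_ms
  end
end

# With the default strategy {:percentage, 25, 100}:
DrainEstimate.batches(25)            # => 4
DrainEstimate.worst_case_ms(25, 100) # => 500
```

Under these assumptions the default strategy finishes in roughly 500ms, well inside the default 5000ms shutdown_delay. A smaller percentage or a longer interval should be checked against your own delay in the same way.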

Telemetry

  • [:socket_drano, :monitor, :start]
  • [:socket_drano, :monitor, :stop]
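Assuming these are events the library emits, a handler could be attached with :telemetry.attach_many/4 to observe them. The sketch below logs whatever arrives. The module name and handler id are hypothetical, and the contents of the measurements and metadata maps are not documented here, so the handler only inspects them.

```elixir
defmodule MyApp.DrainTelemetry do
  # Hypothetical module; not part of SocketDrano.
  require Logger

  @events [
    [:socket_drano, :monitor, :start],
    [:socket_drano, :monitor, :stop]
  ]

  # Call once at application start, e.g. from MyApp.Application.start/2.
  # Requires the :telemetry dependency, which Phoenix already pulls in.
  def attach do
    :telemetry.attach_many(
      "my-app-socket-drano",
      @events,
      &__MODULE__.handle_event/4,
      nil
    )
  end

  # Logs each monitor event; the shape of measurements and metadata is
  # not assumed, so both are simply inspected.
  def handle_event([:socket_drano, :monitor, action], measurements, metadata, _config) do
    Logger.info(
      "socket_drano monitor #{action}: " <>
        "#{inspect(measurements)} #{inspect(metadata)}"
    )
  end
end
```

Passing a captured function of a named module, rather than an anonymous function, avoids the performance warning that :telemetry emits for local function handlers.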

Functions

Returns a specification to start this module under a supervisor.

See Supervisor.

handle_event(list, measurements, meta, arg4)

Callback implementation for GenServer.init/1.