# Scaling Open Responses

## Single node (current default)

Out of the box, Open Responses runs on a single node and handles hundreds of concurrent users without any configuration changes. The BEAM scheduler distributes work across all CPU cores; each request runs as an isolated GenServer process under `LoopSupervisor`. Thousands of concurrent streaming SSE connections are a normal workload for a single Bandit/Cowboy node.

The practical limit on a single node is memory. Each active response holds its output in ETS and in the Loop process heap. A well-specced node (8 cores, 16 GB RAM) comfortably handles 1,000–5,000 simultaneous active streams, with the true bottleneck being LLM provider latency rather than BEAM throughput.

**What does not survive a single-node setup:**
- Node restart drops all in-flight responses (ETS is in-memory)
- No horizontal scaling — a second node has no access to responses created on the first

For development and small production deployments, this is fine.

---

## Stage 1: Durable storage (AshPostgres)

The first production upgrade is swapping the `Response` resource from ETS to Postgres. In the resource definition, this is a one-line change:

```elixir
# lib/open_responses/responses/response.ex
use Ash.Resource,
  domain: OpenResponses.Responses,
  data_layer: AshPostgres.DataLayer,   # replaces Ash.DataLayer.Ets
  extensions: [AshStateMachine]
```

Add to `mix.exs`:

```elixir
{:ash_postgres, "~> 2.0"}
```

Generate and run the migration:

```bash
mix ash_postgres.generate_migrations
mix ecto.migrate
```

Nothing else changes. All actions, state machine transitions, and streaming behaviour are identical. Responses now survive restarts and are visible to every node.
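As a sketch of what that buys you (assuming Ash 3's standard `Ash.get!/2` call; adjust if the project exposes a code interface on the domain instead), a response created before a restart is readable afterwards, from any node:

```elixir
# Fetch a persisted response by id; works on any node once
# AshPostgres is the data layer.
response = Ash.get!(OpenResponses.Responses.Response, response_id)
response.state  # the state machine status survives the restart
```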

---

## Stage 2: Multi-node clustering

With Postgres in place, add `libcluster` to form an Erlang cluster across nodes. Phoenix.PubSub (which broadcasts SSE events) uses the `pg` adapter and works across a cluster automatically — no changes needed.

```elixir
# mix.exs
{:libcluster, "~> 3.3"}
```

```elixir
# config/runtime.exs
config :libcluster,
  topologies: [
    open_responses: [
      strategy: Cluster.Strategy.Gossip   # or Kubernetes, DNS, EPMD
    ]
  ]
```

```elixir
# lib/open_responses/application.ex
{Cluster.Supervisor, [Application.get_env(:libcluster, :topologies), [name: OpenResponses.ClusterSupervisor]]}
```

With clustering active, PubSub events broadcast cluster-wide. A response created on node A fires SSE events that node B's connected client receives correctly.
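As a sketch of that flow (the `OpenResponses.PubSub` server name and the `"response:<id>"` topic are assumptions, not necessarily the project's actual names):

```elixir
# On node B: the SSE controller subscribes before streaming begins
Phoenix.PubSub.subscribe(OpenResponses.PubSub, "response:#{response_id}")

# On node A: the Loop broadcasts each event; the pg adapter
# delivers it to subscribers on every connected node
Phoenix.PubSub.broadcast(
  OpenResponses.PubSub,
  "response:#{response_id}",
  {:sse_event, event}
)
```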

The remaining gap is `previous_response_id` context: when a multi-turn request lands on node B, it queries `ResponseCache` (Cachex), whose entries exist only on the node that handled the first turn.

---

## Stage 3: Distributed Loop registry (Horde)

This is the approach that eliminates sticky sessions entirely, using the same strategy deployed in large-scale Elixir systems in healthcare and telecoms.

**The idea**: instead of routing requests to nodes, register Loop processes in a cluster-wide registry. Any node can look up or start a process for a given `response_id`, regardless of which node originally created it. Horde implements this registry using CRDTs (Conflict-free Replicated Data Types) — the registry state converges automatically across nodes with no single point of coordination.

```elixir
# mix.exs
{:horde, "~> 0.9"}
```

Replace `LoopSupervisor` with a Horde supervisor and registry:

```elixir
# lib/open_responses/application.ex
children = [
  {Horde.Registry, [name: OpenResponses.LoopRegistry, keys: :unique, members: :auto]},
  {Horde.DynamicSupervisor, [name: OpenResponses.LoopSupervisor, strategy: :one_for_one, members: :auto]},
  ...
]
```

Register each Loop process under its `response_id`:

```elixir
# lib/open_responses/loop.ex
def init(opts) do
  response = Keyword.fetch!(opts, :response)
  # Registers the calling process cluster-wide; the third argument is an
  # arbitrary value that Horde.Registry.lookup/2 returns alongside the pid
  {:ok, _} = Horde.Registry.register(OpenResponses.LoopRegistry, response.id, self())
  ...
end
```
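An equivalent pattern, if you prefer not to register inside `init/1`, is naming the GenServer with a via tuple so registration happens atomically at start:

```elixir
# lib/open_responses/loop.ex — alternative: via-tuple registration
def start_link(opts) do
  response = Keyword.fetch!(opts, :response)
  name = {:via, Horde.Registry, {OpenResponses.LoopRegistry, response.id}}
  GenServer.start_link(__MODULE__, opts, name: name)
end
```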

Start loops via the Horde supervisor:

```elixir
Horde.DynamicSupervisor.start_child(
  OpenResponses.LoopSupervisor,
  {OpenResponses.Loop, opts}
)
```
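A hypothetical `lookup_or_start/1` helper (not in the codebase) combines the two and handles the race where another node starts the same Loop concurrently:

```elixir
# Find an existing Loop anywhere in the cluster, or start one
# under the Horde supervisor if none is registered yet.
def lookup_or_start(response) do
  case Horde.Registry.lookup(OpenResponses.LoopRegistry, response.id) do
    [{pid, _value}] ->
      {:ok, pid}

    [] ->
      case Horde.DynamicSupervisor.start_child(
             OpenResponses.LoopSupervisor,
             {OpenResponses.Loop, response: response}
           ) do
        {:ok, pid} -> {:ok, pid}
        # Another node won the race to start the same Loop
        {:error, {:already_started, pid}} -> {:ok, pid}
        other -> other
      end
  end
end
```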

Look up a running loop from any node:

```elixir
case Horde.Registry.lookup(OpenResponses.LoopRegistry, response_id) do
  [{pid, _}] -> send(pid, :cancel)
  [] -> :not_found
end
```

With Horde in place:

- A streaming client connected to node A receives events from a Loop running on node B transparently, via cluster-wide PubSub
- A `previous_response_id` request landing on any node reconstructs context from Postgres (Stage 1), so the node-local Cachex cache is no longer a constraint
- No sticky sessions required at the load balancer — round-robin or least-connections works correctly

---

## Stage 4: Consistent hashing (alternative to Horde)

If you prefer to keep process affinity simple without a distributed registry, consistent hashing routes requests deterministically to the same node based on a key — typically the `response_id` or a `user_id`. The hash ring recalculates minimally when nodes join or leave, unlike naive modulo hashing.

```elixir
# mix.exs
{:libring, "~> 1.6"}
```

```elixir
# Erlang node names are atoms, so add them as atoms; comparing the
# result against node() then works as expected
ring =
  HashRing.new()
  |> HashRing.add_node(:"node1@host")
  |> HashRing.add_node(:"node2@host")
  |> HashRing.add_node(:"node3@host")

target_node = HashRing.key_to_node(ring, response_id)
```

Requests for the same `response_id` always hash to the same node, so that node's Cachex cache always has the relevant context. When a new node joins, only ~1/N of keys rehash, so most existing sessions are unaffected.

The tradeoff versus Horde: simpler mental model, but requires either a smart load balancer that respects the hash ring or an application-layer proxy hop when the receiving node isn't the target node. Horde's registry eliminates that proxy hop by making every process findable from everywhere.
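That proxy hop can be sketched as a small routing helper (illustrative only) that forwards over Distributed Erlang with `:erpc`:

```elixir
# Run `fun` on the node that owns `response_id` on the hash ring;
# execute locally when this node is the owner, otherwise hop once.
def route(ring, response_id, fun) do
  target = HashRing.key_to_node(ring, response_id)

  if target == node() do
    fun.()
  else
    :erpc.call(target, fun)
  end
end
```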

---

## Comparison

| Approach | Sticky sessions needed | Survives restarts | Works multi-node | Complexity |
|---|---|---|---|---|
| ETS (default) | N/A (single node) | No | No | None |
| AshPostgres only | Yes | Yes | Partial | Low |
| AshPostgres + libcluster | Yes | Yes | Yes (with stickiness) | Low |
| AshPostgres + Horde | No | Yes | Yes | Medium |
| AshPostgres + consistent hashing | At LB only | Yes | Yes | Medium |

For most deployments, **Stage 1 (Postgres) + Stage 2 (libcluster) + sticky sessions at the load balancer** is the pragmatic path. It requires no application code changes beyond the data layer swap and a cluster topology config.

Horde becomes worth the added complexity when you need zero-downtime rolling deploys, automatic process migration on node failure, or truly stateless load balancing.

---

## Kubernetes deployment note

On Kubernetes, the DNS clustering strategy in libcluster discovers pods automatically:

```elixir
config :libcluster,
  topologies: [
    open_responses: [
      strategy: Cluster.Strategy.Kubernetes.DNS,
      config: [
        service: "open-responses-headless",
        application_name: "open_responses"
      ]
    ]
  ]
```
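The DNS strategy resolves pod IPs through a headless Service, which must exist alongside the config above. A minimal sketch (names are illustrative and must match the `service:` value in the topology):

```yaml
apiVersion: v1
kind: Service
metadata:
  name: open-responses-headless
spec:
  clusterIP: None        # headless: DNS returns individual pod IPs
  selector:
    app: open-responses
  ports:
    - port: 4369         # EPMD port used for Erlang node discovery
```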

Sticky sessions are available via `sessionAffinity: ClientIP` on a Kubernetes Service, or via a cookie-based affinity annotation in ingress-nginx. With Horde, skip the affinity entirely and let the scheduler round-robin freely.
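For the sticky option, `sessionAffinity` goes on the client-facing Service. A sketch, with an assumed Phoenix listen port:

```yaml
apiVersion: v1
kind: Service
metadata:
  name: open-responses
spec:
  selector:
    app: open-responses
  sessionAffinity: ClientIP
  sessionAffinityConfig:
    clientIP:
      timeoutSeconds: 10800   # default affinity window (3 hours)
  ports:
    - port: 80
      targetPort: 4000        # assumed Phoenix listen port
```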
