erllama_inflight (erllama v0.2.0)


Tracks in-flight streaming inference requests.

Each request admitted through erllama:infer/4 gets a unique reference() and an entry in a public ETS table that maps the reference back to the erllama_model gen_statem serving it. erllama:cancel/1 looks up the reference here to find which model owns the request, then casts a {cancel, Ref} event to that model.
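A minimal sketch of the register, lookup and cancel flow described above, assuming the table stores plain {Ref, ModelPid} tuples. The module name `inflight_registry_sketch`, its table name and `cancel_request/1` are hypothetical and are not the erllama API; only the `{cancel, Ref}` cast shape comes from the description.

```erlang
%% Hypothetical sketch: a fixed-name public ETS table mapping Ref -> model pid.
%% Names here are illustrative only, not the erllama API.
-module(inflight_registry_sketch).
-compile({no_auto_import, [register/2, unregister/1]}).
-export([new/0, register/2, lookup/1, unregister/1, cancel_request/1]).

-define(TAB, inflight_registry_sketch_tab).

%% Create the named public table; the calling process becomes its owner.
new() ->
    ?TAB = ets:new(?TAB, [named_table, public, set, {read_concurrency, true}]),
    ok.

%% Record that Ref is being served by ModelPid.
register(Ref, ModelPid) when is_reference(Ref), is_pid(ModelPid) ->
    true = ets:insert(?TAB, {Ref, ModelPid}),
    ok.

%% Any process can read the table directly.
lookup(Ref) ->
    case ets:lookup(?TAB, Ref) of
        [{Ref, Pid}] -> {ok, Pid};
        [] -> {error, not_found}
    end.

unregister(Ref) ->
    true = ets:delete(?TAB, Ref),
    ok.

%% Find the owning model and cast {cancel, Ref} at it.
cancel_request(Ref) ->
    case lookup(Ref) of
        {ok, Pid} -> gen_statem:cast(Pid, {cancel, Ref});
        {error, not_found} = Err -> Err
    end.
```

Because the cast is asynchronous, `cancel_request/1` returns as soon as the event is queued; the model decides when the stream actually stops.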

The table is a fixed-name public ETS table, so any process can look up entries directly without a round trip through the owning process. The owning gen_server (this module) exists only to keep the table alive across releases and to clean up entries when a model dies unexpectedly.
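The owner's cleanup duty can be sketched as a gen_server that creates the table in init/1, monitors each model pid it is told about, and deletes that pid's rows when the 'DOWN' message arrives. This assumes monitor-based cleanup, which the text does not specify; the module name, `track/2` and the table name are hypothetical.

```erlang
%% Hypothetical sketch of a table-owning gen_server that purges rows of dead models.
-module(inflight_owner_sketch).
-behaviour(gen_server).

-export([start_link/0, track/2]).
-export([init/1, handle_call/3, handle_cast/2, handle_info/2, terminate/2]).

-define(TAB, inflight_owner_sketch_tab).

start_link() ->
    gen_server:start_link({local, ?MODULE}, ?MODULE, [], []).

%% Insert the row directly, then ask the owner to monitor the model pid.
track(Ref, ModelPid) ->
    true = ets:insert(?TAB, {Ref, ModelPid}),
    gen_server:call(?MODULE, {monitor, ModelPid}).

init([]) ->
    %% The table lives as long as this process does.
    ?TAB = ets:new(?TAB, [named_table, public, set, {read_concurrency, true}]),
    {ok, #{}}.  % state: monitored pid -> monitor reference

handle_call({monitor, Pid}, _From, Mons) ->
    case maps:is_key(Pid, Mons) of
        true -> {reply, ok, Mons};
        false -> {reply, ok, Mons#{Pid => erlang:monitor(process, Pid)}}
    end;
handle_call(_Req, _From, Mons) ->
    {reply, {error, unknown_call}, Mons}.

handle_cast(_Msg, Mons) ->
    {noreply, Mons}.

%% A model died: drop every row it still owned.
handle_info({'DOWN', _MRef, process, Pid, _Reason}, Mons) ->
    true = ets:match_delete(?TAB, {'_', Pid}),
    {noreply, maps:remove(Pid, Mons)};
handle_info(_Other, Mons) ->
    {noreply, Mons}.

terminate(_Reason, _Mons) ->
    ok.
```

Monitoring a pid that is already dead still delivers a 'DOWN' message (reason noproc), so a row inserted just before its model exits is still cleaned up.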

Summary

Functions

queue_depth/0: O(1) snapshot of currently registered inflight rows.

queue_depth/1: Per-model inflight count, O(N) in the table size.

Functions

all()

-spec all() -> [{reference(), pid()}].

handle_call/3

handle_cast/2

handle_info/2

init/1

lookup(Ref)

-spec lookup(reference()) -> {ok, pid()} | {error, not_found}.

obs_delete(ModelId)

-spec obs_delete(binary()) -> true.

obs_get(ModelId)

-spec obs_get(binary()) -> tuple() | undefined.

obs_put(ModelId, Row)

-spec obs_put(binary(), tuple()) -> true.

queue_depth()

-spec queue_depth() -> non_neg_integer().

O(1) snapshot of currently registered inflight rows. Reads the atomics counter stored in persistent_term. Returns 0 if the gen_server has not been started yet, because the counter does not exist until then.

Counts only admitted streaming requests. Requests still waiting in an individual model gen_statem's FIFO queue (received in its mailbox but not yet streaming) are not counted here.
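One way to get an O(1) count that falls back to 0 before startup is to park an atomics reference in persistent_term and read it with persistent_term:get/2 and a default. A sketch under that assumption; the key, the module name and the `incr/0` and `decr/0` helpers are hypothetical.

```erlang
%% Hypothetical sketch: global inflight counter via atomics + persistent_term.
-module(queue_depth_sketch).
-export([init_counter/0, incr/0, decr/0, queue_depth/0]).

-define(KEY, {?MODULE, inflight_counter}).

%% Called once at startup (e.g. from the owner's init/1).
%% atomics are signed 64-bit by default.
init_counter() ->
    persistent_term:put(?KEY, atomics:new(1, [])).

incr() -> atomics:add(persistent_term:get(?KEY), 1, 1).
decr() -> atomics:sub(persistent_term:get(?KEY), 1, 1).

%% Constant-time read; 0 when the counter has not been created yet.
queue_depth() ->
    case persistent_term:get(?KEY, undefined) of
        undefined -> 0;
        Counter -> atomics:get(Counter, 1)
    end.
```

persistent_term reads are cheap, but updating a key is expensive, so the reference is written once and only the atomic slot changes afterwards.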

queue_depth(Pid)

-spec queue_depth(pid()) -> non_neg_integer().

Per-model inflight count. Scans the inflight table, filtering on the model's pid. This is O(N) in the table size but gives an exact count for a single model. For totals across all models, use queue_depth/0, which stays O(1) via the atomics counter.
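The per-model scan can be expressed with ets:select_count/2 and a match specification on the pid column, assuming {Ref, Pid} rows. The function name `count_for_pid/2` and the explicit table argument are hypothetical, added so the sketch is self-contained.

```erlang
%% Hypothetical sketch: count rows {Ref, Pid} belonging to one model pid.
-module(per_model_depth_sketch).
-export([count_for_pid/2]).

count_for_pid(Tab, Pid) when is_pid(Pid) ->
    %% Match head {'_', Pid}: any key, second element equal to Pid.
    %% Each matching row contributes `true` to the count.
    ets:select_count(Tab, [{{'_', Pid}, [], [true]}]).
```

ets:select_count/2 still visits every row, so the cost remains O(N); it only avoids copying the matching rows out of the table.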

register(Ref, ModelPid)

-spec register(reference(), pid()) -> ok.

start_link()

terminate/2

unregister(Ref)

-spec unregister(reference()) -> ok.