erllama_inflight (erllama v0.6.2)
Tracks in-flight streaming inference requests.
Each admitted erllama:infer/4 call produces a unique reference() and an
entry in a public ETS table mapping that reference back to the
serving erllama_model gen_statem. erllama:cancel/1 looks up the
ref here to find which model owns the request, then casts a
{cancel, Ref} event to it.
The table is a fixed-name public ETS table, so lookups are lock-free from any process. The owning gen_server (this module) exists only to keep the table alive across releases and to clean up entries when a model dies unexpectedly.
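The lookup-then-cast flow described above can be sketched roughly as follows. This is an illustration only, not the module's actual code: the module name inflight_sketch, the table name, the {Ref, ModelPid} row shape, and the functions new/0, register/2 and cancel/1 are all assumptions made for the example.

```erlang
-module(inflight_sketch).
-export([new/0, register/2, cancel/1]).

%% Hypothetical table name; the real table name is not given in these docs.
-define(TAB, inflight_sketch_tab).

%% Create a named, public table so any process can read it directly.
new() ->
    ets:new(?TAB, [named_table, public, set, {read_concurrency, true}]).

%% Record which model process serves the request identified by Ref.
%% The {Ref, ModelPid} row shape is an assumption.
register(Ref, ModelPid) when is_reference(Ref), is_pid(ModelPid) ->
    true = ets:insert(?TAB, {Ref, ModelPid}),
    ok.

%% Find the owning model and cast a cancel event to it.
%% An unknown ref is treated as a no-op in this sketch.
cancel(Ref) when is_reference(Ref) ->
    case ets:lookup(?TAB, Ref) of
        [{Ref, ModelPid}] ->
            gen_statem:cast(ModelPid, {cancel, Ref});
        [] ->
            ok
    end.
```

Because the table is public and named, cancel/1 in this sketch never calls into the owning process; it reads the row and sends the cast directly.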
Functions
-spec obs_delete(binary()) -> true.
-spec queue_depth() -> non_neg_integer().
O(1) snapshot of currently-registered inflight rows. Reads the
atomics counter parked in persistent_term; returns 0 when the
gen_server has not been started yet (the counter does not exist
in that case).
Counts only admitted streaming requests. Pending FIFO requests queued inside an individual model gen_statem (admitted to the mailbox but not yet streaming) are not visible here.
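A minimal sketch of an atomics counter parked in persistent_term, with the "return 0 when missing" fallback described above. The module name depth_sketch, the persistent_term key, and the init/0, incr/0 and decr/0 helpers are hypothetical; the real module may maintain its counter differently.

```erlang
-module(depth_sketch).
-export([init/0, incr/0, decr/0, queue_depth/0]).

%% Hypothetical persistent_term key for the counter reference.
-define(KEY, {?MODULE, counter}).

%% Create a one-slot unsigned counter and publish its reference.
%% In the real module this would presumably happen when the gen_server starts.
init() ->
    Counter = atomics:new(1, [{signed, false}]),
    persistent_term:put(?KEY, Counter).

%% Bump the counter when a row is registered, drop it when a row is removed.
incr() -> atomics:add(persistent_term:get(?KEY), 1, 1).
decr() -> atomics:sub(persistent_term:get(?KEY), 1, 1).

%% Constant-time read; falls back to 0 if init/0 has not run yet.
queue_depth() ->
    case persistent_term:get(?KEY, undefined) of
        undefined -> 0;
        Counter -> atomics:get(Counter, 1)
    end.
```

Reading a persistent_term and one atomics slot touches neither the ETS table nor the owning process, which is what keeps the global count O(1).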
-spec queue_depth(pid()) -> non_neg_integer().
Per-model inflight count. Walks the inflight table filtering on
the model's pid; O(N) in the table size but accurate for a single
model. The global queue_depth/0 is still O(1) via the atomics
counter; use that for cross-model totals.
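One way such a per-model walk could look, assuming rows shaped {Ref, ModelPid}. That row shape, the module name per_model_sketch, and the explicit table argument are assumptions for illustration; the real queue_depth/1 takes only the pid.

```erlang
-module(per_model_sketch).
-export([queue_depth/2]).

%% Count the rows whose second element is ModelPid.
%% ets:select_count/2 scans the whole table here because the key (the ref)
%% is unbound in the pattern, so the cost is O(N) in the table size.
queue_depth(Tab, ModelPid) when is_pid(ModelPid) ->
    MatchSpec = [{{'_', ModelPid}, [], [true]}],
    ets:select_count(Tab, MatchSpec).
```

The count is taken inside ETS without copying rows to the caller, but it still visits every entry, so the atomics-backed queue_depth/0 remains the cheaper choice for totals across all models.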
-spec unregister(reference()) -> ok.