erllama_inflight (erllama v0.1.2)

Tracks in-flight streaming inference requests.

Each request admitted by erllama:infer/4 produces a unique reference() and an entry in a public ETS table that maps the reference back to the serving erllama_model gen_statem. erllama:cancel/1 looks up the reference here to find the model that owns the request, then casts a {cancel, Ref} event to it.

The table is a named public ETS table, so any process can perform lookups directly, without a round trip through a server process. The owning gen_server (this module) exists only to keep the table alive across releases and to clean up entries when a model dies unexpectedly.
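The registry pattern described above can be sketched as a small standalone module. This is an illustration under stated assumptions, not the erllama source: the module name inflight_sketch, the table name, and the read_concurrency option are hypothetical; only the shape of the API and the {cancel, Ref} cast come from this page.

```erlang
-module(inflight_sketch).
-export([new/0, register/2, lookup/1, unregister/1, cancel/1]).

%% register/2 and unregister/1 share names with auto-imported BIFs,
%% so tell the compiler the local definitions are intentional.
-compile({no_auto_import, [register/2, unregister/1]}).

-define(TAB, inflight_sketch_tab).  % hypothetical table name

%% Create the named, public table. The calling process owns the table,
%% and the table is destroyed when that process exits.
new() ->
    ?TAB = ets:new(?TAB, [named_table, public, set, {read_concurrency, true}]),
    ok.

%% Record which model process serves the request identified by Ref.
register(Ref, ModelPid) when is_reference(Ref), is_pid(ModelPid) ->
    true = ets:insert(?TAB, {Ref, ModelPid}),
    ok.

%% Any process can read the table directly.
lookup(Ref) ->
    case ets:lookup(?TAB, Ref) of
        [{Ref, Pid}] -> {ok, Pid};
        [] -> {error, not_found}
    end.

unregister(Ref) ->
    true = ets:delete(?TAB, Ref),
    ok.

%% Find the owning process and send it a cancel event. gen_statem:cast/2
%% is asynchronous and returns ok without waiting for the receiver.
cancel(Ref) ->
    case lookup(Ref) of
        {ok, Pid} -> gen_statem:cast(Pid, {cancel, Ref});
        {error, not_found} = Err -> Err
    end.
```

Because the table dies with its owner, a long-lived process such as a supervised gen_server is the natural place to call new/0, which matches the role this page gives the owning gen_server.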

Functions

all()

-spec all() -> [{reference(), pid()}].

handle_call/3

handle_cast/2

handle_info/2

init/1

lookup(Ref)

-spec lookup(reference()) -> {ok, pid()} | {error, not_found}.

queue_depth()

-spec queue_depth() -> non_neg_integer().

O(1) snapshot of the number of currently registered in-flight rows. Reads the atomics counter stored in persistent_term; returns 0 if the gen_server has not been started yet, because the counter does not exist in that case.

Counts only admitted streaming requests. Pending FIFO requests queued inside an individual model gen_statem (admitted to the mailbox but not yet streaming) are not visible here.
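A minimal sketch of how such a counter can be read in constant time, assuming a one-slot atomics array whose handle is published under a persistent_term key. The module name depth_sketch, the key, and the incr/decr helpers are hypothetical; this page only states that the counter lives in persistent_term and that a missing counter reads as 0.

```erlang
-module(depth_sketch).
-export([init/0, incr/0, decr/0, depth/0]).

-define(KEY, {?MODULE, counter}).  % hypothetical persistent_term key

%% Called once by the owning process at startup: create a one-slot
%% unsigned counter (initial value 0) and publish its handle.
init() ->
    Counter = atomics:new(1, [{signed, false}]),
    persistent_term:put(?KEY, Counter).

%% Would be called when a row is registered or unregistered.
%% Both fail with badarg if init/0 has not run yet.
incr() -> atomics:add(persistent_term:get(?KEY), 1, 1).
decr() -> atomics:sub(persistent_term:get(?KEY), 1, 1).

%% Read the counter without touching the table or any process.
%% Falls back to 0 when the counter has not been published yet.
depth() ->
    case persistent_term:get(?KEY, undefined) of
        undefined -> 0;
        Counter -> atomics:get(Counter, 1)
    end.
```

Reading a single atomics slot avoids both a call to the owning process and a scan of the table, which is why the result is a snapshot: it can change immediately after it is read.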

register(Ref, ModelPid)

-spec register(reference(), pid()) -> ok.

start_link()

terminate/2

unregister(Ref)

-spec unregister(reference()) -> ok.