API Reference erllama v0.1.0
Modules
Public façade for the erllama application.
Public façade for the cache subsystem.
Microbench helpers for the cache subsystem.
Cache subsystem operational counters.
Disk tier server (read_write mode).
KVC v2 file framing and TLV codec.
Sole writer for the cache meta and LRU ETS tables; arbitrates claim/release and the reservation state machine for save publication.
Pure-Erlang policy decisions for the erllama_cache subsystem.
RAM tier slab store.
RAM-file tier server.
File-tier save orchestrator with a leak-proof ETS counting semaphore.
Tracks in-flight streaming inference requests.
Per-model gen_statem that drives the request flow and wires the cache subsystem into the model lifecycle.
Behaviour describing the operations the erllama_model gen_statem
needs from a backing inference engine.
Real-llama.cpp backend for erllama_model.
Deterministic stub backend for erllama_model. No NIF, no GGUF. tokenize uses `erlang:phash2/1` over whitespace-delimited words; decode_one produces a deterministic next token from the context's hash; pack/unpack serialise the token list as bytes. Useful for cache-integration tests that don't need real inference.
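The hashing tokenizer described for the stub backend can be sketched as follows. This is a minimal illustration only: the module name stub_tokenize_sketch and the exact whitespace set are assumptions, not the library's actual code.

```erlang
-module(stub_tokenize_sketch).
-export([tokenize/1]).

%% Hypothetical sketch: split the input on whitespace and map each
%% word to an integer token with erlang:phash2/1. The same word always
%% hashes to the same token, so the output is deterministic.
-spec tokenize(binary()) -> [non_neg_integer()].
tokenize(Text) when is_binary(Text) ->
    Separators = [<<" ">>, <<"\t">>, <<"\n">>, <<"\r">>],
    %% trim_all drops the empty parts produced by repeated whitespace.
    Words = binary:split(Text, Separators, [global, trim_all]),
    [erlang:phash2(Word) || Word <- Words].
```

Because the tokens depend only on the words, tests can assert on token equality without loading a model.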
Dynamic supervisor for erllama_model gen_statems. Each loaded
model is one child started via start_model/2. simple_one_for_one
strategy: children are spawned on demand from a single child spec.
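A simple_one_for_one supervisor of this shape might look like the sketch below. Only start_model/2 and the strategy come from the summary above; the module name model_sup_sketch, the child module hypothetical_model, the restart type, and the intensity values are assumptions for illustration.

```erlang
-module(model_sup_sketch).
-behaviour(supervisor).
-export([start_link/0, start_model/2, init/1]).

start_link() ->
    supervisor:start_link({local, ?MODULE}, ?MODULE, []).

%% With simple_one_for_one, the argument list given here is appended
%% to the argument list in the child spec's start tuple.
start_model(ModelId, Config) ->
    supervisor:start_child(?MODULE, [ModelId, Config]).

init([]) ->
    SupFlags = #{strategy => simple_one_for_one,
                 intensity => 5,     % assumed restart limits
                 period => 10},
    %% hypothetical_model is a placeholder for the per-model process
    %% module; it must export start_link/2 for this spec to work.
    ChildSpec = #{id => model,
                  start => {hypothetical_model, start_link, []},
                  restart => temporary,
                  shutdown => 5000,
                  type => worker},
    {ok, {SupFlags, [ChildSpec]}}.
```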
Single NIF entry module for erllama.
Behaviour and helpers for memory-pressure samplers used by
erllama_scheduler. A sampler is a stateless module that returns
the current {Used, Total} byte tuple for the resource it tracks
(system RAM, GPU VRAM, or a custom source).
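As an illustration of the sampler contract, a custom source could be as small as the sketch below. The callback name sample/0, the module name, and the pressure/1 helper are assumptions; the summary above only states that a sampler returns a {Used, Total} byte tuple.

```erlang
-module(fixed_sampler_sketch).
-export([sample/0, pressure/1]).

%% Hypothetical callback: report a fixed 6 GiB used out of 8 GiB.
%% A real sampler would query the resource it tracks instead.
-spec sample() -> {non_neg_integer(), pos_integer()}.
sample() ->
    GiB = 1024 * 1024 * 1024,
    {6 * GiB, 8 * GiB}.

%% Illustrative helper: turn a {Used, Total} tuple into a 0.0..1.0
%% fraction that a scheduler could compare against a threshold.
-spec pressure({non_neg_integer(), pos_integer()}) -> float().
pressure({Used, Total}) when Total > 0 ->
    Used / Total.
```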
NVIDIA GPU memory-pressure sampler. Aggregates VRAM usage across
every GPU on the host via nvidia-smi --query-gpu=memory.used,memory.total --format=csv,noheader,nounits.
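The aggregation step can be sketched as a pure function over the command's output, which has one "used, total" row per GPU. The module and function names are hypothetical, and the MiB-to-byte conversion assumes that nvidia-smi reports these fields in MiB when nounits is set.

```erlang
-module(nvidia_parse_sketch).
-export([aggregate/1]).

-define(MIB, 1048576).

%% Sum the "used, total" rows (one per GPU) into one {Used, Total}
%% tuple in bytes. The input is the raw command output as a binary.
-spec aggregate(binary()) -> {non_neg_integer(), non_neg_integer()}.
aggregate(Output) when is_binary(Output) ->
    Lines = binary:split(Output, <<"\n">>, [global, trim_all]),
    lists:foldl(fun add_row/2, {0, 0}, Lines).

add_row(Line, {UsedAcc, TotalAcc}) ->
    %% Split on the comma and discard surrounding spaces or a stray CR.
    Fields = binary:split(Line, [<<",">>, <<" ">>, <<"\r">>],
                          [global, trim_all]),
    [Used, Total] = [binary_to_integer(F) || F <- Fields],
    {UsedAcc + Used * ?MIB, TotalAcc + Total * ?MIB}.
```

Since os:cmd/1 returns a string, a caller would convert the result with list_to_binary/1 before passing it in.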
System-memory pressure sampler backed by OTP's memsup (from
os_mon). Portable across Linux, macOS, BSD, and Windows. Returns
{Total - Available, Total}.
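A minimal sketch of the {Total - Available, Total} computation is shown below, using memsup:get_system_memory_data/0 from os_mon. Which proplist keys the real sampler reads is not stated here; the choice of total_memory and available_memory (falling back to free_memory) is an assumption, as are the module and function names.

```erlang
-module(memsup_sampler_sketch).
-export([sample/0, to_used_total/1]).

%% Read the current memory data; requires the os_mon application.
sample() ->
    {ok, _} = application:ensure_all_started(os_mon),
    to_used_total(memsup:get_system_memory_data()).

%% Pure part: derive {Used, Total} from the memsup proplist.
%% available_memory may be missing on some platforms or OTP releases,
%% so this sketch falls back to free_memory.
to_used_total(Data) ->
    Total = proplists:get_value(total_memory, Data),
    Available = proplists:get_value(
                  available_memory, Data,
                  proplists:get_value(free_memory, Data)),
    {Total - Available, Total}.
```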
ETS-backed via callback module (used in {via, Module, Name} registration) for naming erllama_model gen_statems by
binary model_id().
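OTP expects a via module to export register_name/2, unregister_name/1, whereis_name/1, and send/2. An ETS-backed version could be sketched as follows; the module name, the table name, and the init/0 helper are hypothetical, and this sketch omits the cleanup of entries when a registered process dies.

```erlang
-module(via_ets_sketch).
-export([init/0, register_name/2, unregister_name/1,
         whereis_name/1, send/2]).

-define(TAB, via_ets_sketch_names).

%% Create the public name table; call once from a long-lived process.
init() ->
    ?TAB = ets:new(?TAB, [named_table, public, set]),
    ok.

%% insert_new/2 is atomic, so only one process can claim a name.
register_name(Name, Pid) when is_pid(Pid) ->
    case ets:insert_new(?TAB, {Name, Pid}) of
        true -> yes;
        false -> no
    end.

unregister_name(Name) ->
    ets:delete(?TAB, Name),
    ok.

whereis_name(Name) ->
    case ets:lookup(?TAB, Name) of
        [{Name, Pid}] ->
            case is_process_alive(Pid) of
                true -> Pid;
                false -> undefined
            end;
        [] ->
            undefined
    end.

send(Name, Msg) ->
    case whereis_name(Name) of
        undefined -> exit({badarg, {Name, Msg}});
        Pid -> Pid ! Msg, Pid
    end.
```

A process started with a name such as {via, via_ets_sketch, <<"my-model">>} would then be reachable through the usual gen_statem call functions by that binary id.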
Memory-pressure-driven cache eviction.