API Reference erllama v0.1.0

Modules

Public façade for the erllama application.

Public façade for the cache subsystem.

Microbench helpers for the cache subsystem.

Cache subsystem operational counters.

Disk tier server (read_write mode).

KVC v2 file framing and TLV codec.

Sole writer for the cache meta and LRU ETS tables; arbitrates claim/release and the reservation state machine for save publication.

Pure-Erlang policy decisions for the erllama_cache subsystem.

RAM tier slab store.

RAM-file tier server.

File-tier save orchestrator with a leak-proof ETS counting semaphore.

Tracks in-flight streaming inference requests.

Per-model gen_statem that drives the request flow and wires the cache subsystem into the model lifecycle.

Behaviour describing the operations the erllama_model gen_statem needs from a backing inference engine.

Real-llama.cpp backend for erllama_model.

Deterministic stub backend for erllama_model. No NIF, no GGUF. tokenize uses `erlang:phash2/1` over whitespace-delimited words; decode_one produces a deterministic next token from the context's hash; pack/unpack serialise the token list as bytes. Useful for cache-integration tests that don't need real inference.
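A minimal sketch of the hashing tokenizer idea described above, not the module's actual code. The module name stub_tokenizer_sketch, the exact whitespace set, and the 32-bit big-endian encoding in pack/unpack are assumptions made for illustration; the source only states that tokens come from `erlang:phash2/1` over whitespace-delimited words and that the token list is serialised as bytes.

```erlang
-module(stub_tokenizer_sketch).
-export([tokenize/1, pack/1, unpack/1]).

%% Split on whitespace and hash every word to an integer token id.
%% erlang:phash2/1 returns a value in 0..2^27-1, so equal words always
%% map to equal tokens.
-spec tokenize(binary()) -> [non_neg_integer()].
tokenize(Text) when is_binary(Text) ->
    Seps = [<<" ">>, <<"\t">>, <<"\n">>, <<"\r">>],
    Words = binary:split(Text, Seps, [global, trim_all]),
    [erlang:phash2(W) || W <- Words].

%% Assumed wire format: each token as a 32-bit big-endian integer.
-spec pack([non_neg_integer()]) -> binary().
pack(Tokens) ->
    << <<T:32>> || T <- Tokens >>.

%% Inverse of pack/1.
-spec unpack(binary()) -> [non_neg_integer()].
unpack(Bin) when is_binary(Bin) ->
    [T || <<T:32>> <= Bin].
```

Because the hash is deterministic, the same prompt always yields the same token list, which is what makes a stub like this usable for repeatable cache tests.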

Dynamic supervisor for erllama_model gen_statems. Each loaded model is one child started via start_model/2. simple_one_for_one strategy: children are spawned on demand from a single child spec.

Single NIF entry module for erllama.

Behaviour and helpers for memory-pressure samplers used by erllama_scheduler. A sampler is a stateless module that returns the current {Used, Total} byte tuple for the resource it tracks (system RAM, GPU VRAM, or a custom source).
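As an illustration of the {Used, Total} contract, the sketch below declares a behaviour with one callback and a helper that turns a sample into a usage fraction. The callback name sample/0, the module name pressure_sketch, and the idea of reducing the tuple to a 0.0..1.0 ratio are all assumptions; the source specifies only the shape of the returned tuple.

```erlang
-module(pressure_sketch).
-export([pressure/1]).

%% Hypothetical callback: a stateless sampler returns bytes used and
%% bytes total for the resource it tracks.
-callback sample() -> {Used :: non_neg_integer(), Total :: pos_integer()}.

%% Fraction of the resource in use, clamped to at most 1.0.
-spec pressure({non_neg_integer(), pos_integer()}) -> float().
pressure({Used, Total})
  when is_integer(Used), is_integer(Total), Used >= 0, Total > 0 ->
    min(1.0, Used / Total).
```

A scheduler could compare such a ratio against a threshold to decide when to evict, although whether erllama_scheduler works this way is not stated in the summaries here.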

NVIDIA GPU memory-pressure sampler. Aggregates VRAM usage across every GPU on the host via nvidia-smi --query-gpu=memory.used,memory.total --format=csv,noheader,nounits.
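The aggregation step can be sketched as a pure function over the command's output. With csv,noheader,nounits, nvidia-smi prints one "used, total" line per GPU, with values in MiB. The module name nvidia_csv_sketch and the conversion to bytes are assumptions for illustration; the real module may parse and handle errors differently.

```erlang
-module(nvidia_csv_sketch).
-export([parse/1]).

-define(MIB, 1048576).

%% Sum "used, total" MiB pairs (one line per GPU) into a single
%% {UsedBytes, TotalBytes} tuple.
-spec parse(binary()) -> {non_neg_integer(), non_neg_integer()}.
parse(Output) when is_binary(Output) ->
    Lines = binary:split(Output, [<<"\n">>, <<"\r">>], [global, trim_all]),
    lists:foldl(fun add_line/2, {0, 0}, Lines).

%% Parse one CSV line and add it to the running totals.
add_line(Line, {UsedAcc, TotalAcc}) ->
    Fields = binary:split(Line, <<",">>),
    [Used, Total] = [binary_to_integer(string:trim(F)) || F <- Fields],
    {UsedAcc + Used * ?MIB, TotalAcc + Total * ?MIB}.
```

This sketch crashes on malformed lines, which keeps it short; a production sampler would more likely return an error tuple.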

System-memory pressure sampler backed by OTP's memsup (from os_mon). Portable across Linux, macOS, BSD, and Windows. Returns {Total - Available, Total}.

ETS-backed via callback module for registering erllama_model gen_statems under a binary model_id().
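For readers unfamiliar with via callbacks: OTP lets a process be addressed as {via, Module, Name} when Module exports register_name/2, unregister_name/1, whereis_name/1, and send/2. The sketch below shows one way to back those four functions with an ETS table. The module name via_ets_sketch, the table name, and the init/0 helper are hypothetical, and it omits the cleanup of dead entries that a real registry needs.

```erlang
-module(via_ets_sketch).
-export([init/0, register_name/2, unregister_name/1, whereis_name/1, send/2]).

-define(TAB, via_ets_sketch_names).

%% Create the table. The calling process owns it, so in a real system
%% a long-lived process would call this.
init() ->
    ?TAB = ets:new(?TAB, [named_table, public, set]),
    ok.

%% insert_new/2 is atomic, so only the first claimant gets the name.
register_name(Name, Pid) when is_pid(Pid) ->
    case ets:insert_new(?TAB, {Name, Pid}) of
        true -> yes;
        false -> no
    end.

unregister_name(Name) ->
    true = ets:delete(?TAB, Name),
    ok.

%% Return the registered pid, or undefined if absent or dead.
whereis_name(Name) ->
    case ets:lookup(?TAB, Name) of
        [{Name, Pid}] ->
            case is_process_alive(Pid) of
                true -> Pid;
                false -> undefined
            end;
        [] ->
            undefined
    end.

send(Name, Msg) ->
    case whereis_name(Name) of
        undefined -> exit({badarg, {Name, Msg}});
        Pid -> Pid ! Msg, Pid
    end.
```

With a module like this, a process could be started under {via, via_ets_sketch, <<"my-model">>} and later addressed by the same tuple in gen_statem calls.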

Memory-pressure-driven cache eviction.