Configuration reference
erllama configuration lives in two places: the OTP application
environment (config/sys.config) and the per-model option map
passed to erllama:load_model/1,2. This page lists every key in both.
Application environment
{erllama, [
%% --------------- Save-policy gates -----------------------------
{min_tokens, 512},
{cold_min_tokens, 512},
{cold_max_tokens, 30000},
{continued_interval, 2048},
{boundary_trim_tokens, 32},
{boundary_align_tokens, 2048},
%% --------------- Cache flow tunables ---------------------------
{evict_save_timeout_ms, 30000},
{session_resume_wait_ms, 500},
{fingerprint_mode, safe}, %% safe | gguf_chunked | fast_unsafe
%% --------------- Memory-pressure scheduler ---------------------
{scheduler, #{
enabled => false,
pressure_source => noop,
interval_ms => 5000,
high_watermark => 0.85,
low_watermark => 0.75,
min_evict_bytes => 1048576,
evict_tiers => [ram, ram_file]
}}
]}.
Tiers
The RAM tier (erllama_cache_ram) starts automatically with the
application. For ram_file or disk tiers, start one server per root
(erllama_cache_ramfile_srv for ram_file, erllama_cache_disk_srv for
disk) in your own supervision tree or from a release start hook, and
pass its registered name as tier_srv on the relevant load_model/1,2 call:
{ok, _} = erllama_cache_disk_srv:start_link(my_disk, "/var/lib/erllama/kvc"),
{ok, _} = erllama_cache_ramfile_srv:start_link(my_shm, "/dev/shm/erllama").
There is no single tiers env key in v0.1: per-process supervision
gives you crisper restart semantics than a static list.
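One way to place the tier servers in your own supervision tree is sketched below. The module name my_erllama_tiers_sup and the restart settings are hypothetical. The only erllama calls used are the two start_link/2 functions shown above, assumed to return {ok, Pid} linked to the caller as the pattern match above suggests.

```erlang
-module(my_erllama_tiers_sup).
-behaviour(supervisor).

-export([start_link/0, init/1]).

start_link() ->
    supervisor:start_link({local, ?MODULE}, ?MODULE, []).

init([]) ->
    %% one_for_one: a crashed tier server restarts without touching the other.
    SupFlags = #{strategy => one_for_one, intensity => 5, period => 10},
    Children = [
        %% Disk tier rooted at /var/lib/erllama/kvc, registered as my_disk.
        #{id => my_disk,
          start => {erllama_cache_disk_srv, start_link,
                    [my_disk, "/var/lib/erllama/kvc"]},
          restart => permanent,
          type => worker},
        %% ram_file tier rooted at /dev/shm/erllama, registered as my_shm.
        #{id => my_shm,
          start => {erllama_cache_ramfile_srv, start_link,
                    [my_shm, "/dev/shm/erllama"]},
          restart => permanent,
          type => worker}
    ],
    {ok, {SupFlags, Children}}.
```

With this in place, a model that should use the disk tier passes tier_srv => my_disk and tier => disk in its option map.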
Save-policy gates
See the caching guide for what each
threshold does. All are overridable per-model via the policy map.
evict_save_timeout_ms
How long synchronous evict and shutdown saves wait for the
writer to finish before giving up. Defaults to 30 s. Bump for
8B-class models on slow disks.
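As an illustration, a sys.config fragment that raises the timeout might look like the following. The 120 s value is an arbitrary example, not a measured recommendation.

```erlang
{erllama, [
    %% Illustrative value only: allow up to 120 s for evict/shutdown saves.
    {evict_save_timeout_ms, 120000}
]}.
```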
session_resume_wait_ms
When a parent_key is supplied and the cache sees a matching
in-flight finish save, it waits up to this long for the save to
publish before falling through to a cold prefill. 500 ms is enough
for SSD-backed deployments; bump if you observe back-to-back
multi-turn cold misses on slow storage.
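A per-model override could be sketched as below. This assumes the policy map may be partial, with missing keys falling back to the application environment; if that does not hold, list the remaining policy keys as well. The 2000 ms value is illustrative.

```erlang
#{
    model_path => "/path/to/x.gguf",
    policy => #{
        %% Illustrative value: wait up to 2 s for an in-flight finish save.
        session_resume_wait_ms => 2000
    }
}
```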
fingerprint_mode
How to verify the model fingerprint at load:
safe: full SHA-256 over the file. Slow on multi-GB GGUFs.
gguf_chunked: fingerprint metadata + first weights tensor. Catches accidental corruption, not malicious tampering.
fast_unsafe: trust the supplied fingerprint blindly. Use only when you fingerprint upstream and pass the digest through.
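For the fast_unsafe case, an upstream fingerprinting step could be sketched as follows. It assumes the fingerprint erllama expects is the raw 32-byte SHA-256 of the whole file, which this page implies but does not state. The module name fingerprint_sketch is hypothetical; the crypto and file calls are standard OTP.

```erlang
-module(fingerprint_sketch).
-export([sha256_file/1]).

%% Hash the file in 1 MiB chunks so a multi-GB GGUF is never held in memory.
%% A read error crashes with case_clause, which is acceptable for a sketch.
sha256_file(Path) ->
    {ok, Fd} = file:open(Path, [read, raw, binary]),
    try
        hash_loop(Fd, crypto:hash_init(sha256))
    after
        file:close(Fd)
    end.

hash_loop(Fd, Ctx) ->
    case file:read(Fd, 1048576) of
        {ok, Chunk} -> hash_loop(Fd, crypto:hash_update(Ctx, Chunk));
        eof -> crypto:hash_final(Ctx)
    end.
```

The resulting binary would then be passed as fingerprint alongside fingerprint_mode => fast_unsafe in the per-model option map.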
scheduler
See the caching guide.
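As a rough sketch, enabling the scheduler from sys.config could look like this. The numeric values are the defaults listed under Application environment; pressure_source => system is taken from the erllama_scheduler:status() output shown on this page, and whether other sources exist is not covered here.

```erlang
{erllama, [
    {scheduler, #{
        enabled => true,
        pressure_source => system,
        interval_ms => 5000,
        high_watermark => 0.85,
        low_watermark => 0.75,
        min_evict_bytes => 1048576,
        evict_tiers => [ram, ram_file]
    }}
]}.
```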
Per-model options
Passed to erllama:load_model/1,2:
#{
backend => erllama_model_llama,
model_path => "/path/to/x.gguf",
model_opts => #{n_gpu_layers => 99},
context_opts => #{n_ctx => 4096, n_batch => 512},
fingerprint => <<32 bytes>>,
fingerprint_mode => safe,
quant_type => q4_k_m,
quant_bits => 4,
ctx_params_hash => <<32 bytes>>,
context_size => 4096,
tier_srv => my_disk,
tier => disk,
policy => #{
min_tokens => 256,
cold_min_tokens => 256,
cold_max_tokens => 8192,
continued_interval => 256,
boundary_trim_tokens => 32,
boundary_align_tokens => 256,
session_resume_wait_ms => 500
}
}
See loading a model for the per-field walkthrough.
Inspecting effective config
1> application:get_env(erllama, scheduler).
{ok, #{enabled => true, ...}}
2> erllama_scheduler:status().
#{enabled => true, pressure_source => system, ...}
3> erllama_cache_meta_srv:dump().
%% List of raw ETS tuples; see include/erllama_cache.hrl for the
%% position layout.
[{<<_:256>>, disk, 8388608, _, 0, available, _, _, _, 4}, ...]