# Configuration reference

erllama configuration lives in two places: the OTP application
environment (`config/sys.config`) and the per-model option map
passed to `erllama:load_model/1,2`. This page is the full set.

## Application environment

```erlang
{erllama, [
  %% --------------- Save-policy gates -----------------------------
  {min_tokens,             512},
  {cold_min_tokens,        512},
  {cold_max_tokens,      30000},
  {continued_interval,    2048},
  {boundary_trim_tokens,    32},
  {boundary_align_tokens, 2048},

  %% --------------- Cache flow tunables ---------------------------
  {evict_save_timeout_ms,  30000},
  {session_resume_wait_ms,   500},
  {fingerprint_mode,         safe},   %% safe | gguf_chunked | fast_unsafe

  %% --------------- Memory-pressure scheduler ---------------------
  {scheduler, #{
    enabled         => false,
    pressure_source => noop,
    interval_ms     => 5000,
    high_watermark  => 0.85,
    low_watermark   => 0.75,
    min_evict_bytes => 1048576,
    evict_tiers     => [ram, ram_file]
  }}
]}.
```

### Tiers

The RAM tier (`erllama_cache_ram`) starts automatically with the
application. For `ram_file` or `disk` tiers, start an
`erllama_cache_disk_srv` per root in your own supervision tree (or
from a release start hook) and pass its registered name as `tier_srv`
on the relevant `load_model/1,2` call:

```erlang
{ok, _} = erllama_cache_disk_srv:start_link(my_disk,    "/var/lib/erllama/kvc"),
{ok, _} = erllama_cache_ramfile_srv:start_link(my_shm,  "/dev/shm/erllama").
```

There is no single `tiers` env key in v0.1: per-process supervision
gives you crisper restart semantics than a static list.

### Save-policy gates

See the [caching guide](caching.md#save-policy-gates) for what each
threshold does. All are overridable per-model via the `policy` map.

### `evict_save_timeout_ms`

How long synchronous `evict` and `shutdown` saves wait for the
writer to finish before giving up. Defaults to 30 s. Bump for
8B-class models on slow disks.

### `session_resume_wait_ms`

When a `parent_key` is supplied and the cache sees a matching
in-flight finish save, it waits up to this long for the save to
publish before falling through to a cold prefill. 500 ms is enough
for SSD-backed deployments; bump if you observe back-to-back
multi-turn cold misses on slow storage.

### `fingerprint_mode`

How to verify the model fingerprint at load:

- `safe` — full SHA-256 over the file. Slow on multi-GB GGUFs.
- `gguf_chunked` — fingerprint metadata + first weights tensor.
  Catches accidental corruption, not malicious tampering.
- `fast_unsafe` — trust the supplied fingerprint blindly. Use only
  when you fingerprint upstream and pass the digest through.

### `scheduler`

See the [caching guide](caching.md#memory-pressure-driven-eviction).

## Per-model options

Passed to `erllama:load_model/1,2`:

```erlang
#{
  backend           => erllama_model_llama,
  model_path        => "/path/to/x.gguf",
  model_opts        => #{n_gpu_layers => 99},
  context_opts      => #{n_ctx => 4096, n_batch => 512},
  fingerprint       => <<32 bytes>>,
  fingerprint_mode  => safe,
  quant_type        => q4_k_m,
  quant_bits        => 4,
  ctx_params_hash   => <<32 bytes>>,
  context_size      => 4096,
  tier_srv          => my_disk,
  tier              => disk,
  policy            => #{
    min_tokens             => 256,
    cold_min_tokens        => 256,
    cold_max_tokens        => 8192,
    continued_interval     => 256,
    boundary_trim_tokens   => 32,
    boundary_align_tokens  => 256,
    session_resume_wait_ms => 500
  }
}
```

See [loading a model](loading.md) for the per-field walkthrough.

## Inspecting effective config

```erlang
1> application:get_env(erllama, scheduler).
{ok, #{enabled => true, ...}}

2> erllama_scheduler:status().
#{enabled => true, pressure_source => system, ...}

3> erllama_cache_meta_srv:dump().
%% List of raw ETS tuples; see include/erllama_cache.hrl for the
%% position layout.
[{<<_:256>>, disk, 8388608, _, 0, available, _, _, _, 4}, ...]
```