# LlamaCppEx v0.8.6 - Table of Contents

Elixir bindings for llama.cpp — run LLMs locally with Metal, CUDA, Vulkan, or CPU acceleration.

## Pages

- [LlamaCppEx](readme.md)
- [Changelog](changelog.md)
- [LICENSE](license.md)
- [Architecture](architecture.md)
- [Cross-Platform Builds](cross-platform-builds.md)
- [Examples](examples.md)
- [Performance Guide](performance.md)
- [Release Guide](release-guide.md)

- Architecture Decision Records
  - [ADR 001: C++ NIF Over Rustler](001-cpp-nif-over-rustler.md)
  - [ADR 002: fine for NIF Ergonomics](002-fine-for-nif-ergonomics.md)
  - [ADR 003: Static Linking of llama.cpp](003-static-linking.md)
  - [ADR 004: Streaming via enif_send](004-streaming-via-enif-send.md)
  - [ADR 005: Batching Architecture](005-batching-architecture.md)
  - [ADR 006: Continuous Batching](006-continuous-batching.md)
  - [ADR 007: Prefix Caching (Same-Slot KV Reuse)](007-prefix-caching.md)
  - [ADR 008: Pluggable Batching Strategies](008-batching-strategies.md)

## Modules

- [LlamaCppEx.ChatCompletion](LlamaCppEx.ChatCompletion.md): OpenAI-compatible chat completion response struct.
- [LlamaCppEx.ChatCompletionChunk](LlamaCppEx.ChatCompletionChunk.md): OpenAI-compatible streaming chat completion chunk struct.
- [LlamaCppEx.Thinking](LlamaCppEx.Thinking.md): Parser for `<think>...</think>` blocks in thinking model output.

- High-Level API
  - [LlamaCppEx](LlamaCppEx.md): Elixir bindings for llama.cpp.

- Core Modules
  - [LlamaCppEx.Chat](LlamaCppEx.Chat.md): Chat template formatting using llama.cpp's Jinja template engine.
  - [LlamaCppEx.Context](LlamaCppEx.Context.md): Inference context with KV cache.
  - [LlamaCppEx.Embedding](LlamaCppEx.Embedding.md): Generate embeddings from text using an embedding model.
  - [LlamaCppEx.Grammar](LlamaCppEx.Grammar.md): Converts JSON Schema to GBNF grammar for constrained generation.
  - [LlamaCppEx.Hub](LlamaCppEx.Hub.md): Download GGUF models from HuggingFace Hub.
  - [LlamaCppEx.Model](LlamaCppEx.Model.md): Model loading and introspection.
  - [LlamaCppEx.Sampler](LlamaCppEx.Sampler.md): Token sampling configuration.
  - [LlamaCppEx.Schema](LlamaCppEx.Schema.md): Converts Ecto schema modules to JSON Schema maps for structured output.
  - [LlamaCppEx.Server](LlamaCppEx.Server.md): GenServer for continuous batched multi-sequence inference.
  - [LlamaCppEx.Tokenizer](LlamaCppEx.Tokenizer.md): Text tokenization and detokenization.

- Batching Strategies
  - [LlamaCppEx.Server.BatchStrategy](LlamaCppEx.Server.BatchStrategy.md): Behavior for batch building strategies.
  - [LlamaCppEx.Server.Strategy.Balanced](LlamaCppEx.Server.Strategy.Balanced.md): Balanced batching strategy.
  - [LlamaCppEx.Server.Strategy.DecodeMaximal](LlamaCppEx.Server.Strategy.DecodeMaximal.md): Decode-maximal batching strategy.
  - [LlamaCppEx.Server.Strategy.PrefillPriority](LlamaCppEx.Server.Strategy.PrefillPriority.md): Prefill-priority batching strategy.

