API Reference LlamaCppEx v0.8.8

Modules

Elixir bindings for llama.cpp.

Chat template formatting using llama.cpp's Jinja template engine.

OpenAI-compatible chat completion response struct.

OpenAI-compatible streaming chat completion chunk struct.

Inference context with KV cache.

Generate embeddings from text using an embedding model.

Converts JSON Schema to GBNF grammar for constrained generation.

Download GGUF models from HuggingFace Hub.

Multi-Token Prediction (MTP) speculative decoding.

Model loading and introspection.

Token sampling configuration.

Converts Ecto schema modules to JSON Schema maps for structured output.

GenServer for continuous batched multi-sequence inference.

Behaviour for batch building strategies.

Balanced batching strategy.

Decode-maximal batching strategy.

Prefill-priority batching strategy.

Parser for <think>...</think> blocks in thinking model output.

Text tokenization and detokenization.