Modules
High-performance document extraction for Elixir.
OTP Application callback for Kreuzberg.
Structure representing an entry extracted from an archive file.
Asynchronous extraction operations using Elixir Tasks.
Batch extraction operations for processing multiple documents efficiently.
BibTeX bibliography metadata.
Bounding box coordinates for element positioning in documents.
Cache management operations for the Kreuzberg extraction library.
Structure representing a text chunk with embedding for semantic search.
Metadata for a text chunk, tracking byte positions, indices, and page range.
Citation file metadata (RIS, PubMed, EndNote).
Code chunk with source span and optional parent context.
Context for a code chunk (parent scope information).
Comment information extracted from source code.
Parse diagnostic (error or warning from tree-sitter).
Section within a docstring.
Docstring information with parsed sections.
Export statement information.
File-level code metrics from tree-sitter analysis.
Import statement information.
Result of tree-sitter code processing.
Byte and line/column span for a code element.
Structural code element (function, class, method, etc.).
Symbol definition information.
Enumeration of content layers within a document.
JATS contributor with role.
CSV/TSV file metadata.
dBASE field information.
dBASE (DBF) file metadata.
Element attributes in Djot ({.class #id key="value"} syntax).
Comprehensive Djot document structure with semantic preservation.
Footnote in a Djot document.
Block-level element in a Djot document (paragraph, heading, list, etc.).
Image element in a Djot document.
Inline element within a Djot block (text, emphasis, link, etc.).
Link element in a Djot document.
A single node in the document tree.
Structured document representation with hierarchical node tree.
Inline text annotation with byte-range formatting and links.
Semantic element extracted from a document.
Metadata for a semantic element extracted from a document.
Enumeration of semantic element types in a document.
Configuration for standalone text embedding generation.
EPUB metadata (Dublin Core extensions).
Exception module for Kreuzberg extraction errors.
Error metadata when extraction partially failed.
Configuration structure for document extraction operations.
Structure representing the result of a document extraction operation.
FictionBook (FB2) metadata.
Shared helper functions for Kreuzberg extraction modules.
A hierarchical block within a page, representing heading-level structure.
Structure representing an extracted image from a document.
Metadata about image preprocessing applied before OCR.
JATS (Journal Article Tag Suite) metadata.
Structure representing an extracted keyword with score and algorithm info.
Enumeration of keyword extraction algorithms.
A detected layout region on a page.
Legacy API functions using deprecated patterns.
Structure representing document metadata extracted from files.
Bounding geometry for OCR-extracted text elements.
Confidence scores for OCR text detection and recognition.
OCR-extracted text element with detailed positioning and confidence information.
Enumeration of OCR element hierarchical levels.
Rotation information for OCR-detected text.
Enumeration of output content formats.
Structure representing a single page extracted from a multi-page document.
Byte offset boundary for a page.
Hierarchy information for a page, containing heading-level blocks.
Metadata for an individual page, slide, or sheet.
Page structure information for a document.
Enumeration of page unit types in documents.
Structure representing a PDF annotation extracted from a document page.
Enumeration of PDF annotation types.
Public Plugin API facade for registering and managing Kreuzberg plugins.
Behaviour module for OCR backends in the Kreuzberg plugin system.
Behaviour module for post-processor plugins in the Kreuzberg plugin system.
GenServer for managing Kreuzberg plugins.
OTP Supervisor for the Kreuzberg plugin system.
Behaviour module for Kreuzberg document extraction validators.
Structure representing a warning generated during document processing.
Outlook PST archive metadata.
Enumeration of relationship kinds between document elements.
Enumeration of result structure formats.
Structure representing an extracted table from a document.
Tree-sitter configuration for code parsing.
Tree-sitter process configuration for code extraction.
Structure representing a URI extracted from a document.
Enumeration of URI kinds.
Utility functions for Kreuzberg extraction operations.
Configuration validators for Kreuzberg extraction options.
Year range for bibliographic metadata.