AgentSea.Ingest (agentsea_ingest v0.1.0)


Document ingestion. chunk_documents/2 splits documents into chunk messages, the unit that the AgentSea.Ingest.Pipeline Broadway topology embeds and stores.

A document is a map with :id, :text, and an optional :metadata map. Each chunk has the id "<doc_id>-<n>" and inherits the document's metadata, plus a :source key holding the document id.
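To make the id scheme and metadata rule concrete, here is a minimal sketch. It is not AgentSea.Ingest's implementation. The page does not document how text is split, so the hypothetical `ChunkSketch.chunk/2` substitutes a naive fixed-size character split. It also assumes that `<n>` counts from 0, that `:source` overrides any existing `:source` key, and that the document id implements `String.Chars`. None of these details are confirmed by the docs.

```elixir
defmodule ChunkSketch do
  # Hypothetical illustration only, not the library's code.
  # Assumptions: naive fixed-size character split, chunk numbering
  # starting at 0, and a document id that implements String.Chars.

  @doc "Split `doc` into chunks of at most `size` characters."
  def chunk(%{id: id, text: text} = doc, size \\ 20) do
    # :metadata is optional on a document, so default to an empty map
    metadata = Map.get(doc, :metadata, %{})

    text
    |> String.graphemes()
    |> Enum.chunk_every(size)
    |> Enum.map(&Enum.join/1)
    |> Enum.with_index()
    |> Enum.map(fn {piece, n} ->
      %{
        # "<doc_id>-<n>" id scheme
        id: "#{id}-#{n}",
        text: piece,
        # inherit the document's metadata, then record the source document id
        # (Map.put overwrites any existing :source key; an assumption)
        metadata: Map.put(metadata, :source, id)
      }
    end)
  end
end
```

With `size` set to 20, a 45-character document with id `"doc"` yields three chunks with ids `"doc-0"`, `"doc-1"` and `"doc-2"`. Each chunk's metadata contains the document's keys plus `source: "doc"`.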

Summary

Types

chunk()

@type chunk() :: %{id: String.t(), text: String.t(), metadata: map()}

document()

@type document() :: %{
  :id => term(),
  :text => String.t(),
  optional(:metadata) => map()
}
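Elixir typespecs are not checked at runtime, so a caller may want to reject malformed input before passing it to chunk_documents/2. The sketch below is a hypothetical helper, not part of AgentSea.Ingest. It checks a map against the document() type above: `:id` may be any term, `:text` must be a binary, and `:metadata`, if present, must be a map. It does not check for extra keys.

```elixir
defmodule DocumentShape do
  # Hypothetical validator mirroring the document() typespec; not a library API.

  # Required :id (any term) and :text (a binary / String.t()).
  def valid?(%{id: _id, text: text} = doc) when is_binary(text) do
    # :metadata is optional, but when present it must be a map.
    case Map.fetch(doc, :metadata) do
      :error -> true
      {:ok, metadata} -> is_map(metadata)
    end
  end

  # Anything else (missing keys, non-binary text, non-map input) is rejected.
  def valid?(_other), do: false
end
```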

Functions

chunk_documents(documents, opts \\ [])

@spec chunk_documents(
  [document()],
  keyword()
) :: [chunk()]
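Every chunk's metadata carries `:source`, so the flat `[chunk()]` list returned by chunk_documents/2 can be regrouped per document. The module below is a hypothetical post-processing helper, not part of AgentSea.Ingest. It relies only on the documented `:source` key, and its sample chunks are hand-written literals rather than real output.

```elixir
defmodule ChunkGrouping do
  # Hypothetical helper; depends only on the documented metadata.source key.

  @doc "Group a list of chunk() maps by the id of their source document."
  def by_source(chunks) do
    # Enum.group_by/2 keeps the original order of chunks within each group.
    Enum.group_by(chunks, & &1.metadata.source)
  end
end

chunks = [
  %{id: "a-0", text: "x", metadata: %{source: "a"}},
  %{id: "b-0", text: "y", metadata: %{source: "b"}},
  %{id: "a-1", text: "z", metadata: %{source: "a"}}
]

ChunkGrouping.by_source(chunks)
```

Here the result maps `"a"` to the chunks `a-0` and `a-1`, in that order, and maps `"b"` to `b-0`.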