ConfluenceLoader.Pages (confluence_loader v0.1.2)

Functions for fetching and processing Confluence pages.

Summary

Functions

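get_page(client, page_id, params \\ %{})

Get a specific page by ID.

get_pages(client, params \\ %{})

Get all pages with optional filtering.

get_pages_for_label(client, label_id, params \\ %{})

Get pages for a specific label.

get_pages_in_space(client, space_key, params \\ %{})

Get pages in a specific space.

load_documents(client, params \\ %{})

Load all pages from Confluence and convert them to Document format. This function mimics the behavior of the Python llama-index-readers-confluence library.

load_documents_since(client, space_key, since_timestamp, params \\ %{})

Load documents from a specific space created at or after a given timestamp.

load_space_documents(client, space_key, params \\ %{})

Load pages from a specific space and convert them to Document format.

page_to_document(page)

Convert a page response to a Document struct.

stream_space_documents(client, space_key, params \\ %{})

Stream documents from a specific space in batches of 4.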

Types

page_params()

@type page_params() :: %{
  optional(:id) => [integer()],
  optional(:space_id) => [integer()],
  optional(:sort) => String.t(),
  optional(:status) => [String.t()],
  optional(:title) => String.t(),
  optional(:body_format) => String.t(),
  optional(:cursor) => String.t(),
  optional(:limit) => integer()
}
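
As a rough illustration of the type above, a params map might be assembled as follows. Only the keys come from page_params(); the values (space ID 123, the cursor string, and so on) are made up for the example.

```elixir
# Hypothetical values; only the keys are taken from page_params().
params = %{
  space_id: [123],
  status: ["current", "archived"],
  body_format: "storage",
  limit: 25
}

# Every key is optional, so a map can start small and gain keys later.
# The cursor string is a placeholder, not a real Confluence cursor.
params_with_cursor = Map.put(params, :cursor, "placeholder-cursor")

IO.inspect(Enum.sort(Map.keys(params_with_cursor)))
```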

Functions

get_page(client, page_id, params \\ %{})

@spec get_page(ConfluenceLoader.Client.t(), String.t() | integer(), map()) ::
  {:ok, map()} | {:error, term()}

Get a specific page by ID.

Parameters

  • client: The Confluence client
  • page_id: The ID of the page to retrieve
  • params: Optional parameters (e.g., body_format)

Examples

iex> {:ok, page} = ConfluenceLoader.Pages.get_page(client, "123456")
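
Since the spec allows an {:error, term()} result, callers will usually want to handle both outcomes. The sketch below shows one way to do that with pattern matching. PageResultExample is a hypothetical module, and the "title" key is an assumption about the shape of the response map, which this page does not document.

```elixir
defmodule PageResultExample do
  # Turns the documented {:ok, map()} | {:error, term()} result into a message.
  # The "title" key is an assumed field of the response map.
  def describe({:ok, page}) when is_map(page) do
    "Fetched page: " <> Map.get(page, "title", "(untitled)")
  end

  def describe({:error, reason}) do
    "Could not fetch page: " <> inspect(reason)
  end
end

# Stand-ins for what ConfluenceLoader.Pages.get_page/3 might return.
IO.puts(PageResultExample.describe({:ok, %{"id" => "123456", "title" => "Release notes"}}))
IO.puts(PageResultExample.describe({:error, :not_found}))
```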

get_pages(client, params \\ %{})

@spec get_pages(ConfluenceLoader.Client.t(), page_params()) ::
  {:ok, map()} | {:error, term()}

Get all pages with optional filtering.

Parameters

  • client: The Confluence client
  • params: Optional parameters for filtering pages

Examples

iex> {:ok, pages} = ConfluenceLoader.Pages.get_pages(client, %{space_id: [123], limit: 10})

get_pages_for_label(client, label_id, params \\ %{})

@spec get_pages_for_label(ConfluenceLoader.Client.t(), String.t() | integer(), map()) ::
  {:ok, map()} | {:error, term()}

Get pages for a specific label.

Parameters

  • client: The Confluence client
  • label_id: The ID of the label
  • params: Optional parameters for filtering

Examples

iex> {:ok, pages} = ConfluenceLoader.Pages.get_pages_for_label(client, "456", %{limit: 10})

get_pages_in_space(client, space_key, params \\ %{})

@spec get_pages_in_space(ConfluenceLoader.Client.t(), String.t() | integer(), map()) ::
  {:ok, map()} | {:error, term()}

Get pages in a specific space.

Parameters

  • client: The Confluence client
  • space_key: The key of the space (e.g., "PROJ", "TEAM") or numeric space ID
  • params: Optional parameters for filtering

Examples

iex> {:ok, pages} = ConfluenceLoader.Pages.get_pages_in_space(client, "PROJ", %{limit: 20})

load_documents(client, params \\ %{})

@spec load_documents(ConfluenceLoader.Client.t(), map()) ::
  {:ok, [ConfluenceLoader.Document.t()]} | {:error, term()}

Load all pages from Confluence and convert them to Document format. This function mimics the behavior of the Python llama-index-readers-confluence library.

Parameters

  • client: The Confluence client
  • params: Optional parameters for filtering pages
    • :status - List of page statuses to filter by. Default: ["current"]. Valid values: ["current", "archived", "deleted", "trashed"]
    • :space_id - List of space IDs to filter by
    • :limit - Maximum number of documents to return
    • :body_format - Format for page body (default: "storage")

Examples

iex> {:ok, documents} = ConfluenceLoader.Pages.load_documents(client, %{space_id: [123]})

# Load only archived pages
iex> {:ok, documents} = ConfluenceLoader.Pages.load_documents(client, %{status: ["archived"]})

# Load current and deleted pages
iex> {:ok, documents} = ConfluenceLoader.Pages.load_documents(client, %{status: ["current", "deleted"]})
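
Because :status only accepts the four values listed above, a caller may want to check user-supplied params before making a request. The following is a minimal sketch of such a pre-flight check. StatusCheckExample is hypothetical and does not describe how the library itself validates input.

```elixir
defmodule StatusCheckExample do
  # Valid values as listed in the parameter description above.
  @valid_statuses ["current", "archived", "deleted", "trashed"]

  # Returns {:ok, statuses} or {:error, {:invalid_status, bad_values}}.
  # Falls back to the documented default when :status is absent.
  def fetch_statuses(params) do
    statuses = Map.get(params, :status, ["current"])

    case Enum.reject(statuses, &(&1 in @valid_statuses)) do
      [] -> {:ok, statuses}
      invalid -> {:error, {:invalid_status, invalid}}
    end
  end
end

IO.inspect(StatusCheckExample.fetch_statuses(%{status: ["current", "bogus"]}))
```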

load_documents_since(client, space_key, since_timestamp, params \\ %{})

@spec load_documents_since(
  ConfluenceLoader.Client.t(),
  String.t() | integer(),
  DateTime.t() | String.t(),
  map()
) :: {:ok, [ConfluenceLoader.Document.t()]} | {:error, term()}

Load documents from a specific space created at or after a given timestamp.

This function filters pages by space and creation timestamp. Because the Confluence API doesn't directly support timestamp filtering, it fetches all pages from the space and filters them client-side, so the cost grows with the size of the space rather than with the number of matching pages.

Parameters

  • client: The Confluence client
  • space_key: The key of the space (e.g., "PROJ", "TEAM") or numeric space ID
  • since_timestamp: DateTime struct or ISO 8601 string (e.g., "2024-01-01T00:00:00Z")
  • params: Optional parameters for filtering (limit, body_format, etc.)
    • :status - List of page statuses to filter by. Default: ["current"]. Valid values: ["current", "archived", "deleted", "trashed"]
    • :limit - Maximum number of documents to return
    • :body_format - Format for page body (default: "storage")

Examples

# Using DateTime
{:ok, since_date} = DateTime.new(~D[2024-01-01], ~T[00:00:00], "Etc/UTC")
{:ok, documents} = ConfluenceLoader.Pages.load_documents_since(client, "PROJ", since_date)

# Using ISO string
{:ok, documents} = ConfluenceLoader.Pages.load_documents_since(client, "PROJ", "2024-01-01T00:00:00Z")

# With additional parameters including status
{:ok, documents} = ConfluenceLoader.Pages.load_documents_since(client, "PROJ", since_date, %{limit: 50, status: ["current", "archived"]})
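
The client-side filtering described for load_documents_since can be pictured with a small standalone sketch. It is not the library's implementation: SinceFilterExample is hypothetical, and the "createdAt" key is an assumption about where a page map carries its creation time.

```elixir
defmodule SinceFilterExample do
  # Accepts either a DateTime or an ISO 8601 string, like since_timestamp.
  def normalize(%DateTime{} = dt), do: {:ok, dt}

  def normalize(iso) when is_binary(iso) do
    case DateTime.from_iso8601(iso) do
      {:ok, dt, _utc_offset} -> {:ok, dt}
      {:error, reason} -> {:error, reason}
    end
  end

  # Keeps pages created at or after `since`.
  # Pages with a missing or unparsable "createdAt" (assumed key) are dropped.
  def created_since(pages, since) do
    with {:ok, cutoff} <- normalize(since) do
      {:ok, Enum.filter(pages, &at_or_after?(&1, cutoff))}
    end
  end

  defp at_or_after?(page, cutoff) do
    with created_at when is_binary(created_at) <- page["createdAt"],
         {:ok, created, _offset} <- DateTime.from_iso8601(created_at) do
      DateTime.compare(created, cutoff) != :lt
    else
      _ -> false
    end
  end
end

pages = [
  %{"id" => "1", "createdAt" => "2023-12-31T23:59:59Z"},
  %{"id" => "2", "createdAt" => "2024-01-01T00:00:00Z"},
  %{"id" => "3", "createdAt" => "2024-03-15T09:30:00Z"}
]

{:ok, recent} = SinceFilterExample.created_since(pages, "2024-01-01T00:00:00Z")
IO.inspect(Enum.map(recent, & &1["id"]))
```

The comparison uses `!= :lt` so that a page created exactly at the cutoff is kept, matching the "at or after" wording above.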

load_space_documents(client, space_key, params \\ %{})

@spec load_space_documents(ConfluenceLoader.Client.t(), String.t() | integer(), map()) ::
  {:ok, [ConfluenceLoader.Document.t()]} | {:error, term()}

Load pages from a specific space and convert them to Document format.

Parameters

  • client: The Confluence client
  • space_key: The key of the space (e.g., "PROJ", "TEAM") or numeric space ID
  • params: Optional parameters for filtering
    • :status - List of page statuses to filter by. Default: ["current"]. Valid values: ["current", "archived", "deleted", "trashed"]
    • :limit - Maximum number of documents to return
    • :body_format - Format for page body (default: "storage")

Examples

iex> {:ok, documents} = ConfluenceLoader.Pages.load_space_documents(client, "PROJ")

# Load only archived pages from space
iex> {:ok, documents} = ConfluenceLoader.Pages.load_space_documents(client, "PROJ", %{status: ["archived"]})

# Load current and trashed pages from space
iex> {:ok, documents} = ConfluenceLoader.Pages.load_space_documents(client, "PROJ", %{status: ["current", "trashed"]})

page_to_document(page)

@spec page_to_document(map()) :: ConfluenceLoader.Document.t()

Convert a page response to a Document struct.

This is useful when you fetch a page directly and want to convert it to a Document.

Parameters

  • page: The page response from the API

Examples

iex> {:ok, page} = ConfluenceLoader.Pages.get_page(client, "123", %{body_format: "storage"})
iex> doc = ConfluenceLoader.Pages.page_to_document(page)

stream_space_documents(client, space_key, params \\ %{})

@spec stream_space_documents(
  ConfluenceLoader.Client.t(),
  String.t() | integer(),
  map()
) ::
  Enumerable.t()

Stream documents from a specific space in batches of 4.

This function returns a Stream that yields batches of 4 documents at a time until all documents from the space have been processed. Because batches are produced lazily, the full set of documents is never held in memory at once.

Parameters

  • client: The Confluence client
  • space_key: The key of the space (e.g., "PROJ", "TEAM") or numeric space ID
  • params: Optional parameters for filtering (body_format, etc.)
    • :status - List of page statuses to filter by. Default: ["current"]. Valid values: ["current", "archived", "deleted", "trashed"]
    • :body_format - Format for page body (default: "storage")

Examples

# Stream and process documents in batches of 4
client
|> ConfluenceLoader.Pages.stream_space_documents("PROJ")
|> Enum.each(fn batch ->
  IO.puts("Processing batch of #{length(batch)} documents")
  Enum.each(batch, fn doc -> IO.puts("  - #{doc.metadata.title}") end)
end)

# Stream only archived documents
client
|> ConfluenceLoader.Pages.stream_space_documents("PROJ", %{status: ["archived"]})
|> Enum.each(fn batch ->
  # Process each batch of archived documents
  process_archived_batch(batch)
end)

# With async processing using Task.async_stream
client
|> ConfluenceLoader.Pages.stream_space_documents("PROJ")
|> Task.async_stream(fn batch ->
  # Process each batch concurrently
  Enum.map(batch, &process_document/1)
end, max_concurrency: 2)
|> Enum.to_list()
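
To try the batch-processing pattern without a Confluence instance, a stand-in stream can be built from plain data. In the sketch below, Stream.chunk_every/2 imitates the batches of 4; whether the library builds its batches this way is an assumption. The sketch also shows that Task.async_stream wraps each result in an {:ok, value} tuple, which has to be unwrapped after collection.

```elixir
# Stand-in for a lazy source of documents; real ones would come from the API.
docs = Stream.map(1..10, fn n -> %{metadata: %{title: "Page #{n}"}} end)

# Group the lazy stream into batches of 4; here the last batch is smaller.
batches = Stream.chunk_every(docs, 4)

titles =
  batches
  |> Task.async_stream(
    fn batch -> Enum.map(batch, & &1.metadata.title) end,
    max_concurrency: 2
  )
  # Each successful task result arrives as {:ok, value}.
  |> Enum.flat_map(fn {:ok, batch_titles} -> batch_titles end)

IO.inspect(Enum.map(batches, &length/1))
IO.inspect(titles)
```

Task.async_stream preserves input order by default, so the titles come back in the same order as the batches even though batches run concurrently.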