ConfluenceLoader (confluence_loader v0.1.1)

ConfluenceLoader is an Elixir library for fetching and reading Confluence pages.

It provides a simple interface to interact with Confluence's REST API and convert pages into a format suitable for use with language models.

Installation

Add confluence_loader to your list of dependencies in mix.exs:

def deps do
  [
    {:confluence_loader, "~> 0.1.0"}
  ]
end

Basic Usage

# Create a client
client = ConfluenceLoader.new_client(
  "https://your-domain.atlassian.net",
  "your-email@example.com",
  "your-api-token"
)

# Load all documents
{:ok, documents} = ConfluenceLoader.load_documents(client)

# Load documents from a specific space
{:ok, documents} = ConfluenceLoader.load_space_documents(client, "SPACE_KEY")

# Get a specific page
{:ok, page} = ConfluenceLoader.get_page(client, "123456")

Summary

Functions

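get_page(client, page_id, params \\ %{})
Get a specific page by ID.

get_pages(client, params \\ %{})
Get all pages with optional filtering.

get_pages_for_label(client, label_id, params \\ %{})
Get pages for a specific label.

get_pages_in_space(client, space_key, params \\ %{})
Get pages in a specific space.

load_documents(client, params \\ %{})
Load all pages from Confluence as documents.

load_documents_since(client, space_key, since_timestamp, params \\ %{})
Load documents from a specific space created at or after a given timestamp.

load_space_documents(client, space_key, params \\ %{})
Load pages from a specific space as documents.

new_client(base_url, username, api_token, opts \\ [])
Creates a new Confluence client.

stream_space_documents(client, space_key, params \\ %{})
Stream documents from a specific space in batches of 4.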

Functions

get_page(client, page_id, params \\ %{})

@spec get_page(ConfluenceLoader.Client.t(), String.t() | integer(), map()) ::
  {:ok, map()} | {:error, term()}

Get a specific page by ID.

Parameters

  • client: The Confluence client
  • page_id: The ID of the page
  • params: Optional parameters
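
Because the spec returns a tagged tuple, callers can branch on the result instead of asserting `{:ok, _}` and crashing on failure. The following is a minimal sketch rather than part of the library: `PageResult` and `summarize/1` are hypothetical names, and the `"title"` string key is an assumption based on Confluence's REST responses, since the shape of the returned map is not documented here.

```elixir
defmodule PageResult do
  # Hypothetical helper: turns the {:ok, map} | {:error, term} result of
  # ConfluenceLoader.get_page/3 into a printable string.
  def summarize({:ok, page}) when is_map(page) do
    # Assumption: the page map uses string keys such as "title".
    title = Map.get(page, "title", "(untitled)")
    "Fetched page: #{title}"
  end

  def summarize({:error, reason}) do
    # The error term is opaque here, so it is rendered with inspect/1.
    "Could not fetch page: #{inspect(reason)}"
  end
end

# Usage, given a client built with ConfluenceLoader.new_client/4:
#   client
#   |> ConfluenceLoader.get_page("123456")
#   |> PageResult.summarize()
#   |> IO.puts()
```

Handling both clauses in one place keeps network or permission errors from surfacing as a `MatchError` at the call site.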

get_pages(client, params \\ %{})

@spec get_pages(ConfluenceLoader.Client.t(), map()) :: {:ok, map()} | {:error, term()}

Get all pages with optional filtering.

Parameters

  • client: The Confluence client
  • params: Optional filtering parameters

get_pages_for_label(client, label_id, params \\ %{})

@spec get_pages_for_label(ConfluenceLoader.Client.t(), String.t() | integer(), map()) ::
  {:ok, map()} | {:error, term()}

Get pages for a specific label.

Parameters

  • client: The Confluence client
  • label_id: The ID of the label
  • params: Optional filtering parameters

get_pages_in_space(client, space_key, params \\ %{})

@spec get_pages_in_space(ConfluenceLoader.Client.t(), String.t() | integer(), map()) ::
  {:ok, map()} | {:error, term()}

Get pages in a specific space.

Parameters

  • client: The Confluence client
  • space_key: The key of the space (e.g., "PROJ", "TEAM")
  • params: Optional filtering parameters

load_documents(client, params \\ %{})

@spec load_documents(ConfluenceLoader.Client.t(), map()) ::
  {:ok, [ConfluenceLoader.Document.t()]} | {:error, term()}

Load all pages from Confluence as documents.

Parameters

  • client: The Confluence client
  • params: Optional filtering parameters

load_documents_since(client, space_key, since_timestamp, params \\ %{})

@spec load_documents_since(
  ConfluenceLoader.Client.t(),
  String.t() | integer(),
  DateTime.t() | String.t(),
  map()
) :: {:ok, [ConfluenceLoader.Document.t()]} | {:error, term()}

Load documents from a specific space created at or after a given timestamp.

This function filters pages by space and creation timestamp, which is useful for incremental updates that fetch only recently created content.

Parameters

  • client: The Confluence client
  • space_key: The key of the space (e.g., "PROJ", "TEAM")
  • since_timestamp: DateTime struct or ISO 8601 string (e.g., "2024-01-01T00:00:00Z")
  • params: Optional filtering parameters
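
Since `since_timestamp` accepts an ISO 8601 string, a rolling window can be computed with the standard `DateTime` functions. This is a sketch under stated assumptions: `SinceWindow` and `days_ago/2` are hypothetical names, not part of ConfluenceLoader.

```elixir
defmodule SinceWindow do
  # Hypothetical helper: returns an ISO 8601 string for the instant
  # `days` days before `now` (defaults to the current UTC time).
  def days_ago(days, now \\ DateTime.utc_now()) when is_integer(days) and days >= 0 do
    now
    |> DateTime.add(-days * 86_400, :second)
    # Drop fractional seconds so the string matches the documented example format.
    |> DateTime.truncate(:second)
    |> DateTime.to_iso8601()
  end
end

# Usage, given an existing client:
#   since = SinceWindow.days_ago(7)
#   {:ok, documents} = ConfluenceLoader.load_documents_since(client, "PROJ", since)
```

Passing `now` explicitly makes the helper deterministic in tests, while the default covers everyday use.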

load_space_documents(client, space_key, params \\ %{})

@spec load_space_documents(ConfluenceLoader.Client.t(), String.t() | integer(), map()) ::
  {:ok, [ConfluenceLoader.Document.t()]} | {:error, term()}

Load pages from a specific space as documents.

Parameters

  • client: The Confluence client
  • space_key: The key of the space (e.g., "PROJ", "TEAM")
  • params: Optional filtering parameters

new_client(base_url, username, api_token, opts \\ [])

@spec new_client(String.t(), String.t(), String.t(), keyword()) ::
  ConfluenceLoader.Client.t()

Creates a new Confluence client.

Parameters

  • base_url: The base URL of your Confluence instance
  • username: Your Atlassian username (email)
  • api_token: Your Atlassian API token
  • opts: Optional configuration (e.g., timeout)
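
To keep the API token out of source control, the three required arguments can be read from the environment before calling `new_client/4`. The sketch below is illustrative only: `ConfluenceEnv` and the variable names `CONFLUENCE_BASE_URL`, `CONFLUENCE_USERNAME`, and `CONFLUENCE_API_TOKEN` are assumptions, not something the library defines.

```elixir
defmodule ConfluenceEnv do
  # Hypothetical helper: reads connection settings from environment variables.
  # System.fetch_env!/1 raises if a variable is missing, so a misconfigured
  # deployment fails at startup rather than on the first request.
  def credentials do
    {
      System.fetch_env!("CONFLUENCE_BASE_URL"),
      System.fetch_env!("CONFLUENCE_USERNAME"),
      System.fetch_env!("CONFLUENCE_API_TOKEN")
    }
  end
end

# Usage:
#   {base_url, username, api_token} = ConfluenceEnv.credentials()
#   client = ConfluenceLoader.new_client(base_url, username, api_token)
```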

stream_space_documents(client, space_key, params \\ %{})

@spec stream_space_documents(
  ConfluenceLoader.Client.t(),
  String.t() | integer(),
  map()
) ::
  Enumerable.t()

Stream documents from a specific space in batches of 4.

This function returns a Stream that yields batches of 4 documents until every document in the space has been processed. Because batches are produced on demand, the full set of documents is never held in memory at once.

Parameters

  • client: The Confluence client
  • space_key: The key of the space (e.g., "PROJ", "TEAM") or numeric space ID
  • params: Optional parameters for filtering (body_format, etc.)

Examples

# Stream and process documents in batches of 4
client
|> ConfluenceLoader.stream_space_documents("PROJ")
|> Enum.each(fn batch ->
  IO.puts("Processing batch of #{length(batch)} documents")
  Enum.each(batch, fn doc -> IO.puts("  - #{doc.metadata.title}") end)
end)
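
Because the result is an enumerable of batches, it composes with the standard `Stream` and `Enum` functions. The sketch below flattens batches and stops after a fixed number of documents. `BatchTitles` and `first_titles/2` are hypothetical names, and the fake batches stand in for the library's stream so the snippet runs without a Confluence instance. The `metadata.title` field is taken from the example above.

```elixir
defmodule BatchTitles do
  # Hypothetical helper: flattens an enumerable of document batches and
  # returns the titles of the first `limit` documents.
  def first_titles(batches, limit) do
    batches
    |> Stream.flat_map(fn batch -> batch end)
    |> Stream.map(fn doc -> doc.metadata.title end)
    |> Enum.take(limit)
  end
end

# Stand-in for ConfluenceLoader.stream_space_documents/3:
# ten fake documents grouped into batches of 4.
fake_batches =
  1..10
  |> Stream.map(fn n -> %{metadata: %{title: "Page #{n}"}} end)
  |> Stream.chunk_every(4)

IO.inspect(BatchTitles.first_titles(fake_batches, 5))

# With the real library, the same helper would be called as:
#   client
#   |> ConfluenceLoader.stream_space_documents("PROJ")
#   |> BatchTitles.first_titles(5)
```

`Enum.take/2` stops pulling from the stream once the limit is reached, so, assuming the library fetches batches on demand as described, later batches would not need to be requested.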