ConfluenceLoader (confluence_loader v0.1.2)
ConfluenceLoader is an Elixir library for fetching and reading Confluence pages.
It provides a simple interface to interact with Confluence's REST API and convert pages into a format suitable for use with language models.
Installation
Add confluence_loader to your list of dependencies in mix.exs:
def deps do
  [
    {:confluence_loader, "~> 0.1.0"}
  ]
end
Basic Usage
# Create a client
client = ConfluenceLoader.new_client(
  "https://your-domain.atlassian.net",
  "your-email@example.com",
  "your-api-token"
)
# Load all documents
{:ok, documents} = ConfluenceLoader.load_documents(client)
# Load documents from a specific space
{:ok, documents} = ConfluenceLoader.load_space_documents(client, "SPACE_KEY")
# Get a specific page
{:ok, page} = ConfluenceLoader.get_page(client, "123456")
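The calls above match directly on `{:ok, ...}`, which raises a `MatchError` if a request fails. Since the specs on this page also list `{:error, term()}` as a return value, handling both shapes explicitly may be safer. The following is a minimal sketch: `ResultHandling` and `summarize/1` are hypothetical names, the sample tuples stand in for real results, and nothing is assumed about the error term beyond it being any term.

```elixir
defmodule ResultHandling do
  # Hypothetical helper: turns a load_documents-style result into a log line.
  def summarize({:ok, documents}) when is_list(documents) do
    "Loaded #{length(documents)} documents"
  end

  def summarize({:error, reason}) do
    # The shape of the error term is not documented here, so inspect/1 is used.
    "Failed to load documents: #{inspect(reason)}"
  end
end

# Stand-in values; in practice, pass ConfluenceLoader.load_documents(client).
IO.puts(ResultHandling.summarize({:ok, [:doc_a, :doc_b]}))
IO.puts(ResultHandling.summarize({:error, :timeout}))
```

In real code the same two clauses could live in a `case` expression around the library call.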
Summary
Functions
get_page: Get a specific page by ID.
get_pages: Get all pages with optional filtering.
get_pages_for_label: Get pages for a specific label.
get_pages_in_space: Get pages in a specific space.
load_documents: Load all pages from Confluence as documents.
load_documents_since: Load documents from a specific space created at or after a given timestamp.
load_space_documents: Load pages from a specific space as documents.
new_client: Creates a new Confluence client.
stream_space_documents: Stream documents from a specific space in batches of 4.
Functions
@spec get_page(ConfluenceLoader.Client.t(), String.t() | integer(), map()) :: {:ok, map()} | {:error, term()}
Get a specific page by ID.
Parameters
- client: The Confluence client
- page_id: The ID of the page
- params: Optional parameters
@spec get_pages(ConfluenceLoader.Client.t(), map()) :: {:ok, map()} | {:error, term()}
Get all pages with optional filtering.
Parameters
- client: The Confluence client
- params: Optional filtering parameters
@spec get_pages_for_label(ConfluenceLoader.Client.t(), String.t() | integer(), map()) :: {:ok, map()} | {:error, term()}
Get pages for a specific label.
Parameters
- client: The Confluence client
- label_id: The ID of the label
- params: Optional filtering parameters
@spec get_pages_in_space(ConfluenceLoader.Client.t(), String.t() | integer(), map()) :: {:ok, map()} | {:error, term()}
Get pages in a specific space.
Parameters
- client: The Confluence client
- space_key: The key of the space (e.g., "PROJ", "TEAM")
- params: Optional filtering parameters
@spec load_documents(ConfluenceLoader.Client.t(), map()) :: {:ok, [ConfluenceLoader.Document.t()]} | {:error, term()}
Load all pages from Confluence as documents.
Parameters
- client: The Confluence client
- params: Optional filtering parameters
  - :status - List of page statuses to filter by. Default: ["current"]. Valid values: ["current", "archived", "deleted", "trashed"]
  - :space_id - List of space IDs to filter by
  - :limit - Maximum number of documents to return
  - :body_format - Format for page body (default: "storage")
Examples
# Load current pages only (default)
{:ok, documents} = ConfluenceLoader.load_documents(client)
# Load archived pages only
{:ok, documents} = ConfluenceLoader.load_documents(client, %{status: ["archived"]})
# Load current and deleted pages
{:ok, documents} = ConfluenceLoader.load_documents(client, %{status: ["current", "deleted"]})
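Because `:status` only accepts the four values listed above, it can help to check user-supplied statuses before building the params map. This page does not say whether the library validates them itself, so the sketch below is an optional client-side guard. `StatusParams` and `build/1` are hypothetical names and not part of ConfluenceLoader.

```elixir
defmodule StatusParams do
  # Valid values as listed in the parameter description above.
  @valid_statuses ["current", "archived", "deleted", "trashed"]

  # Returns {:ok, params} when every status is valid,
  # otherwise {:error, {:invalid_statuses, offending_values}}.
  def build(statuses) when is_list(statuses) do
    case Enum.reject(statuses, &(&1 in @valid_statuses)) do
      [] -> {:ok, %{status: statuses}}
      invalid -> {:error, {:invalid_statuses, invalid}}
    end
  end
end

IO.inspect(StatusParams.build(["current", "deleted"]))
IO.inspect(StatusParams.build(["current", "draft"]))
```

The resulting map can be passed as the `params` argument of `load_documents`.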
@spec load_documents_since(ConfluenceLoader.Client.t(), String.t() | integer(), DateTime.t() | String.t(), map()) :: {:ok, [ConfluenceLoader.Document.t()]} | {:error, term()}
Load documents from a specific space created at or after a given timestamp.
This function filters pages by space and creation timestamp, which is useful for incremental updates that only need recently created content.
Parameters
- client: The Confluence client
- space_key: The key of the space (e.g., "PROJ", "TEAM")
- since_timestamp: DateTime struct or ISO 8601 string (e.g., "2024-01-01T00:00:00Z")
- params: Optional filtering parameters
  - :status - List of page statuses to filter by. Default: ["current"]. Valid values: ["current", "archived", "deleted", "trashed"]
  - :limit - Maximum number of documents to return
  - :body_format - Format for page body (default: "storage")
Examples
# Load current pages since timestamp (default)
{:ok, documents} = ConfluenceLoader.load_documents_since(client, "PROJ", "2024-01-01T00:00:00Z")
# Load archived and current pages since timestamp
{:ok, documents} = ConfluenceLoader.load_documents_since(client, "PROJ", "2024-01-01T00:00:00Z", %{status: ["current", "archived"]})
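For incremental updates, the timestamp is often computed relative to the current time. Since `since_timestamp` is documented as accepting either a `DateTime` struct or an ISO 8601 string, one way to derive it uses only the standard library. This is a sketch: `SinceTimestamp` and `days_ago/2` are hypothetical helper names, not part of ConfluenceLoader.

```elixir
defmodule SinceTimestamp do
  @seconds_per_day 86_400

  # Returns a UTC DateTime `days` days before `now`.
  # `now` defaults to the current time and can be overridden for testing.
  def days_ago(days, now \\ DateTime.utc_now()) when is_integer(days) and days >= 0 do
    DateTime.add(now, -days * @seconds_per_day, :second)
  end
end

since = SinceTimestamp.days_ago(7)

# Either form matches the documented since_timestamp types.
IO.inspect(since)
IO.puts(DateTime.to_iso8601(since))
```

The value of `since` could then be passed as the third argument of `load_documents_since`.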
@spec load_space_documents(ConfluenceLoader.Client.t(), String.t() | integer(), map()) :: {:ok, [ConfluenceLoader.Document.t()]} | {:error, term()}
Load pages from a specific space as documents.
Parameters
- client: The Confluence client
- space_key: The key of the space (e.g., "PROJ", "TEAM")
- params: Optional filtering parameters
  - :status - List of page statuses to filter by. Default: ["current"]. Valid values: ["current", "archived", "deleted", "trashed"]
  - :limit - Maximum number of documents to return
  - :body_format - Format for page body (default: "storage")
Examples
# Load current pages from space (default)
{:ok, documents} = ConfluenceLoader.load_space_documents(client, "PROJ")
# Load archived pages from space
{:ok, documents} = ConfluenceLoader.load_space_documents(client, "PROJ", %{status: ["archived"]})
# Load current and trashed pages from space
{:ok, documents} = ConfluenceLoader.load_space_documents(client, "PROJ", %{status: ["current", "trashed"]})
@spec new_client(String.t(), String.t(), String.t(), keyword()) :: ConfluenceLoader.Client.t()
Creates a new Confluence client.
Parameters
- base_url: The base URL of your Confluence instance
- username: Your Atlassian username (email)
- api_token: Your Atlassian API token
- opts: Optional configuration (e.g., timeout)
@spec stream_space_documents(ConfluenceLoader.Client.t(), String.t() | integer(), map()) :: Enumerable.t()
Stream documents from a specific space in batches of 4.
This function returns a Stream that yields batches of 4 documents at a time until all documents from the space have been processed. It's memory efficient as it doesn't load all documents into memory at once.
Parameters
- client: The Confluence client
- space_key: The key of the space (e.g., "PROJ", "TEAM") or numeric space ID
- params: Optional parameters for filtering (body_format, etc.)
  - :status - List of page statuses to filter by. Default: ["current"]. Valid values: ["current", "archived", "deleted", "trashed"]
  - :body_format - Format for page body (default: "storage")
Examples
# Stream and process current documents in batches of 4 (default)
client
|> ConfluenceLoader.stream_space_documents("PROJ")
|> Enum.each(fn batch ->
  IO.puts("Processing batch of #{length(batch)} documents")
  Enum.each(batch, fn doc -> IO.puts("  - #{doc.metadata.title}") end)
end)
# Stream archived documents only
client
|> ConfluenceLoader.stream_space_documents("PROJ", %{status: ["archived"]})
|> Enum.each(fn batch -> process_archived_documents(batch) end)
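Because the function returns a stream of batches, it composes with the standard `Stream` functions. The sketch below flattens batches into individual documents and stops after the first few, so later batches are never requested. To stay runnable without a Confluence instance it uses a stand-in stream of made-up maps shaped like the `doc.metadata.title` access shown above. `BatchDemo`, `first_titles/2`, and the fake data are hypothetical.

```elixir
defmodule BatchDemo do
  # Lazily flattens a stream of batches and returns the first `n` titles.
  def first_titles(batches, n) do
    batches
    |> Stream.flat_map(& &1)
    |> Stream.map(& &1.metadata.title)
    |> Enum.take(n)
  end
end

# Stand-in for ConfluenceLoader.stream_space_documents(client, "PROJ"):
# ten fake documents delivered lazily in batches of 4.
fake_batches =
  1..10
  |> Stream.map(fn n -> %{metadata: %{title: "Page #{n}"}} end)
  |> Stream.chunk_every(4)

IO.inspect(BatchDemo.first_titles(fake_batches, 5))
```

With a real client, the stand-in stream would be replaced by the result of `stream_space_documents`.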