Failed jobs are retried with quadratic backoff (the delay grows with the square of the attempt number) until max_attempts is reached; after that they are discarded.

To defer a job without it counting as a failure, return {:sleep, seconds} from perform/1 instead (see Workers).

Triggering a retry

Return {:error, reason} from perform/1:

def perform(%{args: %{"url" => url}}) do
  case HTTPoison.get(url) do
    {:ok, %{status_code: 200}} -> :ok
    {:ok, %{status_code: code}} -> {:error, {:http, code}}
    {:error, reason} -> {:error, reason}
  end
end
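The example above retries every non-200 response. When a server explicitly asks the client to wait (HTTP 429 with a numeric Retry-After header), the {:sleep, seconds} return described at the top of this page may fit better, since it defers the job without recording a failure. A minimal sketch of that mapping as a pure function, so it can be tested without network access. ResponseSketch and to_result/1 are hypothetical names, not part of Kathikon, and only the delay-seconds form of Retry-After is handled:

```elixir
defmodule ResponseSketch do
  # Hypothetical helper: maps an HTTPoison-style result to a perform/1 return.
  # 200 -> :ok, 429 with a numeric Retry-After -> {:sleep, seconds} (deferred,
  # not a failure), anything else -> {:error, reason} (retried with backoff).
  def to_result({:ok, %{status_code: 200}}), do: :ok

  def to_result({:ok, %{status_code: 429, headers: headers}}) do
    case retry_after_seconds(headers) do
      {:ok, seconds} -> {:sleep, seconds}
      :error -> {:error, {:http, 429}}
    end
  end

  def to_result({:ok, %{status_code: code}}), do: {:error, {:http, code}}
  def to_result({:error, reason}), do: {:error, reason}

  # Header names are case-insensitive. Only the delay-seconds form
  # ("Retry-After: 30") is parsed; anything else yields :error.
  defp retry_after_seconds(headers) do
    Enum.find_value(headers, :error, fn {name, value} ->
      if String.downcase(name) == "retry-after" do
        case Integer.parse(value) do
          {seconds, ""} when seconds >= 0 -> {:ok, seconds}
          _ -> :error
        end
      end
    end)
  end
end
```

Inside perform/1 this would be used as `url |> HTTPoison.get() |> ResponseSketch.to_result()`. An HTTP-date Retry-After value falls through to {:error, {:http, 429}} and is retried with the normal backoff.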

Backoff formula

Kathikon.Job.backoff_seconds/1 computes the delay, in seconds, before the next attempt:

| Attempt     | Backoff (seconds)    |
|-------------|----------------------|
| 1           | 5                    |
| 2           | 20                   |
| 3           | 45                   |
| 4           | 80                   |
| n (general) | min(n² × 5, 86400)   |

After a failure, the job moves to :retryable with available_at set to now + backoff. The dispatcher claims it again once available_at has passed.

Kathikon.Job.backoff_seconds(1)  # 5
Kathikon.Job.backoff_seconds(3)  # 45
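To make the formula concrete, here is a standalone re-implementation of it together with the available_at arithmetic described above. BackoffSketch and next_available_at/2 are hypothetical; the sketch assumes Kathikon.Job.backoff_seconds/1 behaves exactly as the table states and is not its actual source:

```elixir
defmodule BackoffSketch do
  # Upper bound on any single delay: 86_400 s = 24 hours.
  @max_backoff 86_400

  # Mirrors the documented formula: min(attempt² × 5, 86400).
  def backoff_seconds(attempt) when is_integer(attempt) and attempt >= 1 do
    min(attempt * attempt * 5, @max_backoff)
  end

  # Hypothetical helper: the available_at a failed job would receive,
  # i.e. now + backoff. `now` is passed in so the result is deterministic.
  def next_available_at(%DateTime{} = now, attempt) do
    DateTime.add(now, backoff_seconds(attempt), :second)
  end
end
```

The 24-hour cap only takes effect from attempt 132 onward (131² × 5 = 85,805; 132² × 5 = 87,120), so with the default max_attempts of 20 it is never reached.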

max_attempts

The default (20) comes from config. Override it per job:

Kathikon.insert(FlakyWorker, %{}, max_attempts: 3)
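A useful number when choosing max_attempts is how long a job that keeps failing waits in :retryable, in total, before it is discarded. A rough sketch of that sum, assuming backoff_seconds(n) is applied after attempt n fails and no delay follows the final attempt (this page does not state which attempt number is passed in). RetryWindowSketch is a hypothetical name:

```elixir
defmodule RetryWindowSketch do
  # Same formula as the backoff table (re-implemented here, not Kathikon's code).
  defp backoff_seconds(attempt), do: min(attempt * attempt * 5, 86_400)

  # Total backoff before a job that always fails is discarded.
  # Assumes attempts 1..(max_attempts - 1) are each followed by a backoff
  # and the final attempt is not.
  def total_backoff_seconds(max_attempts)
      when is_integer(max_attempts) and max_attempts >= 1 do
    # The range is empty when max_attempts == 1, and Enum.sum([]) is 0.
    1..(max_attempts - 1)//1
    |> Enum.map(&backoff_seconds/1)
    |> Enum.sum()
  end
end
```

Under those assumptions, max_attempts: 3 gives 5 + 20 = 25 seconds of backoff, and the default of 20 gives 12,350 seconds (about 3 h 26 min). Execution time and dispatcher latency come on top of that.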

Lifecycle:

:executing → {:error, _} → :retryable → … → :discarded (attempts >= max_attempts)
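The failure branch of this lifecycle, together with the matching telemetry events from the Telemetry table at the end of this page, can be written as a small decision function. An illustrative sketch only: LifecycleSketch is hypothetical, and it assumes `attempt` is the number of the attempt that just failed:

```elixir
defmodule LifecycleSketch do
  # Hypothetical: state a job lands in after perform/1 returns {:error, _}
  # (or raises) on attempt number `attempt`.
  def state_after_failure(attempt, max_attempts)
      when is_integer(attempt) and is_integer(max_attempts) do
    if attempt >= max_attempts, do: :discarded, else: :retryable
  end

  # Telemetry event that accompanies each failure outcome.
  def failure_event(attempt, max_attempts) do
    case state_after_failure(attempt, max_attempts) do
      :retryable -> [:kathikon, :job, :retry]
      :discarded -> [:kathikon, :job, :discard]
    end
  end
end
```

With max_attempts: 3, failures on attempts 1 and 2 lead to :retryable, and a failure on attempt 3 leads to :discarded.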

Error history

Each failure appends an entry to job.errors:

{:ok, job} = Kathikon.fetch(job_id)

job.errors
# [
#   %{
#     "at" => "2026-06-17T12:00:00Z",
#     "attempt" => 1,
#     "reason" => ":timeout"
#   }
# ]
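Because job.errors is a plain list of maps with string keys, it can be inspected with ordinary Enum functions. A minimal sketch that picks the most recent entry and parses its "at" timestamp; ErrorHistorySketch is a hypothetical helper, and the entry shape is taken from the example above:

```elixir
defmodule ErrorHistorySketch do
  # Returns the entry with the highest "attempt", or nil if there are none.
  def latest_error([]), do: nil
  def latest_error(errors) when is_list(errors), do: Enum.max_by(errors, & &1["attempt"])

  # Parses an entry's ISO 8601 "at" string into a DateTime.
  def failed_at(%{"at" => at}) do
    {:ok, datetime, _utc_offset} = DateTime.from_iso8601(at)
    datetime
  end
end
```

For example, after `{:ok, job} = Kathikon.fetch(job_id)`, calling `ErrorHistorySketch.latest_error(job.errors)` returns the last recorded failure.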

Discarded jobs

When retries are exhausted, the job's state becomes :discarded:

{:ok, job} = Kathikon.fetch(job_id)
job.state  # :discarded

Kathikon emits the [:kathikon, :job, :discard] telemetry event. Discarded jobs are pruned after retention_period (see Configuration).
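The pruning rule amounts to a cutoff comparison. A hedged sketch, assuming retention_period is expressed in seconds and that the time a job was discarded is available as a DateTime; neither detail is specified on this page, so check Configuration. RetentionSketch and prunable?/3 are hypothetical:

```elixir
defmodule RetentionSketch do
  # Hypothetical: true when a job discarded at `discarded_at` is older than
  # the retention window ending at `now`. Assumes retention is in seconds.
  def prunable?(%DateTime{} = discarded_at, retention_seconds, %DateTime{} = now)
      when is_integer(retention_seconds) and retention_seconds >= 0 do
    cutoff = DateTime.add(now, -retention_seconds, :second)
    DateTime.compare(discarded_at, cutoff) == :lt
  end
end
```

With a one-day retention (86,400 seconds), a job discarded two days ago is eligible for pruning and one discarded six hours ago is not.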

Exceptions

Exceptions raised in perform/1 are caught and treated like an {:error, reason} return, so they are recorded and retried like any other failure:

def perform(_), do: raise("unexpected nil")
# → {:error, {:exception, %RuntimeError{}, stacktrace}}
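A rough sketch of how a raised exception can be converted into that shape with a function-level rescue and __STACKTRACE__. ExceptionSketch is hypothetical and is not necessarily how Kathikon's executor is written. Note that rescue only catches exceptions; this page does not say whether throw or exit are handled the same way:

```elixir
defmodule ExceptionSketch do
  # Runs a zero-arity function; a raised exception becomes
  # {:error, {:exception, exception, stacktrace}}, other results pass through.
  def run(fun) when is_function(fun, 0) do
    fun.()
  rescue
    exception -> {:error, {:exception, exception, __STACKTRACE__}}
  end
end
```

Calling `ExceptionSketch.run(fn -> raise "unexpected nil" end)` returns a tuple whose exception is a %RuntimeError{message: "unexpected nil"}, matching the documented result.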

Orphaned executing jobs (Phase 1 limitation)

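If the worker process dies without returning (for example, it is killed) or the node goes down while a job is :executing, the job remains :executing. Phase 1 has no mechanism to recover it; lifeline recovery is planned for Phase 2. Plan workers and max_attempts with this in mind.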
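In Phase 1, an operator-side check can at least surface jobs that look orphaned. A hypothetical sketch that filters jobs which have been :executing longer than a chosen limit. The attempted_at field and the maximum-runtime threshold are assumptions, not documented Kathikon fields or settings:

```elixir
defmodule OrphanSketch do
  # Hypothetical: jobs in :executing whose (assumed) attempted_at timestamp
  # is older than `max_runtime_seconds` before `now`.
  def stale_executing(jobs, max_runtime_seconds, %DateTime{} = now)
      when is_list(jobs) and is_integer(max_runtime_seconds) do
    cutoff = DateTime.add(now, -max_runtime_seconds, :second)

    Enum.filter(jobs, fn job ->
      job.state == :executing and DateTime.compare(job.attempted_at, cutoff) == :lt
    end)
  end
end
```

A sensible threshold is one comfortably above the longest run you expect from the worker, so that slow but healthy jobs are not flagged.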

Telemetry

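| Event                       | When                                                    |
|-----------------------------|---------------------------------------------------------|
| [:kathikon, :job, :retry]   | Failure with attempts remaining                         |
| [:kathikon, :job, :discard] | Failure on the final attempt (max_attempts reached)     |
| [:kathikon, :job, :stop]    | Success (result: :ok in metadata)                       |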
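These events can be consumed with a standard :telemetry handler. A minimal sketch of a 4-arity handler that only classifies each event. TelemetrySketch and the handler id used below are hypothetical, and apart from result: :ok on :stop the metadata keys are not documented here, so the handler does not rely on them:

```elixir
defmodule TelemetrySketch do
  # Handler shape expected by :telemetry: (event, measurements, metadata, config).
  # A real handler would log or record metrics instead of returning atoms.
  def handle_event([:kathikon, :job, :retry], _measurements, _metadata, _config), do: :retrying
  def handle_event([:kathikon, :job, :discard], _measurements, _metadata, _config), do: :discarded
  def handle_event([:kathikon, :job, :stop], _measurements, %{result: :ok}, _config), do: :succeeded
  def handle_event(_event, _measurements, _metadata, _config), do: :ignored

  # Event names from the table above, for use with :telemetry.attach_many/4.
  def events do
    [[:kathikon, :job, :retry], [:kathikon, :job, :discard], [:kathikon, :job, :stop]]
  end
end
```

Attach it with `:telemetry.attach_many("kathikon-job-outcomes", TelemetrySketch.events(), &TelemetrySketch.handle_event/4, nil)`. :telemetry ignores handler return values; the atoms returned here only make the sketch easy to test.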