Baton.Plugin (Baton v0.1.0)


Oban plugin that keeps workflows healthy.

Add it to the plugins list in your Oban config, like any other plugin:

config :my_app, Oban,
  plugins: [
    Oban.Plugins.Pruner,
    {Baton.Plugin, interval: :timer.seconds(60)}
  ]

On each interval it:

  1. Rescues orphaned waiting jobs. A job stuck in available/scheduled/retryable whose dependency's Oban job was pruned before completing will never resolve on its own. The plugin finds these (via workflow_nodes joined to oban_jobs) and cancels them with Oban.cancel_all_jobs/2, a public API rather than a manual state write.

  2. Emits failure telemetry. For any workflow that has fully terminated with at least one cancelled/discarded step, it emits [:baton, :workflow, :failed] so you can alert.

  3. Backstops the terminal notification. For those same failed workflows it calls Baton.Completion.announce/2, which broadcasts the one-shot {:workflow_finished, _} event (and [:baton, :workflow, :finished] telemetry) for workflows that settled without a clean worker return — a hard crash or an Oban kill, where Baton.Worker never got to announce. An atomic claim makes this a no-op for workflows already announced by the worker, so no duplicate event is sent.

  4. Prunes Baton's own tables (opt-in). Baton's rows (workflow_nodes, workflow_step_stats, workflow_debug_logs, workflow_completions) have no foreign key to oban_jobs, so they don't vanish when Oban's Pruner deletes the jobs. With prune: true, the sweep deletes Baton rows once their backing Oban job is gone (see Baton.Retention). This is off by default.
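
The "atomic claim" in step 3 can be pictured with a small sketch. This is not Baton's implementation: the ClaimSketch module, the ETS table, and the {:workflow_finished, id} payload shape are all assumptions made for illustration, with :ets.insert_new/2 standing in for whatever durable claim Baton actually uses. It only shows why a second announcer becomes a no-op.

```elixir
defmodule ClaimSketch do
  # Hypothetical illustration only; an ETS table stands in for the real claim store.

  def new_table do
    :ets.new(:announced_workflows, [:set, :public])
  end

  # The first caller for a workflow id wins the claim and announces.
  # Every later caller sees the existing claim and does nothing.
  def announce_once(table, workflow_id, announce_fun) do
    # :ets.insert_new/2 is atomic: it returns true only if the key was absent.
    if :ets.insert_new(table, {workflow_id, :claimed}) do
      announce_fun.(workflow_id)
      :announced
    else
      :already_announced
    end
  end
end

table = ClaimSketch.new_table()

# Assumed payload shape; the source only states {:workflow_finished, _}.
notify = fn id -> send(self(), {:workflow_finished, id}) end

# The worker announces first; the plugin's backstop then finds the claim taken.
first = ClaimSketch.announce_once(table, 7, notify)
second = ClaimSketch.announce_once(table, 7, notify)
```

Only the first call sends a message, so a subscriber would see one event per workflow regardless of which path announced it.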

Options

  • :interval — sweep interval in milliseconds (default: 60_000)
  • :prune — delete orphaned Baton rows on each sweep (default: false)
  • :prune_limit — max rows deleted per table per sweep (default: 10_000)
  • :debug_log_max_age — optional; also delete workflow_debug_logs older than this many seconds, regardless of job state (they are the largest rows and usually want a shorter retention than the rest)

Example

{Baton.Plugin,
 interval: :timer.seconds(60),
 prune: true,
 debug_log_max_age: :timer.hours(24) |> div(1000)}
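
The two time-based options use different units, which is why the example divides by 1000. A minimal sketch of the arithmetic, using only the standard :timer helpers (the variable names are illustrative):

```elixir
# :interval is in milliseconds, so a :timer helper can be passed as-is.
interval_ms = :timer.seconds(60)

# :debug_log_max_age is in seconds, while :timer.hours/1 returns milliseconds,
# so the value has to be divided by 1000 before it is passed to the plugin.
max_age_s = div(:timer.hours(24), 1000)

IO.inspect(interval_ms, label: "interval (ms)")
IO.inspect(max_age_s, label: "debug_log_max_age (s)")
```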

Telemetry

  • [:baton, :plugin, :rescued] — measurements %{count}, metadata %{job_ids}
  • [:baton, :plugin, :pruned] — measurements %{count}, metadata holds the per-table counts; emitted only when a prune: true sweep removed at least one row
  • [:baton, :workflow, :failed] — measurements %{failed_count, total_count}, metadata %{workflow_id, workflow_label}
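
One way to consume the events above is a handler attached with :telemetry.attach_many/4. The following is a sketch under assumptions: the MyApp.BatonTelemetry module, the handler id, and the log wording are hypothetical. Reading failed_count and total_count as step counts is also an assumption. Only the event names and the measurement and metadata keys come from the list above.

```elixir
defmodule MyApp.BatonTelemetry do
  # Hypothetical handler module; not part of Baton.
  require Logger

  @events [
    [:baton, :plugin, :rescued],
    [:baton, :plugin, :pruned],
    [:baton, :workflow, :failed]
  ]

  # Call once at startup, for example from your application's start/2.
  def attach do
    :telemetry.attach_many("my-app-baton-plugin", @events, &__MODULE__.handle_event/4, nil)
  end

  def handle_event(event, measurements, metadata, _config) do
    Logger.warning(format(event, measurements, metadata))
  end

  # Pure formatting, kept separate so it can be checked without :telemetry.
  def format([:baton, :workflow, :failed], %{failed_count: failed, total_count: total}, meta) do
    "workflow #{inspect(meta[:workflow_id])} (#{inspect(meta[:workflow_label])}) " <>
      "failed: #{failed} of #{total} steps"
  end

  def format([:baton, :plugin, :rescued], %{count: count}, meta) do
    "rescued #{count} orphaned job(s): #{inspect(meta[:job_ids])}"
  end

  # The exact shape of the per-table metadata is not specified, so it is inspected whole.
  def format([:baton, :plugin, :pruned], %{count: count}, per_table) do
    "pruned #{count} row(s): #{inspect(per_table)}"
  end
end
```

Metadata is read with meta[:key] rather than meta.key so that a missing key cannot raise inside the handler; telemetry detaches handlers that raise.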

Functions

child_spec(init_arg)

Returns a specification to start this module under a supervisor.

See Supervisor.