Oban plugin that keeps workflows healthy.
Add it to your Oban plugins: list like any other plugin:
    config :my_app, Oban,
      plugins: [
        Oban.Plugins.Pruner,
        {Baton.Plugin, interval: :timer.seconds(60)}
      ]

On each interval it:
- Rescues orphaned waiting jobs. A job stuck in available/scheduled/retryable whose dependency's Oban job has been pruned before completing will never resolve on its own. The plugin finds these (via workflow_nodes joined to oban_jobs) and cancels them with Oban.cancel_all_jobs/2, which is a public API rather than a manual state write.
- Emits failure telemetry. For any workflow that has fully terminated with at least one cancelled/discarded step, it emits [:baton, :workflow, :failed] so you can alert.
- Backstops the terminal notification. For those same failed workflows it calls Baton.Completion.announce/2, which broadcasts the one-shot {:workflow_finished, _} event (and [:baton, :workflow, :finished] telemetry) for workflows that settled without a clean worker return: a hard crash or an Oban kill, where Baton.Worker never got to announce. An atomic claim makes this a no-op for workflows already announced by the worker, so no duplicate event is sent.
- Prunes Baton's own tables (opt-in). Baton's rows (workflow_nodes, workflow_step_stats, workflow_debug_logs, workflow_completions) have no foreign key to oban_jobs, so they don't vanish when Oban's Pruner deletes the jobs. With prune: true, the sweep deletes Baton rows once their backing Oban job is gone (see Baton.Retention). This is off by default.
Options
- :interval - sweep interval in milliseconds (default: 60_000)
- :prune - delete orphaned Baton rows on each sweep (default: false)
- :prune_limit - max rows deleted per table per sweep (default: 10_000)
- :debug_log_max_age - optional; also delete workflow_debug_logs older than this many seconds, regardless of job state (they are the largest rows and usually want a shorter retention than the rest)
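Note that the two time-based options use different units: :interval is in milliseconds, while :debug_log_max_age is in seconds. A minimal sketch of the conversion, using only the Erlang :timer helpers (which return milliseconds); the variable names are illustrative:

```elixir
# :interval expects milliseconds, and :timer helpers already return milliseconds.
interval_ms = :timer.seconds(60)

# :debug_log_max_age expects seconds, so convert the millisecond value.
max_age_s = div(:timer.hours(24), 1000)

IO.inspect({interval_ms, max_age_s})
# prints {60000, 86400}
```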
Example
    {Baton.Plugin,
     interval: :timer.seconds(60),
     prune: true,
     debug_log_max_age: :timer.hours(24) |> div(1000)}

Telemetry
- [:baton, :plugin, :rescued] - measurements %{count}, metadata %{job_ids}
- [:baton, :plugin, :pruned] - measurements %{count}, metadata is the per-table counts, emitted only when prune: true removed at least one row
- [:baton, :workflow, :failed] - measurements %{failed_count, total_count}, metadata %{workflow_id, workflow_label}
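One way to consume these events is a handler attached with :telemetry.attach_many/4. The following is a minimal sketch, not part of Baton: the module name MyApp.BatonTelemetry, the handler id, and the log wording are hypothetical, and the measurement and metadata keys are assumed to be exactly the atoms listed above.

```elixir
defmodule MyApp.BatonTelemetry do
  # Hypothetical module; Baton does not ship it.
  require Logger

  @events [
    [:baton, :plugin, :rescued],
    [:baton, :plugin, :pruned],
    [:baton, :workflow, :failed]
  ]

  # Call once at application start, e.g. from Application.start/2.
  def attach do
    :telemetry.attach_many("my-app-baton-plugin", @events, &__MODULE__.handle_event/4, nil)
  end

  # Telemetry handlers receive the event name, measurements, metadata, and config.
  def handle_event(event, measurements, metadata, _config) do
    Logger.warning(describe(event, measurements, metadata))
  end

  # Pure formatting helpers, kept separate so they are easy to test.
  def describe([:baton, :plugin, :rescued], %{count: count}, %{job_ids: job_ids}) do
    "Baton rescued #{count} orphaned job(s): #{inspect(job_ids, charlists: :as_lists)}"
  end

  def describe([:baton, :plugin, :pruned], %{count: count}, per_table_counts) do
    "Baton pruned #{count} row(s): #{inspect(per_table_counts)}"
  end

  def describe([:baton, :workflow, :failed], %{failed_count: failed, total_count: total}, metadata) do
    # Assumes metadata carries :workflow_id and :workflow_label as documented above.
    "Baton workflow #{inspect(metadata[:workflow_id])} " <>
      "(#{inspect(metadata[:workflow_label])}) failed: #{failed}/#{total} steps"
  end
end
```

In a real application the [:baton, :workflow, :failed] clause is the natural place to trigger an alert instead of (or in addition to) a log line.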
Functions
Returns a specification to start this module under a supervisor.
See Supervisor.