gaffer_prometheus (gaffer v0.7.2)


Optional Prometheus metrics exporter for gaffer.

To use this hook, your application must include Prometheus as a dependency.
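
As a minimal sketch, assuming a rebar3 project and that "Prometheus" here means the prometheus package from Hex (prometheus.erl), the dependency could be declared as below. The version is deliberately left unpinned, and gaffer itself is assumed to be declared already.

```erlang
%% rebar.config (sketch)
%% Assumption: the exporter relies on the Hex package `prometheus`.
%% Pin a version that matches your gaffer release if you need reproducible builds.
{deps, [
    prometheus
]}.
```

Listing prometheus under applications in your .app.src as well makes a release start it before your own application.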

Setup

The global application:set_env(gaffer, hooks, ...) registration is the recommended configuration for full coverage:

{ok, _} = application:ensure_all_started(prometheus),
ok = gaffer_prometheus:start(),
application:set_env(gaffer, hooks, [gaffer_prometheus]).
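
If you prefer static configuration, the same application environment key can probably be set in sys.config instead of calling application:set_env/3 at runtime. This is a sketch that assumes gaffer reads the hooks key from its application environment in the same way whichever method sets it.

```erlang
%% sys.config (sketch)
[
    {gaffer, [
        %% Same value as application:set_env(gaffer, hooks, [gaffer_prometheus]).
        {hooks, [gaffer_prometheus]}
    ]}
].
```

This only registers the hook. gaffer_prometheus:start() still has to be called at startup to declare the metrics.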

Note

If hooks are enabled on only some queues, a forwarded job fires events only in the queues that have hooks configured.

Metrics

Every metric carries a queue label and an actor label. The actor label identifies what caused the event, either a Gaffer process or the public API, and is one of user, worker, runner, or pruner. See gaffer_hooks for the actors emitted by each event.

Counters:

  • gaffer_queues_created_total{queue, actor}

    Queues created via gaffer:create_queue/1.

  • gaffer_queues_updated_total{queue, actor, source}

    Queues updated. source is ensure for gaffer:ensure_queue/1 (fires on every call, even when no fields change) or update for gaffer:update_queue/2.

  • gaffer_queues_paused_total{queue, actor}

    Queues paused via gaffer:pause/1.

  • gaffer_queues_resumed_total{queue, actor}

    Queues resumed via gaffer:resume/1.

  • gaffer_queues_deleted_total{queue, actor}

    Queues deleted via gaffer:delete_queue/1.

  • gaffer_jobs_inserted_total{queue, actor}

    Jobs inserted into a queue (direct user insert or a worker forwarding a terminal-state job to its forward target).

  • gaffer_jobs_claimed_total{queue, actor}

    Jobs picked up by a runner for execution. Incremented by the number of jobs in the claim batch.

  • gaffer_jobs_completed_total{queue, actor}

    Jobs that finished successfully.

  • gaffer_jobs_failed_total{queue, actor}

    Terminal failures only (job exhausted retries).

  • gaffer_jobs_retries_total{queue, actor}

    Retryable failures (job will be retried after backoff).

  • gaffer_jobs_cancelled_total{queue, actor}

    Jobs cancelled before completion, either by user request or by the worker itself.

  • gaffer_jobs_scheduled_total{queue, actor}

    Jobs rescheduled by their worker for a later run.

  • gaffer_jobs_deleted_total{queue, actor}

    Jobs removed from a queue (e.g. via prune or explicit delete).
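
Because gaffer_jobs_failed_total counts terminal failures only and retryable failures go to gaffer_jobs_retries_total, a failure ratio should be computed over terminal outcomes. The sketch below illustrates that arithmetic on a hypothetical snapshot map of counter values for one queue. The module name and the map shape are assumptions for illustration and not part of gaffer.

```erlang
-module(gaffer_counter_sketch).
-export([terminal_failure_ratio/1]).

%% Snapshot is a hypothetical map holding counter values for ONE queue, e.g.
%% #{completed => N, failed => N, cancelled => N}, read from your
%% monitoring system. Retries are intentionally ignored: they are not terminal.
terminal_failure_ratio(Snapshot) ->
    Completed = maps:get(completed, Snapshot, 0),
    Failed = maps:get(failed, Snapshot, 0),
    Cancelled = maps:get(cancelled, Snapshot, 0),
    case Completed + Failed + Cancelled of
        0 -> undefined;                 % no terminal events observed yet
        Terminal -> Failed / Terminal   % float in the range 0.0..1.0
    end.
```

In a real deployment the same ratio would normally be expressed as a query over rates in the monitoring system instead of being computed in application code.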

Histograms:

  • gaffer_job_claim_delay_seconds{queue, actor}

    Wall-clock time between a job's scheduled_at and its attempted_at, in seconds. Captures how long a job waited before a runner picked it up.

  • gaffer_job_execution_duration_seconds{queue, actor, state}

    Duration of each executed attempt, in seconds, observed with the post-event state. state is one of completed, failed, cancelled, or available (a retryable failure).

  • gaffer_job_attempts{queue, actor, state}

    Final value of the job's attempt field at terminal events. state is completed, failed, or cancelled. User-cancels of jobs that never ran are excluded.

  • gaffer_job_claim_batch_size{queue, actor}

    Number of jobs claimed per claim event. Useful for tuning batch size and spotting starved or overloaded queues.
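
Prometheus histograms are exposed as cumulative buckets: each bucket counts every observation less than or equal to its upper bound, and a final +Inf bucket counts all observations. The sketch below reproduces that bookkeeping in plain Erlang to help when reading the bucket series. The module name and the bucket bounds in the usage note are illustrative assumptions, not the buckets gaffer declares.

```erlang
-module(histogram_sketch).
-export([cumulative_buckets/2]).

%% Bounds: ascending list of bucket upper bounds (the `le` label values).
%% Observations: observed values, e.g. claim delays in seconds.
%% Returns [{Bound, Count}], where Count is the number of observations
%% less than or equal to Bound, followed by {infinity, Total}.
cumulative_buckets(Bounds, Observations) ->
    Counts = [{B, length([O || O <- Observations, O =< B])} || B <- Bounds],
    Counts ++ [{infinity, length(Observations)}].
```

For example, with bounds [0.1, 1, 10] and observed delays [0.05, 0.5, 0.7, 3, 42], the result is [{0.1,1},{1,3},{10,4},{infinity,5}].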

Cardinality

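The queue label is bounded by deployment topology: queue names are atoms drawn from a small, fixed set. The actor, state, and source labels are bounded by definition. Avoid programmatically generated queue atoms, because every new queue name adds a new set of time series, and an unbounded set of names would explode the series store.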
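
To make the growth concrete, the upper bound on label combinations for one metric is the number of queues multiplied by the size of each other label's value set. The sketch below computes that bound. The module and function names are hypothetical, and the result is a ceiling, since not every actor emits every event.

```erlang
-module(cardinality_sketch).
-export([max_series/2]).

%% NumQueues: number of distinct queue names in the deployment.
%% OtherLabelSizes: number of possible values for each remaining label,
%% e.g. [4] for actor alone, or [4, 4] for actor and state.
max_series(NumQueues, OtherLabelSizes) ->
    lists:foldl(fun(Size, Acc) -> Size * Acc end, NumQueues, OtherLabelSizes).
```

With 10 queues, a {queue, actor} counter has at most max_series(10, [4]) = 40 combinations, and gaffer_job_execution_duration_seconds has at most max_series(10, [4, 4]) = 160. Each histogram combination additionally exposes one series per bucket plus a sum and a count, so 1,000 generated queue names would multiply all of these by 100.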

Functions

start()

-spec start() -> ok.

Declare every gaffer metric.

Idempotent — safe to call multiple times. Call once at application startup, after application:ensure_all_started(prometheus).
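
One way to call it once at startup is from your application callback module. The sketch below assumes hypothetical my_app and my_sup modules, and that prometheus is listed under applications in your .app.src so it is already running when start/2 executes.

```erlang
-module(my_app).                 % hypothetical application callback module
-behaviour(application).
-export([start/2, stop/1]).

start(_Type, _Args) ->
    %% Declare the gaffer metrics; documented as safe to call repeatedly.
    ok = gaffer_prometheus:start(),
    %% Register the exporter globally so it applies to all queues.
    ok = application:set_env(gaffer, hooks, [gaffer_prometheus]),
    %% my_sup is a placeholder for your own top-level supervisor.
    my_sup:start_link().

stop(_State) ->
    ok.
```

Relying on the applications key instead of calling application:ensure_all_started/1 inside start/2 leaves start ordering to the release boot sequence.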