gaffer_prometheus (gaffer v0.7.1)
Optional Prometheus metrics exporter for gaffer.
If you intend to use this hook, you need to include Prometheus as a dependency in your application.
Setup
The global application:set_env(gaffer, hooks, ...) registration is the
recommended configuration for full coverage:
%% ensure_all_started/1 returns {ok, StartedApps}, not a bare ok.
{ok, _} = application:ensure_all_started(prometheus),
ok = gaffer_prometheus:start(),
application:set_env(gaffer, hooks, [gaffer_prometheus]).
Note
If hooks are enabled on only some queues, forwarded jobs fire events only in the queues that have hooks configured.
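For releases configured through a sys.config file, the same global registration can be written declaratively. This is a minimal sketch, assuming gaffer reads the hooks key from its application environment in the same way as the application:set_env/3 call above; the prometheus application and gaffer_prometheus:start() still have to be started from your own code.

```erlang
%% sys.config (sketch): equivalent to
%% application:set_env(gaffer, hooks, [gaffer_prometheus]).
[
  {gaffer, [
    {hooks, [gaffer_prometheus]}
  ]}
].
```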
Metrics
Every metric carries a queue label and an actor label. actor identifies
what caused the event, either a Gaffer process or the public API, and is
one of user, worker, runner, or pruner. See gaffer_hooks for the actors
emitted by each event.
Counters:
gaffer_queues_created_total{queue, actor}
Queues created via gaffer:create_queue/1.

gaffer_queues_updated_total{queue, actor, source}
Queues updated. source is ensure for gaffer:ensure_queue/1 (fires on every call, even when no fields change) or update for gaffer:update_queue/2.

gaffer_queues_paused_total{queue, actor}
Queues paused via gaffer:pause/1.

gaffer_queues_resumed_total{queue, actor}
Queues resumed via gaffer:resume/1.

gaffer_queues_deleted_total{queue, actor}
Queues deleted via gaffer:delete_queue/1.

gaffer_jobs_inserted_total{queue, actor}
Jobs inserted into a queue (direct user insert or a worker forwarding a terminal-state job to its forward target).

gaffer_jobs_claimed_total{queue, actor}
Jobs picked up by a runner for execution. Incremented by the number of jobs in the claim batch.

gaffer_jobs_completed_total{queue, actor}
Jobs that finished successfully.

gaffer_jobs_failed_total{queue, actor}
Terminal failures only (job exhausted retries).

gaffer_jobs_retries_total{queue, actor}
Retryable failures (job will be retried after backoff).

gaffer_jobs_cancelled_total{queue, actor}
Jobs cancelled before completion, either by user request or by the worker itself.

gaffer_jobs_scheduled_total{queue, actor}
Jobs rescheduled by their worker for a later run.

gaffer_jobs_deleted_total{queue, actor}
Jobs removed from a queue (e.g. via prune or explicit delete).
Histograms:
gaffer_job_claim_delay_seconds{queue, actor}
Wall-clock time between a job's scheduled_at and its attempted_at, in seconds. Captures how long a job waited before a runner picked it up.

gaffer_job_execution_duration_seconds{queue, actor, state}
Duration of each executed attempt, in seconds, observed with the post-event state. state is one of completed, failed, cancelled, or available (a retryable failure).

gaffer_job_attempts{queue, actor, state}
Final value of the job's attempt field at terminal events. state is completed, failed, or cancelled. User-cancels of jobs that never ran are excluded.

gaffer_job_claim_batch_size{queue, actor}
Number of jobs claimed per claim event. Useful for tuning batch size and spotting starved or overloaded queues.
Cardinality
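The queue label is bounded by deployment topology: queue names are atoms
and are expected to form a small, fixed set.
The actor, state, and source labels are bounded by definition.
Avoid programmatically generated queue atoms: every distinct queue name
adds new time series, so an unbounded set of names would explode the
series store.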
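One way to keep the queue label bounded when work is keyed by something unbounded (a tenant or customer id, for example) is to hash that key onto a fixed pool of predefined queue names. The sketch below is illustrative only: the queue_pool module, the queue_for/1 function, and the tenant_jobs_N queue names are hypothetical and not part of gaffer.

```erlang
-module(queue_pool).
-export([queue_for/1]).

%% Hypothetical fixed pool of queue names (not part of gaffer).
%% The queue label can only ever take one of these four values.
-define(QUEUES, {tenant_jobs_0, tenant_jobs_1, tenant_jobs_2, tenant_jobs_3}).

%% Map an arbitrary key onto one of the predefined queue atoms.
%% erlang:phash2/2 returns an integer in 0..Range-1, so the result is
%% always an element of ?QUEUES and no new atoms are created at runtime.
-spec queue_for(term()) -> atom().
queue_for(Key) ->
    Index = erlang:phash2(Key, tuple_size(?QUEUES)),
    element(Index + 1, ?QUEUES).
```

Calling queue_pool:queue_for(<<"tenant-42">>) always yields the same atom from the pool for the same key, so the number of series per metric stays fixed no matter how many tenants exist.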