Writing reporters

Reporters are a crucial part of Telemetry.Metrics "ecosystem" - without them, metric definitions are merely.. definitions. This guide aims to help in writing the reporter in a proper way.

Before writing the reporter for your favourite monitoring system, make sure that one isn't already available on Hex.pm - it might make sense to contribute and improve the existing solution than starting from scratch.

Let's get started!

Responsibilites

The reporter has four main responsibilities:

  • it needs to accept a list of metric definitions as input when being started
  • it needs to attach handlers to events contained in these definitions
  • when the events are emitted, it needs to extract the measurement and selected tags, and handle them in a way that makes sense for whathever it chooses to publish to
  • it needs to detach event handlers when it stops or crashes

Accepting metric definitions as input

This one is quite easy - you need to give your users a way to actually tell you what metrics they want to track. It's essential to give users an option to provide metric definitions at runtime (e.g. when their application starts). For example, let's say you're building a PigeonReporter. If the reporter was process-based, you could provide a start_link/1 function that accepts a list of metric definitions:

  metrics = [
    counter("..."),
    last_value("..."),
    distribution("...")
  ]

  PigeonReporter.start_link(metrics)

If the reporter doesn't support metrics of particular type, it log a warning or return an error.

Attaching event handlers

Event handlers are attached using :telemetry.attach/4 function. To reduce overhead of installing many event handlers, you can install a single handler for multiple metrics based on the same event.

Note that handler IDs need to be unique - you can generate completely random blobs of data, or use something that you know needs to be unique anyway, e.g. some combination of reporter name, event name, and something which is different for multiple instances of the same reporter (PID is a good choice if the reporter is process-based):

id = {PigeonReporter, metric.event_name, self()}

Assuming that metrics is a list of metric definitions based on event, we can attach a handler like this:

:telemetry.attach(id, event, &PigeonReporter.handle_event/4, %{metrics: metrics})

Reacting to events

There are two parts to event handling - the first one is extracting event measurements and tags, which is the same for all reporters, and the second one is performing logic specific to particular reporter.

Let's implement the basic event handler attached in the previous section:

def handle_event(_event_name, measurements, metadata, %{metrics: metrics}) do
  for metric <- metrics do
    measurement = extract_measurement(metric, measurements)
    tags = extract_tags(metric, metadata)
    # everything else is specific to particular reporter
  end
end

As described before, first we extract the measurement and tags, and later perform reporter-specific logic. The implementation of extract_measurement/2 might look as follows:

def extract_measurement(metric, measurements) do
  case metric.measurement do
    fun when is_function(fun, 1) ->
      fun.(measurements)
    key ->
      measurements[key]
  end
end

Since :measurement in the metric definition can be both arbitrary term (to be used as key to fetch the measurement) or a function, we need to handle both cases.

Note: Telemetry.Metrics can't guarantee that the extracted measurement's value is a number. Each reporter can handle this scenario properly, either by logging a warning, detaching the handler etc.

We also need to implement the extract_tags/2 function:

def extract_tags(metric, metadata) do
  tag_values = metric.tag_values.(metadata)
  for tag <- tags, into: %{} do
    case Map.fetch(tag_values, tag) do
      {:ok, value} ->
        Map.put(tags, tag, value)
      :error ->
        Logger.warn("Tag #{inspect(tag)} not found in event metadata: #{inspect(metadata)}")
        Map.put(tags, tag, nil)
    end
  end
end

First we need to apply last-minute transformation to the metadata using the :tag_values function. After that, we loop through the list of desired tags and fetch them from transformed metadata - if the particular key is not present in the metadata, we log a warning and assign nil as the tag value.

It is very important that the code executed on every event does not fail, as that would cause the handler to be permanently removed and prevent the metrics from being updated.

Detaching the handlers on termination

To leave the system in a clean state, the reporter should detach the event handlers it installed when it's being stopped or terminated unexpectedely. This can be done by trapping exists and implementing the terminate callback, or having a dedicated process responsible only for the cleanup (e.g. by using monitors).

Documentation

It's extremely important that reporters document how Telemetry.Metrics metric types, names, and tags are translated to metric types and identifiers in the system they publish metrics to. They should also document if some metric types are not supported at all.

Examples

To our knowledge, there are not many reporters in the wild yet. TelemetryMetricsStatsd is a reporter which might serve as an example when implementing your own.