procrastinator v0.1.0 Procrastinator behaviour

A behavior module for batching/procrastinating work.

One procrastinator is defined per app. In the future this may change, but the original use was too aggregate results coming from multiple processes, so the underlying GenServer is assigned a name through the name/0 callback, and so there will exist one instance per defined module that uses Procrastinator.

The Procrastinator has a bucket, to which data can be added using the push/1 method. If this causes the bucket to be full, process/1 is called on the bucket. If this causes the bucket to overflow (eg, if the size of the bucket is measured in bytes), the current bucket will be processed and the data that was pushed will start the new bucket. If the bucket is not full, it won’t do anything until timeout/0 is reached, at which point it will process whatever is in the bucket.

Example

Assume you have thousands of processes all doing some work, and when each one is done you need to save the data to a third party service. This third party service can’t handle saving all of that work one at a time, but it also can’t handle saving one giant batch full of thousands of results. So we define a Procrastinator to batch the result sets into reasonable sizes.

defmodule SaveToThirdParty do
  use Procrastinator

  @max_items 20
  @max_bytes 1024 * 256

  def process(bucket) do
    ThirdPartyApi.save_batch(bucket)
  end

  def timeout, do: 60_000

  def name, do: :save_to_third_party

  def status(bucket) do
    case length(bucket) do
      bucket_length when bucket_length == @max_items -> :full
      bucket_length when bucket_length > @max_items -> :overflow
      _ -> check_size(bucket)
    end
  end

  defp check_size(bucket) do
    case byte_size(Poison.encode!(bucket)) > @max_bytes do
      false -> :continue
      true when length(data) == 1 -> :full
      _ -> :overflow
    end
  end

SaveToThirdParty.start_link
SaveToThirdParty.push(1)

In this scenario, we the processes can be slow, so we want to wait a minute before sending a batch to give it a chance to fill up all the way. The volume of processes ensures that the timeout won’t be reached until most of them have already finished and just the stragglers are running.

The maximum number of items the api can handle at a time is 20. On top of this, the maximum payload size we want to send over the wire is 256kb. To handle this, we use the status/1 callback. If the length of data (the bucket) equals 20, we send back :full. If it is over 20, we send :overflow. In this example that can’t happen, but it’s here for completeness. Otherwise we check the size in bytes of the bucket, and follow the same logic: if the byte size is less than 256kb, we return :continue. If it’s equal to it we return :full, and otherwise we return :overflow.

This all ensures that the third party service never receives a batch bigger than in can handle. It also ensures that most of the time it will receive a batch exactly equal to what it can handle.

Starting

The Procrastinator given in the example can be started using the start_link function. It can also be added as a worker to a supervision tree, for example:

children = [
  supervisor(YourApp.Endpoint, []),
  worker(YourApp.SaveToThirdParty, [])
]

opts = [strategy: :one_for_one, name: YourApp.Supervisor]
Supervisor.start_link(children, opts)

Summary

Callbacks

The name to register the procrastinator to

Invoked when status of bucket is :overflow or :full. In the case that the status is :full, the entire bucket will be passed in, in the case of an :overflow, the entire bucket without the most recently pushed data will be used. Once data is given to process/1 it is no longer in the bucket, there are no mechanisms to recover that data if it is lost in process/1

Invoked when attempting to push data into the bucket. It will be given the current bucket with the new data prepended to it. Depending on the state of that bucket, it should return :overflow, :full, or :continue

Returns the timeout of the Procrastinator. This determines how long it will wait since last receiving data before it processes it. This timeout resets every time data is received

Callbacks

name()
name :: atom

The name to register the procrastinator to.

process(bucket)
process(bucket :: [any]) :: any

Invoked when status of bucket is :overflow or :full. In the case that the status is :full, the entire bucket will be passed in, in the case of an :overflow, the entire bucket without the most recently pushed data will be used. Once data is given to process/1 it is no longer in the bucket, there are no mechanisms to recover that data if it is lost in process/1.

Args

  • bucket - a list containing data sets passed to push/1
status(bucket)
status(bucket :: [any]) :: :overflow | :full | :continue

Invoked when attempting to push data into the bucket. It will be given the current bucket with the new data prepended to it. Depending on the state of that bucket, it should return :overflow, :full, or :continue.

Args

  • bucket - list containing all the data in the current bucket with the data that is trying to be inserted at the head.

Returns

  • :continue - the Procrastinator will continue Procrastinating until timeout/0 is reached.
  • :full - process/1 will be called with the bucket that was passed to status/1, and the Procrastinator will be given a new, empty bucket.
  • :overflow - process/1 will be called with the Procrastinator’s current bucket, and will be given a new bucket containing the new data.
timeout()
timeout :: integer

Returns the timeout of the Procrastinator. This determines how long it will wait since last receiving data before it processes it. This timeout resets every time data is received.