gen_stage v0.1.0 GenStage behaviour

Stages are computation steps that send and/or receive data from other stages.

When a stage sends data, it acts as a producer. When it receives data, it acts as a consumer. Stages may take both producer and consumer roles at once.

Note: the :producer_consumer type referenced below is not yet implemented.

Stage types

Besides taking both producer and consumer roles, a stage may be called “source” if it only produces items or called “sink” if it only consumes items.

For example, imagine the stages below where A sends data to B that sends data to C:

[A] -> [B] -> [C]

we conclude that:

  • A is only a producer (and therefore a source)
  • B is both producer and consumer
  • C is only a consumer (and therefore a sink)

As we will see in the upcoming Example section, we must specify the type of each stage when we implement it.

To start the flow of events, we subscribe consumers to producers. Once the communication channel between them is established, consumers will ask the producers for events. We typically say the consumer is sending demand upstream. Once demand arrives, the producer will emit items, never emitting more items than the consumer asked for. This provides a back-pressure mechanism.

A consumer may have multiple producers and a producer may have multiple consumers. When a consumer asks for data, each producer is handled separately, with its own demand. When a producer receives demand and sends data to multiple consumers, the demand is tracked and the events are sent by a dispatcher. This allows producers to send data using different “strategies”. See GenStage.Dispatcher for more information.

Example

Let’s define the simple pipeline below:

[A] -> [B] -> [C]

where A is a producer that will emit items starting from 0, B is a producer-consumer that will receive those items and multiply them by a given number, and C is a consumer that will receive those events and print them to the terminal.

Let’s start with A. Since A is a producer, its main responsibility is to receive demand and generate events. Those events may be in memory or an external queue system. For simplicity, let’s implement a simple counter starting from a given value of counter received on init/1:

defmodule A do
  use GenStage

  def init(counter) do
    {:producer, counter}
  end

  def handle_demand(demand, counter) when demand > 0 do
    # If the counter is 3 and we ask for 2 items, we will
    # emit the items 3 and 4, and set the state to 5.
    events = Enum.to_list(counter..counter+demand-1)
    {:noreply, events, counter + demand}
  end
end
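
Because callbacks are plain functions, the arithmetic in handle_demand/2 can be checked without starting a process. The sketch below copies the clause into a hypothetical module, CounterSketch, which deliberately leaves out use GenStage so that it runs without the gen_stage dependency:

```elixir
defmodule CounterSketch do
  # Hypothetical stand-in for module A above, without `use GenStage`.
  def handle_demand(demand, counter) when demand > 0 do
    # Emit `demand` consecutive integers starting at the current counter.
    events = Enum.to_list(counter..(counter + demand - 1))
    {:noreply, events, counter + demand}
  end
end

# Counter at 3, demand of 2: emits 3 and 4, and the next state is 5.
IO.inspect(CounterSketch.handle_demand(2, 3))
# prints {:noreply, [3, 4], 5}
```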

B is a producer-consumer. This means it does not explicitly handle the demand because the demand is always forwarded to its producer. Once A receives the demand from B, it will send events to B, which will be transformed by B as desired. In our case, B will receive events and multiply them by a number given on initialization and stored as the state:

defmodule B do
  use GenStage

  def init(number) do
    {:producer_consumer, number}
  end

  def handle_events(events, _from, number) do
    events = Enum.map(events, & &1 * number)
    {:noreply, events, number}
  end
end

C will finally receive those events and print them every second to the terminal:

defmodule C do
  use GenStage

  def init(:ok) do
    {:consumer, :the_state_does_not_matter}
  end

  def handle_events(events, _from, state) do
    # Wait for a second.
    :timer.sleep(1000)

    # Inspect the events.
    IO.inspect(events)

    # We are a consumer, so we would never emit items.
    {:noreply, [], state}
  end
end

Now we can start and connect them:

{:ok, a} = GenStage.start_link(A, 0)   # starting from zero
{:ok, b} = GenStage.start_link(B, 2)   # multiply by 2
{:ok, c} = GenStage.start_link(C, :ok) # state does not matter

GenStage.sync_subscribe(c, to: b)
GenStage.sync_subscribe(b, to: a)

After you subscribe all of them, demand will start flowing upstream and events downstream. Because C blocks for one second, the demand will eventually be adjusted to C's needs. When implementing consumers, we often set the :max_demand and :min_demand on subscription. The :max_demand specifies the maximum number of events that may be in flow, while the :min_demand specifies the minimum threshold that triggers more demand. For example, if :max_demand is 100 and :min_demand is 50 (the default values), the consumer will ask for 100 events initially and ask for more only after it receives at least 50.

When such values are applied to the stages above, it is easy to see the producer works in batches. The producer A ends up emitting batches of 50 items which will take approximately 50 seconds to be consumed by C, which will then request another batch of 50 items.
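
The threshold rule can be modelled in a few lines of plain Elixir. DemandSketch and its function names are hypothetical. The module is only a simplified model of the description above (ask for :max_demand up front, then top up once the outstanding demand falls to :min_demand), not the library's actual bookkeeping:

```elixir
defmodule DemandSketch do
  # Hypothetical model of one subscription's demand; not GenStage's real code.

  # The first request asks for everything the consumer is willing to hold.
  def initial_ask(max_demand), do: max_demand

  # `pending` is the number of events asked for but not yet received.
  # Returns {ask_now, new_pending}.
  def on_events(received, pending, min_demand, max_demand) do
    pending = pending - received

    if pending <= min_demand do
      # Top the outstanding demand back up to max_demand.
      {max_demand - pending, max_demand}
    else
      # Still above the threshold: do not ask for more yet.
      {0, pending}
    end
  end
end

pending = DemandSketch.initial_ask(100)

IO.inspect(DemandSketch.on_events(10, pending, 50, 100))
# prints {0, 90}

IO.inspect(DemandSketch.on_events(50, pending, 50, 100))
# prints {50, 100}
```

With :max_demand at 100 and :min_demand at 50, receiving 10 events triggers no new request, while receiving 50 triggers a request for another 50, which matches the batches of 50 described above.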

Buffer events

Due to the concurrent nature of Elixir software, sometimes a producer may receive events without consumers to send those events to. For example, imagine a consumer C subscribes to producer B. Next, the consumer C sends demand to B, which sends the demand upstream. Now, if the consumer C crashes, B may receive the events from upstream but it no longer has a consumer to send those events to. In such cases, B will buffer the events which have arrived from upstream.

The buffer can also be used when external sources only send events in batches larger than asked for. For example, you may be receiving events from an external source that only sends events in batches of 100 while the internal demand is smaller than that.

In all of those cases, if the message cannot be sent immediately, it is stored and sent whenever there is an opportunity to. The size of the buffer is configured via the :buffer_size option returned by init/1. The default value is 1000.

Streams

After exploring the example above, you may be thinking it is a lot of code for something that could be expressed with streams. For example:

Stream.iterate(0, fn i -> i + 1 end)
|> Stream.map(fn i -> i * 2 end)
|> Stream.each(&IO.inspect/1)
|> Stream.run()

The example above would print the same values as our stages, with the difference that the stream is not leveraging concurrency. One of the goals of this project is precisely to explore the interfaces between streams and stages. Meanwhile, it is worth reiterating the advantages of using stages:

  • Stages provide a more structured approach by breaking each stage into a separate module
  • Stages provide all callbacks necessary for process management (init, terminate, etc)
  • Stages can be hot-code upgraded
  • Stages can be supervised individually

Callbacks

GenStage is implemented on top of a GenServer with two additions. Besides exposing all of the GenServer callbacks, it also provides handle_demand/2 to be implemented by producers and handle_events/3 to be implemented by consumers, as shown above. Furthermore, all the callback responses have been modified to potentially emit events. See the callbacks documentation for more information.

By adding use GenStage to your module, Elixir will automatically define all callbacks for you except the following:

  • init/1 - must be implemented to choose between :producer, :consumer or :producer_consumer
  • handle_demand/2 - must be implemented by :producer types
  • handle_events/3 - must be implemented by :producer_consumer and :consumer types

Although this module exposes functions similar to the ones found in the GenServer API, like call/3 and cast/2, developers can also rely directly on GenServer functions such as GenServer.multi_call/4 and GenServer.abcast/3 if they wish to.

Name Registration

GenStage is bound to the same name registration rules as a GenServer. Read more about it in the GenServer docs.

Message-protocol overview

This section will describe the message-protocol implemented by stages. By documenting these messages, we will allow developers to provide their own stage implementations.

Back-pressure

When data is sent between stages, it is done by a message protocol that provides back-pressure. The first step is for the consumer to subscribe to the producer. Each subscription has a unique reference.

Once subscribed, the consumer may ask the producer for messages for the given subscription. The consumer may demand more items whenever it wants to. A consumer must never receive more data than it has asked for from any given producer stage.

A consumer may have multiple producers, where each demand is managed individually. A producer may have multiple consumers, where the demand and events are managed and delivered according to a GenStage.Dispatcher implementation.

Producer messages

The producer is responsible for sending events to consumers based on demand.

  • {:"$gen_producer", from :: {consumer_pid, subscription_ref}, {:subscribe, options}} - sent by the consumer to the producer to start a new subscription.

    Before sending, the consumer MUST monitor the producer for clean-up purposes in case of crashes. The subscription_ref is unique to identify the subscription (and may be the monitoring reference).

    Once sent, the consumer MAY immediately send demand to the producer.

    Once received, the producer MUST monitor the consumer and call dispatcher.subscribe(from, state). However, if the subscription reference is already known, it must send a :cancel message to the consumer.

  • {:"$gen_producer", from :: {pid, subscription_ref}, {:cancel, reason}} - sent by the consumer to cancel a given subscription.

    Once received, the producer MUST call dispatcher.cancel(from, state) and discard the subscription. A cancel reply must be sent from the producer to the registered consumer (although there is no guarantee such a message can be delivered).

  • {:"$gen_producer", from :: {pid, subscription_ref}, {:ask, count}} - sent by consumers to ask for data in a given subscription.

    Once received, the producer MUST call dispatcher.ask(count, from, state) if one is available. The producer MUST send data up to the demand. If the pair is unknown, the producer MUST send an appropriate disconnect reply.

Consumer messages

The consumer is responsible for starting the subscription and sending demand to producers.

  • {:"$gen_consumer", from :: {producer_pid, subscription_ref}, {:cancel, reason}} - sent by producers to cancel a given subscription.

    It is used as a confirmation for client disconnects OR whenever the producer wants to cancel some upstream demand.

  • {:"$gen_consumer", from :: {producer_pid, subscription_ref}, [event]} - events sent by producers to consumers.

    subscription_ref identifies the subscription. The third argument is a non-empty list of events. If the subscription is unknown, the events must be ignored and a cancel message sent to the producer.
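
To make the message shapes concrete, here is a minimal sketch of a hand-written producer that speaks the protocol using only spawn, send and receive. ToyProducer and the :unknown_subscription reason are hypothetical names. The sketch serves a single subscription and leaves out monitoring, dispatchers and the duplicate-subscription check, all of which a conforming implementation needs:

```elixir
defmodule ToyProducer do
  # Hypothetical, heavily simplified producer speaking the messages above.
  # It emits consecutive integers starting from `counter`.
  def start(counter), do: spawn(fn -> loop(counter, nil) end)

  defp loop(counter, sub) do
    receive do
      # A consumer starts a subscription; remember its {pid, ref} pair.
      {:"$gen_producer", {_pid, _ref} = from, {:subscribe, _opts}} ->
        loop(counter, from)

      # Demand for the known subscription: emit exactly `count` events.
      {:"$gen_producer", {pid, ref} = from, {:ask, count}}
      when from == sub and is_integer(count) and count > 0 ->
        events = Enum.to_list(counter..(counter + count - 1))
        send(pid, {:"$gen_consumer", {self(), ref}, events})
        loop(counter + count, sub)

      # The consumer cancels: confirm and forget the subscription.
      {:"$gen_producer", {pid, ref} = from, {:cancel, reason}} when from == sub ->
        send(pid, {:"$gen_consumer", {self(), ref}, {:cancel, reason}})
        loop(counter, nil)

      # Anything else (unknown pair or malformed request) gets a cancel.
      {:"$gen_producer", {pid, ref}, _request} ->
        send(pid, {:"$gen_consumer", {self(), ref}, {:cancel, :unknown_subscription}})
        loop(counter, sub)
    end
  end
end

# The current process plays the consumer role.
producer = ToyProducer.start(0)
ref = make_ref()
send(producer, {:"$gen_producer", {self(), ref}, {:subscribe, []}})
send(producer, {:"$gen_producer", {self(), ref}, {:ask, 3}})

receive do
  {:"$gen_consumer", {^producer, ^ref}, events} when is_list(events) ->
    IO.inspect(events)
    # prints [0, 1, 2]
after
  1_000 -> IO.puts("no events received")
end
```

The producer never sends more than count events per :ask, which is the back-pressure guarantee the protocol is built around.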

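Summary

Types

  • options - The supported init options
  • stage - The stage reference
  • type - The supported stage types

Functions

  • ask(arg, demand) - Asks the given demand to the producer
  • async_subscribe(stage, opts) - Asks the stage to subscribe to the given producer stage asynchronously
  • call(stage, request, timeout \\ 5000) - Makes a synchronous call to the stage and waits for its reply
  • cancel(arg, reason) - Cancels the given subscription on the producer
  • cast(stage, request) - Sends an asynchronous request to the stage
  • reply(client, reply) - Replies to a client
  • start(module, args, options \\ []) - Starts a GenStage process without links (outside of a supervision tree)
  • start_link(module, args, options \\ []) - Starts a GenStage process linked to the current process
  • stop(stage, reason \\ :normal, timeout \\ :infinity) - Stops the stage with the given reason
  • sync_subscribe(stage, opts, timeout \\ 5000) - Asks the stage to subscribe to the given producer stage synchronously

Callbacks

  • code_change(old_vsn, state, extra) - The same as c:GenServer.code_change/3
  • format_status(arg0, list) - The same as c:GenServer.format_status/2
  • handle_call(request, arg1, state) - Invoked to handle synchronous call/3 messages. call/3 will block until a reply is received (unless the call times out or nodes are disconnected)
  • handle_cancel(cancel_reason, arg1, state) - Invoked when a consumer is no longer subscribed to a producer
  • handle_cast(request, state) - Invoked to handle asynchronous cast/2 messages
  • handle_demand(demand, state) - Invoked on :producer stages
  • handle_events(list, arg1, state) - Invoked on :producer_consumer and :consumer stages to handle events
  • handle_info(msg, state) - Invoked to handle all other messages
  • handle_subscribe(opts, to_or_from, state) - Invoked when a consumer subscribes to a producer
  • init(args) - Invoked when the server is started
  • terminate(reason, state) - The same as c:GenServer.terminate/2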

Types

options :: []

The supported init options

stage ::
  pid |
  atom |
  {:global, term} |
  {:via, module, term} |
  {atom, node}

The stage reference

type :: :producer | :consumer | :producer_consumer

The supported stage types.

Functions

ask(arg, demand)

Asks the given demand to the producer.

This is an asynchronous request typically used by consumers in :manual demand mode.

async_subscribe(stage, opts)

Specs

async_subscribe(stage, opts :: keyword) :: :ok

Asks the stage to subscribe to the given producer stage asynchronously.

This call returns :ok regardless of whether the subscription actually happened. It is typically called from a stage's own init/1 callback.

Options

  • :cancel - :permanent (default) or :temporary. When permanent, the consumer exits when the producer cancels or exits. In case of exits, the same reason is used to exit the consumer. In case of cancellations, the reason is wrapped in a :cancel tuple.
  • :min_demand - the minimum demand for this subscription. It overrides the value configured in the consumer initializer
  • :max_demand - the maximum demand for this subscription. It overrides the value configured in the consumer initializer

All other options are sent as is to the producer stage.

Examples

def init(producer) do
  GenStage.async_subscribe(self(), to: producer, min_demand: 10, max_demand: 100)
  {:consumer, []}
end

call(stage, request, timeout \\ 5000)

Specs

call(stage, term, timeout) :: term

Makes a synchronous call to the stage and waits for its reply.

The client sends the given request to the server and waits until a reply arrives or a timeout occurs. handle_call/3 will be called on the stage to handle the request.

stage can be any of the values described in the “Name registration” section of the documentation for this module.

Timeouts

timeout is an integer greater than zero which specifies how many milliseconds to wait for a reply, or the atom :infinity to wait indefinitely. The default value is 5000. If no reply is received within the specified time, the function call fails and the caller exits. If the caller catches the failure and continues running, and the stage is just late with the reply, it may arrive at any time later into the caller’s message queue. The caller must in this case be prepared for this and discard any such garbage messages that are two-element tuples with a reference as the first element.

cancel(arg, reason)

Cancels the given subscription on the producer.

Once the producer receives the request, a confirmation may be forwarded to the consumer (although there is no guarantee as the producer may crash for unrelated reasons before). This is an asynchronous request.

cast(stage, request)

Specs

cast(stage, term) :: :ok

Sends an asynchronous request to the stage.

This function always returns :ok regardless of whether the destination stage (or node) exists. Therefore it is unknown whether the destination stage successfully handled the message.

handle_cast/2 will be called on the stage to handle the request. In case the stage is on a node which is not yet connected to the caller's node, the call is going to block until a connection happens.

reply(client, reply)

Specs

reply(GenServer.from, term) :: :ok

Replies to a client.

This function can be used to explicitly send a reply to a client that called call/3 when the reply cannot be specified in the return value of handle_call/3.

client must be the from argument (the second argument) accepted by handle_call/3 callbacks. reply is an arbitrary term which will be given back to the client as the return value of the call.

Note that reply/2 can be called from any process, not just the GenServer that originally received the call (as long as that GenServer communicated the from argument somehow).

This function always returns :ok.

Examples

def handle_call(:reply_in_one_second, from, state) do
  Process.send_after(self(), {:reply, from}, 1_000)
  {:noreply, [], state}
end

def handle_info({:reply, from}, state) do
  GenStage.reply(from, :one_second_has_passed)
  # reply/2 returns :ok, so return a valid callback tuple explicitly.
  {:noreply, [], state}
end

start(module, args, options \\ [])

Specs

start(module, any, options) :: GenServer.on_start

Starts a GenStage process without links (outside of a supervision tree).

See start_link/3 for more information.

start_link(module, args, options \\ [])

Specs

start_link(module, any, options) :: GenServer.on_start

Starts a GenStage process linked to the current process.

This is often used to start the GenStage as part of a supervision tree.

Once the server is started, the init/1 function of the given module is called with args as its arguments to initialize the stage. To ensure a synchronized start-up procedure, this function does not return until init/1 has returned.

Note that a GenStage started with start_link/3 is linked to the parent process and will exit in case of crashes from the parent. The GenStage will also exit due to the :normal reasons in case it is configured to trap exits in the init/1 callback.

Options

  • :name - used for name registration as described in the “Name registration” section of the module documentation

  • :timeout - if present, the server is allowed to spend the given amount of milliseconds initializing or it will be terminated and the start function will return {:error, :timeout}

  • :debug - if present, the corresponding function in the :sys module is invoked

  • :spawn_opt - if present, its value is passed as options to the underlying process as in Process.spawn/4

Return values

If the server is successfully created and initialized, this function returns {:ok, pid}, where pid is the pid of the server. If a process with the specified server name already exists, this function returns {:error, {:already_started, pid}} with the pid of that process.

If the init/1 callback fails with reason, this function returns {:error, reason}. Otherwise, if it returns {:stop, reason} or :ignore, the process is terminated and this function returns {:error, reason} or :ignore, respectively.

stop(stage, reason \\ :normal, timeout \\ :infinity)

Specs

stop(stage, reason :: term, timeout) :: :ok

Stops the stage with the given reason.

The terminate/2 callback of the given stage will be invoked before exiting. This function returns :ok if the server terminates with the given reason; if it terminates with another reason, the call exits.

This function keeps OTP semantics regarding error reporting. If the reason is any other than :normal, :shutdown or {:shutdown, _}, an error report is logged.

sync_subscribe(stage, opts, timeout \\ 5000)

Specs

sync_subscribe(stage, opts :: keyword, timeout) ::
  {:ok, reference} |
  {:error, :not_a_consumer} |
  {:error, {:bad_opts, String.t}}

Asks the stage to subscribe to the given producer stage synchronously.

This call is synchronous and will return after the called stage sends the subscribe message to the producer. It does not, however, guarantee a subscription: for example, the producer stage may refuse the subscription or exit before or after receiving the message.

This function will return {:ok, ref} as long as the subscription message is sent. It may return {:error, :not_a_consumer} in case the stage is not a consumer.

Options

  • :cancel - :permanent (default) or :temporary. When permanent, the consumer exits when the producer cancels or exits. In case of exits, the same reason is used to exit the consumer. In case of cancellations, the reason is wrapped in a :cancel tuple.
  • :min_demand - the minimum demand for this subscription. It overrides the value configured in the consumer initializer
  • :max_demand - the maximum demand for this subscription. It overrides the value configured in the consumer initializer

All other options are sent as is to the producer stage.

Callbacks

code_change(old_vsn, state, extra)

Specs

code_change(old_vsn, state :: term, extra :: term) ::
  {:ok, new_state :: term} |
  {:error, reason :: term} when old_vsn: term | {:down, term}

The same as c:GenServer.code_change/3.

format_status(arg0, list) (optional)

Specs

format_status(:normal | :terminate, [pdict :: {term, term} | state :: term, ...]) :: status :: term

The same as c:GenServer.format_status/2.

handle_call(request, arg1, state)

Specs

handle_call(request :: term, GenServer.from, state :: term) ::
  {:reply, reply, [event], new_state} |
  {:reply, reply, [event], new_state, :hibernate} |
  {:noreply, [event], new_state} |
  {:noreply, [event], new_state, :hibernate} |
  {:stop, reason, reply, new_state} |
  {:stop, reason, new_state} when reply: term, new_state: term, reason: term, event: term

Invoked to handle synchronous call/3 messages. call/3 will block until a reply is received (unless the call times out or nodes are disconnected).

request is the request message sent by a call/3, from is a 2-tuple containing the caller’s PID and a term that uniquely identifies the call, and state is the current state of the GenStage.

Returning {:reply, reply, [event], new_state} sends the response reply to the caller after events are dispatched (or buffered) and continues the loop with new state new_state. In case you want to deliver the reply before processing the events, use GenStage.reply/2 and return {:noreply, [event], state} (see below).

Returning {:noreply, [event], new_state} does not send a response to the caller and processes the given events before continuing the loop with new state new_state. The response must be sent with reply/2.

Hibernating is also supported as an atom to be returned from either :reply or :noreply tuples.

Returning {:stop, reason, reply, new_state} stops the loop and terminate/2 is called with reason reason and state new_state. Then the reply is sent as the response to call and the process exits with reason reason.

Returning {:stop, reason, new_state} is similar to {:stop, reason, reply, new_state} except a reply is not sent.

If this callback is not implemented, the default implementation by use GenStage will return {:stop, {:bad_call, request}, state}.
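
As an illustration of the return values, the sketch below shows a callback that replies to the caller and emits an event in the same tuple. NotifySketch and its request shapes ({:notify, event} and :shutdown) are hypothetical, and use GenStage is left out so that the clauses can be run as plain functions:

```elixir
defmodule NotifySketch do
  # Hypothetical producer-style callback module, without `use GenStage`.

  # Reply :ok and emit the given event downstream in one return value.
  def handle_call({:notify, event}, _from, state) do
    {:reply, :ok, [event], state}
  end

  # Stop the stage; the caller still receives :stopping as the reply.
  def handle_call(:shutdown, _from, state) do
    {:stop, :normal, :stopping, state}
  end
end

# A `from` tuple shaped like the one handle_call/3 receives.
from = {self(), make_ref()}

IO.inspect(NotifySketch.handle_call({:notify, :hello}, from, :some_state))
# prints {:reply, :ok, [:hello], :some_state}
```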

handle_cancel(cancel_reason, arg1, state)

Specs

handle_cancel(cancel_reason :: term, GenServer.from, state :: term) ::
  {:noreply, [event], new_state} |
  {:noreply, [event], new_state, :hibernate} |
  {:stop, reason, new_state} when event: term, new_state: term, reason: term

Invoked when a consumer is no longer subscribed to a producer.

It receives the cancellation reason, the from tuple and the state. The cancel_reason will be a {:cancel, _} tuple if the reason for cancellation was a GenStage.cancel/2 call. Any other value means the cancellation reason was due to an EXIT.

If this callback is not implemented, the default implementation by use GenStage will return {:noreply, [], state}.

Return values are the same as c:handle_cast/2.
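
A minimal sketch of telling the two kinds of cancellation apart by pattern matching on cancel_reason. CancelSketch and the counters it keeps in a map are hypothetical, and the module omits use GenStage so that it can be run on its own:

```elixir
defmodule CancelSketch do
  # Hypothetical callback module; the state is a map counting
  # cancellations by cause.

  # The subscription ended through a GenStage.cancel/2 call.
  def handle_cancel({:cancel, _reason}, _from, state) do
    {:noreply, [], Map.update(state, :cancelled, 1, &(&1 + 1))}
  end

  # Any other reason means the cancellation was due to an exit.
  def handle_cancel(_exit_reason, _from, state) do
    {:noreply, [], Map.update(state, :exited, 1, &(&1 + 1))}
  end
end

from = {self(), make_ref()}
{:noreply, [], state} = CancelSketch.handle_cancel({:cancel, :done}, from, %{})
{:noreply, [], state} = CancelSketch.handle_cancel(:killed, from, state)

IO.inspect(state)
# prints %{cancelled: 1, exited: 1}
```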

handle_cast(request, state)

Specs

handle_cast(request :: term, state :: term) ::
  {:noreply, [event], new_state} |
  {:noreply, [event], new_state, :hibernate} |
  {:stop, reason :: term, new_state} when new_state: term, event: term

Invoked to handle asynchronous cast/2 messages.

request is the request message sent by a cast/2 and state is the current state of the GenStage.

Returning {:noreply, [event], new_state} dispatches the events and continues the loop with new state new_state.

Returning {:noreply, [event], new_state, :hibernate} is similar to {:noreply, [event], new_state} except the process is hibernated before continuing the loop.

Returning {:stop, reason, new_state} stops the loop and terminate/2 is called with the reason reason and state new_state. The process exits with reason reason.

If this callback is not implemented, the default implementation by use GenStage will return {:stop, {:bad_cast, request}, state}.

handle_demand(demand, state) (optional)

Specs

handle_demand(demand :: pos_integer, state :: term) ::
  {:noreply, [event], new_state} |
  {:noreply, [event], new_state, :hibernate} |
  {:stop, reason, new_state} when new_state: term, reason: term, event: term

Invoked on :producer stages.

Must always be explicitly implemented by :producer types. It is invoked with the demand from consumers/dispatcher. The producer must either store the demand or return the events requested.

handle_events(list, arg1, state) (optional)

Specs

handle_events([event], GenServer.from, state :: term) ::
  {:noreply, [event], new_state} |
  {:noreply, [event], new_state, :hibernate} |
  {:stop, reason, new_state} when new_state: term, reason: term, event: term

Invoked on :producer_consumer and :consumer stages to handle events.

Must always be explicitly implemented by such types.

Return values are the same as c:handle_cast/2.

handle_info(msg, state)

Specs

handle_info(msg :: term, state :: term) ::
  {:noreply, [event], new_state} |
  {:noreply, [event], new_state, :hibernate} |
  {:stop, reason :: term, new_state} when new_state: term, event: term

Invoked to handle all other messages.

msg is the message and state is the current state of the GenStage. When a timeout occurs the message is :timeout.

If this callback is not implemented, the default implementation by use GenStage will return {:noreply, [], state}.

Return values are the same as c:handle_cast/2.

handle_subscribe(opts, to_or_from, state)

Specs

handle_subscribe(opts :: [options], to_or_from :: GenServer.from, state :: term) ::
  {:automatic | :manual, new_state} |
  {:stop, reason, new_state} when new_state: term, reason: term

Invoked when a consumer subscribes to a producer.

This callback is invoked in both producers and consumers.

For consumers, successful subscriptions must return {:automatic, new_state} or {:manual, new_state}. The default is to return :automatic, which means the stage implementation will take care of automatically sending demand to producers. :manual must be used when a special behaviour is desired (for example, DynamicSupervisor uses :manual demand), in which case demand must be sent explicitly with ask/2. The manual subscription must be cancelled when handle_cancel/3 is called.

For producers, successful subscriptions must always return {:automatic, new_state}; the :manual mode is not supported.

If this callback is not implemented, the default implementation by use GenStage will return {:automatic, state}.

init(args)

Specs

init(args :: term) ::
  {type, state} |
  {type, state, options} |
  :ignore |
  {:stop, reason :: any} when state: any

Invoked when the server is started.

start_link/3 (or start/3) will block until it returns. args is the argument term (second argument) passed to start_link/3.

In case of successful start, this callback must return a tuple where the first element is the stage type, which is either a :producer, :consumer or :producer_consumer if it is taking both roles.

For example:

def init(args) do
  # Use the start argument as the initial state.
  {:producer, args}
end

The returned tuple may also contain 3 or 4 elements. The third element may be the :hibernate atom or a set of options defined below.

Returning :ignore will cause start_link/3 to return :ignore and the process will exit normally without entering the loop or calling terminate/2.

Returning {:stop, reason} will cause start_link/3 to return {:error, reason} and the process to exit with reason reason without entering the loop or calling terminate/2.

Options

This callback may return options. Some options are specific to the stage type while others are shared across all types.

:producer and :producer_consumer options

  • :buffer_size - the size of the buffer to store events without demand. Check the “Buffer events” section on the module documentation (defaults to 1000)
  • :buffer_keep - determines whether the :first or :last (default) entries should be kept in the buffer in case we exceed the buffer size
  • :dispatcher - the dispatcher responsible for handling demands. Defaults to GenStage.DemandDispatch
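
The effect of :buffer_size and :buffer_keep can be pictured with a small model. BufferSketch is a hypothetical module that follows the option descriptions above by trimming a list to the configured size. It is not how GenStage stores its buffer:

```elixir
defmodule BufferSketch do
  # Hypothetical model of a bounded event buffer.
  def store(buffer, events, buffer_size, keep) do
    all = buffer ++ events

    case keep do
      # Keep the oldest entries and drop whatever does not fit.
      :first -> Enum.take(all, buffer_size)
      # Keep the newest entries and drop the oldest ones.
      :last -> Enum.take(all, -buffer_size)
    end
  end
end

IO.inspect(BufferSketch.store([1, 2], [3, 4], 3, :first))
# prints [1, 2, 3]

IO.inspect(BufferSketch.store([1, 2], [3, 4], 3, :last))
# prints [2, 3, 4]
```

In a stage, these options would be returned from init/1 as the third tuple element, for example {:producer, state, buffer_size: 10_000, buffer_keep: :first}.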

:consumer and :producer_consumer options

  • :subscribe_to - a list of producers to subscribe to. Each element represents the producer or a tuple with the producer and the subscription options

terminate(reason, state)

Specs

terminate(reason, state :: term) :: term when reason: :normal | :shutdown | {:shutdown, term} | term

The same as c:GenServer.terminate/2.