Pacer.Workflow (Pacer v0.1.1)

Dependency Graph-Based Workflows With Robust Compile Time Safety & Guarantees

Motivations

Pacer.Workflow is designed for complex workflows where many interdependent data points need to be stitched together to provide a final result, specifically workflows where each data point needs to be loaded and/or calculated using discrete, application-specific logic.

To create a struct backed by Pacer.Workflow, invoke use Pacer.Workflow at the top of your module and use the graph/1 macro, which is explained in more detail in the docs below.

Note that when using Pacer.Workflow, you can pass the following options:

  • :generate_docs? (boolean/0) - When you invoke use Pacer.Workflow, Pacer automatically generates module documentation for you. It creates a section titled Pacer Fields in your moduledoc, either by creating a moduledoc for you dynamically or by appending this section to any module documentation you have already provided.

    To opt out of this feature, when you use Pacer.Workflow, set this option to false.

    The default value is true.

The main ideas and themes underlying Pacer.Workflow are as follows:

1. Workflows Are Dependency Graphs

Pacer.Workflows are backed by dependency graphs (specifically represented as directed acyclic graphs) that are constructed at compile-time. Your Workflows will define a set of data points, represented as fields (see below); each field must explicitly define the dependencies it has on other fields in the Workflow. For example, if we have a workflow where we load a set of users and then fire off some requests to a 3rd party service to fetch some advertisements for those users, our Workflow might look something like this:

defmodule UserAdsWorkflow do
  use Pacer.Workflow

  graph do
    field(:users)
    field(:user_advertisements, resolver: &Ads.fetch_user_ads/1, dependencies: [:users])
  end
end

Why is the dependency graph idea important here?

In the above, simplified example with only two fields, there may not be a need to define a dependency graph because we can look at the two fields and immediately realize that we first need to have the set of users before we can make the call to load :user_advertisements.

However, in complex workflows with dozens or even hundreds of data points, manually managing which data points are loaded in which order would be a daunting and time-consuming task. We would also run the risk of getting the ordering wrong, and whenever data points are added or removed in the future, we would need to manually rearrange things so that each data point is still loaded in the correct order.

The result is untenable for workflows of sufficient size.

This is where dependency graphs come into play. By forcing you to explicitly declare the other fields that you depend on in the workflow, Pacer.Workflow can build out a dependency graph and figure out how to schedule the execution of each of your resolver functions (see below for more details on resolvers) so that each function is only called when its dependencies are ready. That eliminates the need to manually rearrange calls in your codebase, and also allows you to have discrete, single-purpose resolvers that can be rigorously unit-tested against a known, constrained set of inputs.
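To make the scheduling idea concrete, here is a minimal sketch in plain Elixir of how an execution order can be derived from declared dependencies. It is not Pacer's implementation: the module name DependencyOrderSketch, the map shape, and the :report field are assumptions made for illustration, and the sketch assumes the graph is acyclic.

```elixir
defmodule DependencyOrderSketch do
  # `deps` maps each field name to the list of fields it depends on.
  # Assumes the graph is acyclic and every dependency is itself a key in `deps`.
  def execution_order(deps) when is_map(deps) do
    deps
    |> Map.keys()
    |> Enum.sort()
    |> Enum.reduce([], fn field, visited -> visit(field, deps, visited) end)
    |> Enum.reverse()
  end

  # Depth-first visit: a field is recorded only after all of its dependencies.
  # `visited` is kept in reverse order (most recently recorded field first).
  defp visit(field, deps, visited) do
    if field in visited do
      visited
    else
      visited_after_deps =
        deps
        |> Map.fetch!(field)
        |> Enum.reduce(visited, fn dep, acc -> visit(dep, deps, acc) end)

      [field | visited_after_deps]
    end
  end
end

deps = %{
  users: [],
  user_advertisements: [:users],
  analytics: [:users],
  report: [:user_advertisements, :analytics]
}

# Every field appears after the fields it depends on.
IO.inspect(DependencyOrderSketch.execution_order(deps))
```

Adding or removing a field only changes the map of declared dependencies; the ordering is recomputed rather than maintained by hand.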

2. Batched, Parallel Requests to Disparate External Systems (3rd-party APIs, Database Calls, etc.)

Pacer.Workflows also allow users to fire off potentially high-latency calls in parallel to reduce the overall latency of a Workflow. To do so, use the batch/2 macro inside your graph definition. One caveat, however, is that fields inside a batch definition must not have any dependencies on other fields inside the same batch.

Batches are nice to use when a workflow has multiple high-latency requests that need to be made. Batching the requests together, when possible, will fire off the requests in parallel. The requests can be to disparate, unrelated services, APIs, and external systems including databases and/or caches.

Note: batches should not be confused with batch loading in the sense used by, for example, GraphQL batching, where users provide a set of ids for related entities and the batch loads as many of those entities as possible in a single request rather than making one request per entity. Pacer.Workflow batches can be used that way, but that choice is left up to the user and the implementation. The key idea of a batch here is "I have multiple (potentially) high-latency requests that I want to execute together, in parallel", rather than "I have a set of entities that I want to load in a single request".

For example, if we go back to the earlier example of a user-based workflow where we load a set of users and fetch advertisements for those users, if we add in another request to, say, an analytics service to get some more data on the set of users we have just loaded, we can do that in a batch as follows:

defmodule UserAdsWorkflow do
  use Pacer.Workflow

  graph do
    field(:users)
    batch :requests do
      field(:user_advertisements, resolver: &Ads.fetch_user_ads/1, dependencies: [:users])
      field(:analytics, resolver: &Analytics.analyze_users/1, dependencies: [:users])
    end
  end
end

Now, rather than those two requests being fired sequentially (which would make the latency of the workflow equal to the latency of the ads request plus the latency of the analytics request), the latency is instead bounded by the slower of the two requests.
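The latency claim can be checked with a small sketch that uses only Elixir's Task module. It does not use Pacer: the module BatchLatencySketch, its function names, and the sleep durations are hypothetical stand-ins for the two requests.

```elixir
defmodule BatchLatencySketch do
  # Stand-ins for high-latency calls; the sleep durations are arbitrary.
  def fetch_ads(_users) do
    Process.sleep(100)
    :ads
  end

  def analyze(_users) do
    Process.sleep(150)
    :analytics
  end

  # Starts both calls concurrently, then waits for both results.
  def run_in_parallel(users) do
    tasks = [
      Task.async(fn -> fetch_ads(users) end),
      Task.async(fn -> analyze(users) end)
    ]

    Task.await_many(tasks, 1_000)
  end
end

{micros, results} = :timer.tc(fn -> BatchLatencySketch.run_in_parallel([:user_a]) end)
IO.inspect(results, label: "results")
IO.puts("elapsed: #{div(micros, 1000)} ms (running sequentially would take about 250 ms)")
```

The elapsed time should land near the slower call (about 150 ms here) rather than near the sum of the two.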

3. Compile-Time Safety And Guarantees

The third motivating factor behind Pacer.Workflow is to provide a robust set of compile-time safety mechanisms. These include:

  • Detecting and preventing cyclical dependencies in the dependency graph defined by the workflow
  • Preventing "reflexive" dependencies, where a field depends on itself
  • Detecting invalid options on fields and batches
  • Preventing a single module from defining more than one Workflow
  • Detecting duplicate field definitions in a graph
  • Ensuring that resolver definitions fit the contract required by Pacer.Workflow (a 1-arity function that takes a map)
  • Detecting dependencies on fields that do not exist in the graph definition
  • Requiring fields defined inside of a batch to have a resolver function defined

Pacer.Workflow strives to provide helpful error messages to the user at compile time when it detects any issues and tries to direct the user on what went wrong, why, and how to fix the issue.

The compile-time safety can prevent a whole class of issues at runtime, and also allows the dependency graph to be computed once at compile time. Building the dependency graph at compile time allows Pacer to cache the results of the graph and make those results accessible at runtime so your application does not have to incur the cost of building out the dependency graph at runtime.
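As an illustration of the kind of check involved, the sketch below validates a list of field definitions for three of the problems listed above: reflexive dependencies, dependencies on unknown fields, and dependencies between fields in the same batch. It runs as an ordinary function rather than at compile time, and the field map shape and error tuples are assumptions made for the sketch, not Pacer's internal representation or error format.

```elixir
defmodule GraphValidationSketch do
  # Each field is a map with :name, :dependencies, and :batch (nil when not batched).
  def validate(fields) do
    names = MapSet.new(fields, & &1.name)
    batch_of = Map.new(fields, &{&1.name, &1.batch})

    errors =
      for field <- fields,
          dep <- field.dependencies,
          error = check(field, dep, names, batch_of) do
        error
      end

    if errors == [], do: :ok, else: {:error, errors}
  end

  # A field that lists itself as a dependency.
  defp check(%{name: name}, name, _names, _batch_of), do: {:reflexive_dependency, name}

  defp check(field, dep, names, batch_of) do
    cond do
      # The dependency is not declared anywhere in the graph.
      not MapSet.member?(names, dep) ->
        {:unknown_dependency, field.name, dep}

      # Both fields live in the same batch, so they would run concurrently.
      field.batch != nil and batch_of[dep] == field.batch ->
        {:same_batch_dependency, field.name, dep}

      true ->
        nil
    end
  end
end

fields = [
  %{name: :users, dependencies: [], batch: nil},
  %{name: :ads, dependencies: [:users, :analytics], batch: :requests},
  %{name: :analytics, dependencies: [:missing], batch: :requests}
]

IO.inspect(GraphValidationSketch.validate(fields))
```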

Summary

Pacer.Workflow provides the ability to explicitly declare a dependency graph, where the nodes in the graph map to fields in a struct defined via the graph/1 API.

The key idea behind Pacer.Workflow is that it enables users to create Elixir structs that serve as containers of loosely-related fields, where the fields in the struct have dependencies on other fields in the struct.

A "dependency", in the sense it is used here, means that one field relies on another field's value being readily available and loaded in memory before its own value can be computed or loaded. For example, if you have a struct %MyStruct{field_a: 1, field_b: <field_a's value + 1>}, :field_b is dependent on :field_a's value already being present before it can be calculated.

The example given above can be solved in a more straightforward way, by having a simple function to build out the entire struct given :field_a's value as input, i.e.:

def build_my_struct(field_a_input) do
  %MyStruct{field_a: field_a_input, field_b: field_a_input + 1}
end

While conceptually simple, this pattern becomes more difficult to maintain when additional fields are added with dependencies between each other.

Pacer.Workflow addresses this problem by forcing users to explicitly declare the dependencies between fields up front, at compile time. Once the fields and dependencies have been declared, Pacer.Workflow can build a dependency graph and use it to solve the problem of dependency resolution by answering the question: which fields need to be available when, and in what order do their resolvers need to be executed?

There are a few key concepts to know in order to build out a Pacer.Workflow-backed struct:

Fields

A field can be defined within a graph definition with the field/2 macro. A field maps one-to-one to keys on the struct generated by the graph definition. Fields are how you explicitly declare the dependencies each field has on other fields within the same graph. You do this by providing a list of dependencies as atoms to the field/2 macro:

  graph do
    field(:field_one)
    field(:field_two)
    field(:my_dependent_field, resolver: &MyResolver.resolve/1, dependencies: [:field_one, :field_two])
  end

If the :dependencies option is not given, it defaults to an empty list and effectively means that the field has no dependencies. This may be the case when the value for the field meets one of the following conditions:

  • The value is a constant
  • The value is already available and accessible in memory when creating the struct

Fields that do explicitly declare at least one dependency MUST also pass in a :resolver option. See the Resolvers section below for more details.

Additionally, fields may declare a default value by passing a default to the :default option key:

  graph do
    field(:my_field, default: 42)
  end

Resolvers

Resolvers are 1-arity functions that take in the values from dependencies as input and return the value that should be placed on the struct key for the associated field. Resolvers are function definitions that Pacer.Workflow can use to incrementally compute all values needed.

For example, for a graph definition that looks like this:

  defmodule MyGraph do
    use Pacer.Workflow

    graph do
      field(:field_one)
      field(:dependent_field, resolver: &__MODULE__.resolve/1, dependencies: [:field_one])
    end

    def resolve(inputs) do
      IO.inspect(inputs.field_one, label: "Received field_one's value")
    end
  end

Resolver functions will always be called with a map that contains the values for fields declared as dependencies. In the above example, that means if we have a struct %MyGraph{field_one: 42}, the resolver will be invoked with %{field_one: 42}.
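The calling convention can be sketched without Pacer: take the declared dependencies from the struct, pass them to the resolver as a map, and store the result under the field's key. The module ResolverCallSketch and its run_field/4 helper are hypothetical and only mimic the contract described above.

```elixir
defmodule ResolverCallSketch do
  defstruct field_one: nil, dependent_field: nil

  # Hypothetical resolver: receives only the declared dependencies, as a map.
  def resolve(%{field_one: value}), do: value + 1

  # Builds the input map from the declared dependencies, calls the resolver,
  # and stores the result under the field's key.
  def run_field(struct, field, dependencies, resolver) do
    inputs = Map.take(struct, dependencies)
    Map.put(struct, field, resolver.(inputs))
  end
end

ResolverCallSketch
|> struct(field_one: 42)
|> ResolverCallSketch.run_field(:dependent_field, [:field_one], &ResolverCallSketch.resolve/1)
|> IO.inspect()
```

Because the resolver only sees the map of its declared dependencies, it can be unit-tested by calling it directly with a hand-built map such as %{field_one: 42}.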

Keep in mind that if you declare any dependencies, you MUST also declare a resolver.

Batches

Batches can be defined using the batch/3 macro.

Batches allow users to group together a set of fields whose resolvers can and should be run in parallel. The main use case for batches is to reduce running time for fields whose resolvers can have high latency. This generally means that batches are useful for grouping together calls that hit the network in some way.

Batches do impose some more restrictive constraints on users, however. For example, all fields defined within a batch MUST NOT declare dependencies on any other field in the same batch. This is because the resolvers will run concurrently with one another, so there is no way to guarantee that a field within the same batch will have a value ready to use and pass to a separate resolver in the same batch. In scenarios where you find this happening, Pacer.Workflow will raise a compile time error and you will need to rearrange your batches, possibly creating two separate batches or forcing one field in the batch to run sequentially as a regular field outside of a batch block.

Batches must also declare a name and fields within a batch must define a resolver. Batch names must also be unique within a single graph definition. Resolvers are required for fields within a batch regardless of whether or not the field has any dependencies.

Ex.:

  defmodule MyGraphWithBatches do
    use Pacer.Workflow

    graph do
      field(:regular_field)

      batch :http_requests do
        field(:request_one, resolver: &__MODULE__.resolve/1)
        field(:request_two, resolver: &__MODULE__.resolve/1, dependencies: [:regular_field])
      end

      field(:another_field, resolver: &__MODULE__.simple_resolver/1, dependencies: [:request_two])
    end

    def resolve(_) do
      IO.puts("Simulating HTTP request")
    end

    def simple_resolver(_), do: :ok
  end

Notes:

The order in which fields are defined within a graph definition does not matter. For example, if you have a field :request_one that depends on another field :request_two, the two fields can be declared in either order.

Functions

batch(name, options \\ [timeout: 1000], list)

(macro)

The batch/3 macro is to be invoked when grouping fields with resolvers that will run in parallel.

Reminder:

  • The batch must be named and unique.
  • The fields within the batch must not have dependencies on one another since they will run concurrently.
  • The fields within the batch must each declare a resolver function.

NOTE: In general, only batch fields whose resolvers contain potentially high-latency operations, such as network calls.

Example

  defmodule MyValidGraph do
    use Pacer.Workflow

    graph do
      field(:custom_field)

      batch :http_requests do
        field(:request_1, resolver: &__MODULE__.do_work/1, dependencies: [:custom_field])
        field(:request_2, resolver: &__MODULE__.do_work/1, dependencies: [:custom_field])
        field(:request_3, resolver: &__MODULE__.do_work/1)
      end
    end

    def do_work(_), do: :ok
  end

Field options for fields defined within a batch have one minor requirement difference from fields not defined within a batch: batched fields MUST always define a resolver function, regardless of whether or not they define any dependencies.

Batch Field Options

  • :guard (function of arity 1) - A guard is a 1-arity function that takes in a map with the field's dependencies and returns either true or false. If the function returns false, it means there is no work to do and thus no reason to spin up another process to run the resolver function. In this case, the field's default value is returned. If the function returns true, the field's resolver will run in a separate process.

  • :dependencies (list of atom/0) - A list of dependencies from the graph. Dependencies are specified as atoms, and each dependency must be another field in the same graph.

    Remember that cyclical dependencies are strictly not allowed, so fields cannot declare dependencies on themselves nor on any other field that has already declared a dependency on the current field.

    If the dependencies option is not given, it defaults to an empty list, indicating that the field has no dependencies. This will be the case if the field is a constant or can be constructed from values already available in the environment.

    The default value is [].

  • :doc (String.t/0) - Allows users to document the field and provide background and/or context on what the field is intended to be used for, what kind of data the field contains, and how the data for the field is constructed.

  • :resolver (function of arity 1) - Required. A resolver is a 1-arity function that specifies how to calculate the value for a field.

    The argument passed to the function will be a map that contains all of the field's declared dependencies.

    For example, if we have a field like this:

    field(:request, resolver: &RequestHandler.resolve/1, dependencies: [:api_key, :url])

    The resolver RequestHandler.resolve/1 would be passed a map that looks like this:

    %{api_key: "<API KEY GOES HERE>", url: "https://some.endpoint.com"}

    If the field has no dependencies, the resolver will receive an empty map. Note though that resolvers are only required for fields with no dependencies if the field is inside of a batch. If your field has no dependencies and is not inside a batch, you can skip defining a resolver and initialize your graph struct with a value that is either constant or readily available where you are constructing the struct.

    The result of the resolver will be placed on the graph struct under the field's key.

    For the above, assuming a graph that looks like this:

    defmodule MyGraph do
      use Pacer.Workflow
    
      graph do
        field(:api_key)
        field(:url)
        field(:request, resolver: &RequestHandler.resolve/1, dependencies: [:api_key, :url])
      end
    end

    Then when RequestHandler.resolve/1 runs and returns a value of, let's say, %{response: "important response"}, your graph struct would look like this:

    %MyGraph{
      api_key: "<API KEY GOES HERE>",
      url: "https://some.endpoint.com",
      request: %{response: "important response"}
    }

  • :default (term/0) - Required. The default value for the field. If no default is given, the default value becomes #Pacer.Workflow.FieldNotSet<>.

  • :virtual? (boolean/0) - A virtual field is used for intermediate or transient computation steps during the workflow and becomes a node in the workflow's graph, but does not get returned in the results of the workflow execution.

    In other words, virtual keys will not be included in the map returned by calling Pacer.Workflow.execute/1.

    The intent of a virtual field is to allow a spot for intermediate and/or transient calculation steps while avoiding the extra memory overhead of carrying these values downstream. If, for example, the map returned from Pacer.Workflow.execute/1 is stored in long-lived process state, intermediate or transient values can cause unnecessary memory bloat when they are carried into process state where they are not needed.

    The default value is false.
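The :guard behaviour described in the list above can be sketched in plain Elixir as follows. This is an illustration of the described contract, not Pacer's code: GuardSketch, resolve_with_guard/4, and the sample guard and resolver are all hypothetical.

```elixir
defmodule GuardSketch do
  # When the guard returns false, no process is started and the default is used.
  # When it returns true, the resolver runs in a separate process (a Task here).
  def resolve_with_guard(inputs, guard, resolver, default) do
    if guard.(inputs) do
      fn -> resolver.(inputs) end
      |> Task.async()
      |> Task.await()
    else
      default
    end
  end
end

# Skip the (pretend) request entirely when there are no users to look up.
guard = fn %{users: users} -> users != [] end
resolver = fn %{users: users} -> Enum.map(users, &{&1, :ad}) end

IO.inspect(GuardSketch.resolve_with_guard(%{users: []}, guard, resolver, []))
IO.inspect(GuardSketch.resolve_with_guard(%{users: [:a]}, guard, resolver, []))
```

The first call returns the default without spawning anything; the second runs the resolver and returns its result.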

Batch options

  • :on_timeout (atom/0) - Required. Determines what happens when a task in the batch times out. With :kill_task, the task that timed out is killed and yields {:exit, :timeout}; only the timed-out task process exits, not the process that spawned the task. The default value is :kill_task.

  • :timeout (non_neg_integer/0) - Required. The time in milliseconds that the batch is allowed to run for. The default value is 1000 (1 second).
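The {:exit, :timeout} result and the :kill_task value mirror the options of Elixir's Task.async_stream/3. The sketch below uses that standard-library function directly to show the behaviour. Whether Pacer uses Task.async_stream/3 internally is not stated here, and falling back to a per-field default on timeout is an assumption of this sketch; BatchTimeoutSketch and its arguments are hypothetical.

```elixir
defmodule BatchTimeoutSketch do
  # `resolvers` is a keyword list of {field, zero-arity function}.
  # `defaults` maps each field to the value used when its task times out (assumed behaviour).
  def run(resolvers, defaults, timeout) do
    resolvers
    |> Task.async_stream(
      fn {_field, fun} -> fun.() end,
      timeout: timeout,
      on_timeout: :kill_task,
      max_concurrency: max(length(resolvers), 1)
    )
    |> Enum.zip(resolvers)
    |> Map.new(fn
      {{:ok, value}, {field, _fun}} -> {field, value}
      {{:exit, :timeout}, {field, _fun}} -> {field, Map.fetch!(defaults, field)}
    end)
  end
end

resolvers = [
  fast: fn -> :done end,
  slow: fn ->
    Process.sleep(500)
    :too_late
  end
]

# The slow task is killed after 100 ms; the calling process keeps running.
IO.inspect(BatchTimeoutSketch.run(resolvers, %{fast: nil, slow: :default_value}, 100))
```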

@spec execute(struct() | module()) :: struct()

Takes a struct that has been defined via the Pacer.Workflow.graph/1 macro and runs all of the resolvers defined in the graph, in an order that ensures each resolver's dependencies have been met before it runs.

Resolvers that have been defined within batches will be executed in parallel.

field(name, options \\ [])

(macro)

The field/2 macro maps fields one-to-one to keys on the struct created via the graph definition.

Fields must be unique within a graph instance.

Options:

There are specific options that are allowed to be passed in to the field macro, as indicated below:

  • :dependencies (list of atom/0) - A list of dependencies from the graph. Dependencies are specified as atoms, and each dependency must be another field in the same graph.

    Remember that cyclical dependencies are strictly not allowed, so fields cannot declare dependencies on themselves nor on any other field that has already declared a dependency on the current field.

    If the dependencies option is not given, it defaults to an empty list, indicating that the field has no dependencies. This will be the case if the field is a constant or can be constructed from values already available in the environment.

    The default value is [].

  • :doc (String.t/0) - Allows users to document the field and provide background and/or context on what the field is intended to be used for, what kind of data the field contains, and how the data for the field is constructed.

  • :resolver (function of arity 1) - A resolver is a 1-arity function that specifies how to calculate the value for a field.

    The argument passed to the function will be a map that contains all of the field's declared dependencies.

    For example, if we have a field like this:

    field(:request, resolver: &RequestHandler.resolve/1, dependencies: [:api_key, :url])

    The resolver RequestHandler.resolve/1 would be passed a map that looks like this:

    %{api_key: "<API KEY GOES HERE>", url: "https://some.endpoint.com"}

    If the field has no dependencies, the resolver will receive an empty map. Note though that resolvers are only required for fields with no dependencies if the field is inside of a batch. If your field has no dependencies and is not inside a batch, you can skip defining a resolver and initialize your graph struct with a value that is either constant or readily available where you are constructing the struct.

    The result of the resolver will be placed on the graph struct under the field's key.

    For the above, assuming a graph that looks like this:

    defmodule MyGraph do
      use Pacer.Workflow
    
      graph do
        field(:api_key)
        field(:url)
        field(:request, resolver: &RequestHandler.resolve/1, dependencies: [:api_key, :url])
      end
    end

    Then when RequestHandler.resolve/1 runs and returns a value of, let's say, %{response: "important response"}, your graph struct would look like this:

    %MyGraph{
      api_key: "<API KEY GOES HERE>",
      url: "https://some.endpoint.com",
      request: %{response: "important response"}
    }

  • :default (term/0) - The default value for the field. If no default is given, the default value becomes #Pacer.Workflow.FieldNotSet<>.

  • :virtual? (boolean/0) - A virtual field is used for intermediate or transient computation steps during the workflow and becomes a node in the workflow's graph, but does not get returned in the results of the workflow execution.

    In other words, virtual keys will not be included in the map returned by calling Pacer.Workflow.execute/1.

    The intent of a virtual field is to allow a spot for intermediate and/or transient calculation steps while avoiding the extra memory overhead of carrying these values downstream. If, for example, the map returned from Pacer.Workflow.execute/1 is stored in long-lived process state, intermediate or transient values can cause unnecessary memory bloat when they are carried into process state where they are not needed.

    The default value is false.

@spec find_cycles(Graph.t()) :: nil

Runs a depth-first search to locate the cycle in the dependency graph and then displays the cyclic dependencies back to the developer.
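A depth-first search for a dependency cycle can be sketched over a plain map as shown below. The spec above takes a Graph.t(); this sketch deliberately uses an ordinary map of field to dependencies instead, and CycleSketch.find_cycle/1 is a hypothetical name, so treat it as an illustration of the technique rather than the function's actual implementation. It revisits shared subgraphs, which is acceptable for small graphs but not optimized.

```elixir
defmodule CycleSketch do
  # `deps` maps each field to the fields it depends on.
  # Returns the first cycle found as a list of fields, or nil when there is none.
  def find_cycle(deps) do
    deps
    |> Map.keys()
    |> Enum.find_value(fn field -> walk(field, deps, []) end)
  end

  # `path` holds the fields on the current DFS branch, most recent first.
  defp walk(field, deps, path) do
    if field in path do
      # Everything visited since the first occurrence of `field` forms the cycle.
      since_field = Enum.take_while(path, &(&1 != field))
      [field | Enum.reverse(since_field)] ++ [field]
    else
      deps
      |> Map.get(field, [])
      |> Enum.find_value(fn dep -> walk(dep, deps, [field | path]) end)
    end
  end
end

IO.inspect(CycleSketch.find_cycle(%{a: [:b], b: [:c], c: [:a]}))
IO.inspect(CycleSketch.find_cycle(%{a: [:b], b: [], c: [:a, :b]}))
```

The first call reports the cycle starting and ending at the same field; the second returns nil because the graph is acyclic.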

The graph/1 macro is the main entrypoint into Pacer.Workflow for creating a dependency graph struct. Invoke use Pacer.Workflow at the top of your module and then define your fields and/or batches inside the graph block.

Example

  defmodule MyValidGraph do
    use Pacer.Workflow

    graph do
      field(:custom_field)
      field(:field_a, resolver: &__MODULE__.do_work/1, dependencies: [:custom_field])
      field(:field_with_default, default: "this is a default value")

      batch :http_requests do
        field(:request_1, resolver: &__MODULE__.do_work/1, dependencies: [:custom_field, :field_a])
        field(:request_2, resolver: &__MODULE__.do_work/1)
      end
    end

    def do_work(_), do: :ok
  end

Your module may only define ONE graph per module.

The above example will also create a struct with all of the fields defined within the graph, as follows:

%MyValidGraph{
  custom_field: nil,
  field_a: nil,
  field_with_default: "this is a default value",
  request_1: nil,
  request_2: nil
}

The graph macro gives you access to some defined metadata functions, such as (using the above example graph):

  • MyValidGraph.__graph__(:fields)
  • MyValidGraph.__graph__(:dependencies, :http_requests)
  • MyValidGraph.__graph__(:resolver, :field_a)

Caution: These metadata functions are mostly intended for Pacer's internal use. Do not rely on their return values in runtime code, as they may change as Pacer's interface evolves.

validate_dependencies(module)

@spec validate_dependencies(module()) :: :ok | no_return()

validate_options(options, schema)

@spec validate_options(Keyword.t(), NimbleOptions.t()) :: Keyword.t()