PayDayLoan (pay_day_loan v0.7.0)

Fast cache now!

This project provides a framework for building on-demand caching in Elixir. It provides a synchronous API to a cache that is loaded asynchronously. The cache itself may be backed by any storage you choose, though the default is an ETS table backend with several built-in features for managing the mapping of keys to process ids (e.g., acting as a process registry). You also have the option of implementing your own backend using Redis, mnesia, a single process, etc.

PDL is designed for low-latency access to cache elements after they are initially loaded and gives you a framework to minimize load time by performing batch loads. This works very well with data streaming applications that have multiple workers processing events in parallel and are sharing cache state across workers.

Think of PDL as a cache "frontend". In a typical application, we may want to load data from a database and cache it for fast lookup later. PDL provides a "frontend" so that MyCache.get(some_id) will automatically make sure that the data corresponding to some_id is loaded into the cache and will return the value once it is available (or time out if the load takes too long). It batches the loading of data so that you can take advantage of, e.g., database queries that fetch multiple records in one call.

The actual storage of the data is done by a cache "backend". PDL provides a default backend via PayDayLoan.EtsBackend that is quite flexible. You can, however, implement your own backend using the PayDayLoan.Backend behaviour. This is useful for using an external service (e.g., Redis) as a cache backend. See the examples below.

NOTE The _pid functions (e.g., PayDayLoan.get_pid/2) were deprecated and have been removed; 0.3.0 was the last release that included them. Replace them with their non-_pid equivalents: get_pid with get, peek_pid with peek, and with_pid with with_value.
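
For reference, a minimal migration sketch (assuming a MyCache module defined with use PayDayLoan, as in the examples below):

# the key `1` and the pid-handling function are placeholders
{:ok, pid} = MyCache.get(1)     # replaces MyCache.get_pid(1)
{:ok, pid} = MyCache.peek(1)    # replaces MyCache.peek_pid(1); returns {:error, :not_found} if not cached
MyCache.with_value(1, fn pid -> inspect(pid) end)  # replaces MyCache.with_pid(1, ...)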

Key ideas

  • Presents a synchronous API for asynchronous cache loading
  • The cache consists of key-value pairs
  • Provides a default backend for storing values in an ETS table but allows arbitrary backend implementations
  • Tries very hard not to use process messaging in the main lookup API because that can be a bottleneck. Uses ETS tables for state management.
  • Encourages bulk queries for cache loading.
  • Provides hooks for instrumentation

Example usage: Default backend

# cache wrapper module - this wraps the PDL functions so that they
#   make sense within the context of your application
defmodule MyCache do
  # defines MyCache.pay_day_loan/0 (and alias pdl/0),
  #    which is set up with defaults and the supplied callback module
  use PayDayLoan, callback_module: MyCacheLoader

  # optionally pass in other arguments to override defaults, e.g.,
  #   use PayDayLoan, callback_module: MyCacheLoader, batch_size: 100
  
  # also defines pass-through functions for the PayDayLoan module -
  #  e.g., `MyCache.get(key)` is a pass-through to
  #   `PayDayLoan.get(MyCache.pdl(), key)`
end

# cache loader callback module - this will, for example, execute database
#   queries and turn the results into cache elements (e.g., Agent or
#   GenServer processes)
defmodule MyCacheLoader do
  @behaviour PayDayLoan.Loader
 
  def key_exists?(key) do
    # should return true if the key exists -
    #   e.g., if "SELECT count(1) FROM some_table WHERE id = #{key}" returns > 0
  end

  def bulk_load(keys) do
    # code to look up records for keys in database (or whatever)
    #  should return a list of tuples of the format
    #  [{key, load_datum}]
  end
  
  def new(key, load_datum) do
    # note these are three separate examples - your callback will not do
    #   all three

    # if we are using processes:
    Agent.start_link(fn -> load_datum end)

    # if we want to store a callback:
    {:ok, fn -> {:ok, load_datum} end}

    # if we want to store the bare value
    {:ok, load_datum}
  end
  
  def refresh(existing_value, key, load_datum) do
    # note these are three separate examples - your callback will not do
    #   all three

    # if we are using processes, the existing_value is the pid of the
    #   already-started process
    pid = existing_value
    Agent.update(pid, fn(_cached_datum) -> load_datum end)
    # we need to return the pid back
    {:ok, pid}

    # or we could stop the existing pid and replace it with a new one
    Agent.stop(pid)
    Agent.start_link(fn -> load_datum end)

    # or if we stored a callback (assumes `require Logger` at the top of the module)
    {:ok, cached_datum} = existing_value.()
    Logger.info("Replacing #{inspect cached_datum} with #{inspect load_datum}")
    {:ok, fn -> {:ok, load_datum} end}

    # or to store the new datum as a bare value
    {:ok, load_datum}
  end
end

# Add PDL to your existing supervision tree so that everything initializes properly
defmodule MyOTPApp do
  use Application 

  # existing Application.start callback
  def start(_type, _args) do
    my_supervisor_children = [
      # ... existing children specs
      PayDayLoan.supervisor_specification(MyCache.pdl())
    ]
    
    # for example
    Supervisor.start_link(my_supervisor_children, strategy: :one_for_one)
  end
end

# synchronous API - behind the scenes will add the key (1) to the
#   load state table and the asynchronous loader will include that
#   in its next load cycle - this call does not return until either
#   the cache is loaded (via new above) or the request times out
{:ok, value} = MyCache.get(1)

Example usage: Process backend (e.g., Redis connection)

# cache wrapper module - this wraps the PDL functions so that they
#   make sense within the context of your application
defmodule MyCache do
  # same as above but we specify a `backend` module and disable the
  #  cache monitor, we also specify a `backend_payload` so that we can
  #  specify a unique identifier for the backend process 
  use(
    PayDayLoan,
    callback_module: MyCacheLoader,
    backend: MyCacheBackend,
    backend_payload: :my_cache,
    cache_monitor: false # we won't be storing pids
  )
end

# same ideas as above but the new/refresh callbacks are different
defmodule MyCacheLoader do
  @behaviour PayDayLoan.Loader
 
  def key_exists?(key) do
    # should return true if the key exists -
    #   e.g., if "SELECT count(1) FROM some_table WHERE id = #{key}" returns > 0
  end

  def bulk_load(keys) do
    # code to look up records for keys in database (or whatever)
    #  should return a list of tuples of the format
    #  [{key, load_datum}]
  end
  
  def new(key, load_datum) do
    # we could modify the data here, but we are just going to store it raw
    {:ok, load_datum}
  end
  
  def refresh(_existing_value, key, load_datum) do
    # we could merge the existing value and the load_datum or we could modify
    #  before we store, but we're just going to replace
    {:ok, load_datum}
  end
end

# backend behaviour implementation
defmodule MyCacheBackend do
  @behaviour PayDayLoan.Backend

  # this shows an example of how we might use a single-process backend; using
  # Redis is very similar - the process would be a Redis connection and the
  # various callbacks would use Redis commands

  def start_link(name), do: Agent.start_link(fn -> %{} end, name: name)

  # nothing to do for setup
  def setup(_pdl), do: :ok

  # this would be a little more involved with redis - you could use the KEYS
  #   command and then MGET but with a large cache, that approach is not
  #   advised.  SCAN can be used with larger caches.
  def reduce(pdl, acc0, reducer) do
    Agent.get(pdl.backend_payload, fn(m) -> Enum.reduce(m, acc0, reducer) end)
  end

  # with redis this could be a call to DBSIZE
  def size(pdl), do: Agent.get(pdl.backend_payload, &map_size/1) 

  # with redis this could be a call to the KEYS command
  def keys(pdl), do: Agent.get(pdl.backend_payload, &Map.keys/1)

  # see comments on the reduce command
  def values(pdl), do: Agent.get(pdl.backend_payload, &Map.values/1)

  # this should be a simple GET command in redis
  def get(pdl, key) do
    case Agent.get(pdl.backend_payload, fn(m) -> Map.get(m, key) end) do
      nil -> {:error, :not_found}
      v -> {:ok, v}
    end
  end

  # with redis you could use SET here
  def put(pdl, key, val) do
    Agent.update(pdl.backend_payload, fn(m) -> Map.put(m, key, val) end)
  end

  # corresponds to redis DEL
  def delete(pdl, key) do
    Agent.update(pdl.backend_payload, fn(m) -> Map.delete(m, key) end)
  end
end

# Add PDL to your existing supervision tree so that everything initializes properly
defmodule MyOTPApp do
  use Application 

  # existing Application.start callback
  def start(_type, _args) do
    my_supervisor_children = [
      # start the backend with the payload as its name
      %{
        id: MyCacheBackend,
        start: {MyCacheBackend, :start_link, [MyCache.pdl().backend_payload]}
      },
      # ... existing children specs
      PayDayLoan.supervisor_specification(MyCache.pdl())
    ]
    
    # for example
    Supervisor.start_link(my_supervisor_children, strategy: :one_for_one)
  end
end

# synchronous API - behind the scenes will add the key (1) to the
#   load state table and the asynchronous loader will include that
#   in its next load cycle - this call does not return until either
#   the cache is loaded (via new above) or the request times out
{:ok, value} = MyCache.get(1)

Logging & Instrumentation

The use macro accepts an event_loggers option, which should be a list of functions that take two arguments. When certain events occur, each of these functions is called with an event atom and the requested key. The events are:

  • :timed_out - Timed out while loading cache.
  • :disappeared - Key was marked as :loaded but the backend did not return a value
  • :failed - The loader failed to load a value for the key
  • :cache_miss - A requested value was not already cached
  • :no_key - The loader says this key does not exist

Example usage:

defmodule CacheEventLogger do
  require Logger

  def log(event, key) do
    Logger.debug("Requesting key #{inspect key} caused event #{inspect event}")
  end
end

defmodule CacheEventStats do
  def log(event, key) do
    # update a statsd counter, etc.
  end
end

defmodule MyCache do
  use(
    PayDayLoan,
    callback_module: MyCacheLoader,
    event_loggers: [&CacheEventLogger.log/2, &CacheEventStats.log/2]
  )
end

The PayDayLoan.load_state_stats/1 function returns the count of keys in each load state and is also useful for instrumentation.
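
For example, a minimal sketch that prints the per-state key counts (assuming the MyCache module from the examples above):

MyCache.pdl()
|> PayDayLoan.load_state_stats()
|> Enum.each(fn {load_state, count} -> IO.puts("#{load_state}: #{count}") end)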

Development & Contributing

The usual Elixir and GitHub contribution workflows apply. Pull requests are welcome!

mix deps.get
mix compile
mix test

License

See LICENSE.txt

Summary

Types

  • error() - Error values that may be returned from get/2
  • event() - An event that can happen on cache request
  • event_logger() - A function that takes an event and a key and performs some logging action; the return value is ignored
  • key() - A key in the cache
  • load_datum() - Datum returned by the load callback corresponding to a single key
  • t() - Struct encapsulating a PDL cache

Functions

  • __using__(opts) - Mixin support for generating a cache
  • cache/3 - Manually add a single key/pid to the cache; fails if the key is already in cache with a different pid
  • get/2 - Synchronously get the value for a key, attempting to load it if it is not already loaded
  • keys/1 - Returns a list of all keys in the given cache
  • load_state_stats/1 - Returns a map of load states and the number of keys in each state
  • peek/2 - Check for a cached value, but do not request a load
  • peek_load_state/2 - Check load state, but do not request a load
  • pids/1 - Returns a list of all pids in the given cache
  • query_load_state/2 - Check load state, request load if not loaded or loading
  • reduce/3 - Perform Enum.reduce/3 over all {key, pid} pairs in the given cache
  • request_load/2 - Request a load of one or more keys
  • size/1 - Returns the number of keys in the given cache
  • supervisor_specification/1 - Returns a supervisor specification for the given pdl
  • uncache_key/2 - Remove a key without killing the underlying process
  • values/1 - Return all of the values stored in the backend
  • with_value/4 - Execute a callback with a value if it is found

Types

@type error() :: :not_found | :timed_out | :failed

Error values that may be returned from get/2

  • :not_found - The key is not found as per the key_exists? loader callback
  • :timed_out - Timed out waiting for the value to load.
  • :failed - Either the new or refresh callback failed or returned :ignore.

Note - failure state clears when the get function returns. Further calls to get will retry a load.
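
A minimal sketch of handling these errors, assuming a MyCache module as in the examples above:

case MyCache.get(1) do
  {:ok, value} -> value
  {:error, :not_found} -> nil             # key_exists? returned false
  {:error, :timed_out} -> MyCache.get(1)  # e.g., retry once
  {:error, :failed} -> nil                # the failure state has cleared; a later get will retry the load
end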

@type event() :: :timed_out | :disappeared | :failed | :cache_miss | :no_key

An event that can happen on cache request.

  • :timed_out - Timed out while loading cache.
  • :disappeared - Key was marked as :loaded but the backend did not return a value
  • :failed - The loader failed to load a value for the key
  • :cache_miss - A requested value was not already cached
  • :no_key - The loader says this key does not exist

event_logger()

@type event_logger() :: (event(), key() -> term())

A function that takes an event and a key and performs some logging action. The return value is ignored

@type key() :: term()

A key in the cache.

This could be any Erlang/Elixir term. In practice, for example, it may be an integer representing the primary key in a database table.

load_datum()

@type load_datum() :: term()

Datum returned by the load callback corresponding to a single key.

For example, this could be a tuple of database column values or a struct encapsulating such values. Your new and refresh loader callbacks should know how to ingest these values to generate new cache entry processes.

@type t() :: %PayDayLoan{
  backend: atom(),
  backend_payload: atom(),
  batch_size: pos_integer(),
  cache_monitor: atom() | false,
  callback_module: module(),
  event_loggers: [event_logger()],
  key_cache: atom(),
  load_num_tries: pos_integer(),
  load_state_manager: atom(),
  load_task_supervisor: atom(),
  load_wait_msec: pos_integer(),
  load_worker: atom(),
  supervisor_name: atom()
}

Struct encapsulating a PDL cache.

  • backend - Implementation of the Backend behaviour - defaults to PayDayLoan.EtsBackend.
  • backend_payload - Arbitrary payload for the backend - defaults to the ETS table id for the ETS backend.
  • load_state_manager - ETS table id for load state table.
  • cache_monitor - Registration name for the monitor process, or false if no monitor should be started.
  • key_cache - ETS table id for key cache table.
  • load_worker - Registration name for the load worker GenServer.
  • callback_module - Module implementing the PayDayLoan.Loader behaviour.
  • batch_size - Maximum number of keys to load at once. Default 1000
  • load_num_tries - Maximum number of times to wait for cache load. Default 10
  • load_wait_msec - Amount of time to wait between checking load state. Default 500
  • supervisor_name - Registration name for the supervisor.

Functions

__using__(opts)

(macro)

Mixin support for generating a cache.

Example:

defmodule MyCache do
  use PayDayLoan, callback_module: MyCacheLoader
end

The above would define MyCache.pay_day_loan/0, which returns a PDL struct that is configured for this cache and has callback module MyCacheLoader. Other keys of the %PayDayLoan{} struct can be passed in as options to override the defaults.

Also defines pass-through convenience functions for every function in PayDayLoan.
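
For example, assuming the MyCache module above, the following two calls are equivalent:

MyCache.get(1)
PayDayLoan.get(MyCache.pdl(), 1)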

cache(pdl, key, value)

@spec cache(t(), key(), pid()) :: :ok | {:error, pid()}

Manually add a single key/pid to the cache. Fails if the key is already in cache with a different pid.
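
A minimal sketch, assuming a process-backed cache as in the default-backend example above:

{:ok, pid} = Agent.start_link(fn -> :some_value end)
:ok = PayDayLoan.cache(MyCache.pdl(), 1, pid)  # or the pass-through MyCache.cache(1, pid)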

@spec get(pdl :: t(), key()) :: {:ok, term()} | {:error, error()}

Synchronously get the value for a key, attempting to load it if it is not already loaded.

@spec keys(pdl :: t()) :: [key()]

Returns a list of all keys in the given cache

load_state_stats(pdl)

@spec load_state_stats(pdl :: t()) :: %{}

Returns a map of load states and the number of keys in each state

Useful for instrumentation

@spec peek(t(), key()) :: {:ok, term()} | {:error, :not_found}

Check for a cached value, but do not request a load

peek_load_state(pdl, key)

@spec peek_load_state(pdl :: t(), key()) :: nil | PayDayLoan.LoadState.t()

Check load state, but do not request a load

@spec pids(pdl :: t()) :: [pid()]

Returns a list of all pids in the given cache

query_load_state(pdl, key)

@spec query_load_state(pdl :: t(), key()) :: PayDayLoan.LoadState.t()

Check load state, request load if not loaded or loading

Does not ping the load worker. A load will not happen until the next ping. Use request_load/2 to request load and trigger a load ping.
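
A minimal sketch of the difference, assuming the MyCache module from the examples above:

# marks the key for loading (if needed) but does not ping the load worker
_load_state = PayDayLoan.query_load_state(MyCache.pdl(), 1)

# marks the keys for loading and pings the load worker
:ok = PayDayLoan.request_load(MyCache.pdl(), [1, 2, 3])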

reduce(pdl, acc0, reducer)

@spec reduce(
  pdl :: t(),
  acc0 :: term(),
  reducer :: ({key(), pid()}, term() -> term())
) :: term()

Perform Enum.reduce/3 over all {key, pid} pairs in the given cache
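
For example, a minimal sketch that counts cached entries, assuming the MyCache module from the examples above:

count = PayDayLoan.reduce(MyCache.pdl(), 0, fn {_key, _value}, acc -> acc + 1 end)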

request_load(pdl, key_or_keys)

@spec request_load(pdl :: t(), key() | [key()]) :: :ok

Request a load of one or more keys.

Load is asynchronous - this function returns immediately

@spec size(pdl :: t()) :: non_neg_integer()

Returns the number of keys in the given cache

supervisor_specification(pdl)

@spec supervisor_specification(pdl :: t()) :: Supervisor.child_spec()

Returns a supervisor specification for the given pdl

uncache_key(pdl, key)

@spec uncache_key(t(), key()) :: :ok

Remove a key without killing the underlying process.

If you want to remove an element from cache, just kill the underlying process.

@spec values(pdl :: t()) :: [term()]

Return all of the values stored in the backend

with_value(pdl, key, found_callback, not_found_callback \\ fn -> {:error, :not_found} end)

@spec with_value(t(), key(), (term() -> term()), (() -> term())) :: term()

Execute a callback with a value if it is found.

If no value is found, not_found_callback is executed. By default, the not_found_callback is a function that returns {:error, :not_found}.
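
A minimal usage sketch, assuming the MyCache module from the examples above:

result =
  PayDayLoan.with_value(
    MyCache.pdl(),
    1,
    fn value -> {:found, value} end,
    fn -> :not_found end
  )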