pay_day_loan v0.3.0 PayDayLoan

PayDayLoan


Fast cache now!

This project provides a framework for building on-demand caching in Elixir. It provides a synchronous API to a cache that is loaded asynchronously. The cache itself may be backed in any way that you choose, though the default is to use an ETS table backend that has several built-in features for managing the mapping of keys to process ids (e.g., a process registry). You have the option of implementing your own backend using Redis, mnesia, a single process, etc.

PDL is designed for low-latency access to cache elements after they are initially loaded, and it gives you a framework to minimize load time by performing batch loads. This works very well for data streaming applications in which multiple workers process events in parallel and share cache state.

Think of PDL as a cache “frontend”. In a typical application, we may want to load data from a database and cache it for fast lookup later. PDL provides a “frontend” so that MyCache.get(some_id) will automatically make sure that the data corresponding to some_id is loaded into the cache and will return the value once it is available (or time out if the load takes too long). It batches the loading of data so that you can take advantage of, e.g., database queries that fetch multiple records in one call.
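A minimal sketch of the batching idea, using only the standard library: pending keys are split into chunks of at most `batch_size`, and each chunk is fetched with one call that returns `[{key, load_datum}]`, the same shape the bulk_load callback is documented to return. `BatchSketch`, `fake_bulk_query/1`, and the in-memory table are hypothetical stand-ins for a real multi-record database query, not part of PayDayLoan.

```elixir
defmodule BatchSketch do
  # Hypothetical in-memory "table" standing in for a database.
  @table %{1 => "alice", 2 => "bob", 3 => "carol", 4 => "dave"}

  # Stand-in for one multi-record query (e.g. `WHERE id IN (...)`).
  # Returns [{key, load_datum}] only for keys that exist.
  def fake_bulk_query(keys) do
    for key <- keys, Map.has_key?(@table, key), do: {key, Map.fetch!(@table, key)}
  end

  # Split the pending keys into batches of at most batch_size and run
  # one query per batch instead of one query per key.
  def load_pending(keys, batch_size) do
    keys
    |> Enum.chunk_every(batch_size)
    |> Enum.flat_map(&fake_bulk_query/1)
  end
end
```

With five pending keys and a batch size of 2, this issues three queries rather than five; key 5 is simply absent from the result because it does not exist in the table.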

The actual storage of the data is done by a cache “backend”. PDL provides a default backend via PayDayLoan.EtsBackend that is quite flexible. You can, however, implement your own backend using the PayDayLoan.Backend behaviour. This is useful for using an external service (e.g., Redis) as a cache backend. See the examples below.

NOTE: As of 0.3.0, the _pid functions (e.g., PayDayLoan.get_pid/2) emit a warning when called. These functions are deprecated and will be removed in a future release: get_pid is replaced by get, peek_pid by peek, and with_pid by with_value.

Key ideas

  • Presents a synchronous API for asynchronous cache loading
  • The cache consists of key-value pairs
  • Provides a default backend for storing values in an ETS table but allows arbitrary backend implementations
  • Tries very hard not to use process messaging in the main lookup API because that can be a bottleneck. Uses ETS tables for state management.
  • Encourages bulk queries for cache loading.
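To illustrate why ETS keeps process messaging off the lookup path: a public ETS table created with `read_concurrency: true` can be read directly by any process, so a lookup never queues behind other requests in a GenServer mailbox. This is a hypothetical sketch (`EtsLookupSketch` and its table name are made up), not PDL's internal code.

```elixir
defmodule EtsLookupSketch do
  @table :ets_lookup_sketch

  # Create a named, public table tuned for concurrent reads. The calling
  # process owns the table; it is deleted if that process exits.
  def setup do
    :ets.new(@table, [:set, :public, :named_table, read_concurrency: true])
    :ok
  end

  def put(key, value) do
    true = :ets.insert(@table, {key, value})
    :ok
  end

  # Runs entirely in the caller's process: no message is sent anywhere.
  def get(key) do
    case :ets.lookup(@table, key) do
      [{^key, value}] -> {:ok, value}
      [] -> {:error, :not_found}
    end
  end
end
```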

Example usage: Default backend

# cache wrapper module - this wraps the PDL functions so that they
#   make sense within the context of your application
defmodule MyCache do
  # defines MyCache.pay_day_loan/0 (and alias pdl/0),
  #    which is set up with defaults and the supplied callback module
  use PayDayLoan, callback_module: MyCacheLoader

  # optionally pass in other arguments to override defaults, e.g.,
  #   use PayDayLoan, callback_module: MyCacheLoader, batch_size: 100
  
  # also defines pass-through functions for the PayDayLoan module -
  #   e.g., `MyCache.get(key)` is a pass-through to
  #   `PayDayLoan.get(MyCache.pdl(), key)`
end

# cache loader callback module - this will, for example, execute database
#   queries and turn the results into cache elements (e.g., Agent or
#   GenServer processes)
defmodule MyCacheLoader do
  @behaviour PayDayLoan.Loader
  # required for the Logger.info/1 call in refresh/3
  require Logger
 
  def key_exists?(key) do
    # should return true if the key exists -
    #   e.g., if "SELECT count(1) FROM some_table WHERE id = #{key}" returns > 0
  end

  def bulk_load(keys) do
    # code to look up records for keys in database (or whatever)
    #  should return a list of tuples of the format
    #  [{key, load_datum}]
  end
  
  def new(key, load_datum) do
    # note these are three separate examples - your callback will not do
    #   all three

    # if we are using processes:
    Agent.start_link(fn -> load_datum end)

    # if we want to store a callback:
    {:ok, fn -> {:ok, load_datum} end}

    # if we want to store the bare value
    {:ok, load_datum}
  end
  
  def refresh(existing_value, key, load_datum) do
    # note these are three separate examples - your callback will not do
    #   all three

    # if we are using processes, the existing_value is the pid of the
    #   already-started process
    pid = existing_value
    Agent.update(pid, fn(_cached_datum) -> load_datum end)
    # we need to return the pid back
    {:ok, pid}

    # or we could stop the existing pid and replace it with a new one
    Agent.stop(pid)
    Agent.start_link(fn -> load_datum end)

    # or if we stored a callback
    {:ok, cached_datum} = existing_value.()
    Logger.info("Replacing #{inspect cached_datum} with #{inspect load_datum}")
    {:ok, fn -> {:ok, load_datum} end}

    # or to store the new datum as a bare value
    {:ok, load_datum}
  end
end

# Add PDL to your existing supervision tree so that everything initializes properly
defmodule MyOTPApp do
  use Application 

  # existing Application.start callback
  def start(_type, _args) do
    my_supervisor_children = [
      # ... existing children specs
      PayDayLoan.supervisor_specification(MyCache.pdl())
    ]

    # for example
    Supervisor.start_link(my_supervisor_children, strategy: :one_for_one)
  end
end

# synchronous API - behind the scenes will add the key (1) to the
#   load state table and the asynchronous loader will include that
#   in its next load cycle - this call does not return until either
#   the cache is loaded (via new above) or the request times out
{:ok, value} = MyCache.get(1)
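The three `new/2` examples store different kinds of values, so reading one back differs by style. A self-contained sketch using only the standard library; `StorageStyles.read/1` is a hypothetical helper, not a PDL function, and it assumes a pid is an Agent holding the datum and a zero-arity function returns `{:ok, datum}`, as in the examples above.

```elixir
defmodule StorageStyles do
  # A pid is assumed to be an Agent whose state is the datum.
  def read(pid) when is_pid(pid), do: {:ok, Agent.get(pid, & &1)}

  # A zero-arity callback is assumed to return {:ok, datum}.
  def read(fun) when is_function(fun, 0), do: fun.()

  # Anything else was stored as a bare value.
  def read(bare), do: {:ok, bare}
end
```

Process-backed values can be updated in place on refresh (e.g., with Agent.update/2), while callbacks and bare values are simply replaced.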

Example usage: Process backend (e.g., Redis connection)

# cache wrapper module - this wraps the PDL functions so that they
#   make sense within the context of your application
defmodule MyCache do
  # same as above, but we specify a `backend` module and disable the
  #   cache monitor; `backend_payload` gives the backend process a
  #   unique registered name
  use(
    PayDayLoan,
    callback_module: MyCacheLoader,
    backend: MyCacheBackend,
    backend_payload: :my_cache,
    cache_monitor: false # we won't be storing pids
  )
end

# same ideas as above but the new/refresh callbacks are different
defmodule MyCacheLoader do
  @behaviour PayDayLoan.Loader
 
  def key_exists?(key) do
    # should return true if the key exists -
    #   e.g., if "SELECT count(1) FROM some_table WHERE id = #{key}" returns > 0
  end

  def bulk_load(keys) do
    # code to look up records for keys in database (or whatever)
    #  should return a list of tuples of the format
    #  [{key, load_datum}]
  end
  
  def new(_key, load_datum) do
    # we could modify the data here, but we are just going to store it raw
    {:ok, load_datum}
  end
  
  def refresh(_existing_value, _key, load_datum) do
    # we could merge the existing value and the load_datum or we could modify
    #  before we store, but we're just going to replace
    {:ok, load_datum}
  end
end

# backend behaviour implementation
defmodule MyCacheBackend do
  @behaviour PayDayLoan.Backend

  # this shows an example of how we might use a single process backend; using
  #   Redis is very similar - the process would be a Redis connection and the
  #   various callbacks would use Redis commands

  def start_link(name), do: Agent.start_link(fn -> %{} end, name: name)

  # nothing to do for setup
  def setup(_pdl), do: :ok

  # this would be a little more involved with Redis - you could use the KEYS
  #   command and then MGET, but that approach is not advised for a large
  #   cache; SCAN is better suited to larger caches
  def reduce(pdl, acc0, reducer) do
    Agent.get(pdl.backend_payload, fn(m) -> Enum.reduce(m, acc0, reducer) end)
  end

  # with redis this could be a call to DBSIZE
  def size(pdl), do: Agent.get(pdl.backend_payload, &Map.size/1) 

  # with redis this could be a call to the KEYS command
  def keys(pdl), do: Agent.get(pdl.backend_payload, &Map.keys/1)

  # see the comments on reduce/3
  def values(pdl), do: Agent.get(pdl.backend_payload, &Map.values/1)

  # this should be a simple GET command in redis
  def get(pdl, key) do
    case Agent.get(pdl.backend_payload, fn(m) -> Map.get(m, key) end) do
      nil -> {:error, :not_found}
      v -> {:ok, v}
    end
  end

  # with redis you could use SET here
  def put(pdl, key, val) do
    # store the value unchanged so that get/2 returns exactly what was put
    Agent.update(pdl.backend_payload, fn(m) -> Map.put(m, key, val) end)
  end

  # corresponds to redis DEL
  def delete(pdl, key) do
    Agent.update(pdl.backend_payload, fn(m) -> Map.delete(m, key) end)
  end
end

# Add PDL to your existing supervision tree so that everything initializes properly
defmodule MyOTPApp do
  use Application
  # provides worker/2 for the backend child spec below
  import Supervisor.Spec

  # existing Application.start callback
  def start(_type, _args) do
    my_supervisor_children = [
      # start the backend with the payload as its name
      worker(MyCacheBackend, [MyCache.pdl().backend_payload]),
      # ... existing children specs
      PayDayLoan.supervisor_specification(MyCache.pdl())
    ]

    # for example
    Supervisor.start_link(my_supervisor_children, strategy: :one_for_one)
  end
end

# synchronous API - behind the scenes will add the key (1) to the
#   load state table and the asynchronous loader will include that
#   in its next load cycle - this call does not return until either
#   the cache is loaded (via new above) or the request times out
{:ok, value} = MyCache.get(1)
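Because every callback in the backend above reads only `pdl.backend_payload`, a plain map can stand in for the `%PayDayLoan{}` struct when trying the Agent-based storage logic on its own. The sketch below trims the module to a few callbacks and drops the `@behaviour` line so it runs without the library; `AgentBackendSketch` and the `:my_cache_sketch` name are hypothetical.

```elixir
defmodule AgentBackendSketch do
  # The Agent's registered name doubles as the backend payload.
  def start_link(name), do: Agent.start_link(fn -> %{} end, name: name)

  # Map.fetch/2 distinguishes a stored nil from a missing key.
  def get(pdl, key) do
    case Agent.get(pdl.backend_payload, &Map.fetch(&1, key)) do
      {:ok, value} -> {:ok, value}
      :error -> {:error, :not_found}
    end
  end

  def put(pdl, key, val), do: Agent.update(pdl.backend_payload, &Map.put(&1, key, val))

  def delete(pdl, key), do: Agent.update(pdl.backend_payload, &Map.delete(&1, key))

  def size(pdl), do: Agent.get(pdl.backend_payload, fn m -> map_size(m) end)
end
```

Using Map.fetch/2 rather than a nil check is a small design choice: it keeps a legitimately stored `nil` from being reported as `:not_found`.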

Development & Contributing

The usual Elixir and GitHub contribution workflows apply. Pull requests are welcome!

mix deps.get
mix compile
mix test

License

See LICENSE.txt

Summary

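Types

  • error() - Error values that may be returned from get/2
  • key() - A key in the cache
  • load_datum() - Datum returned by the load callback corresponding to a single key
  • t() - Struct encapsulating a PDL cache

Functions

  • __using__(opts) - Mixin support for generating a cache
  • cache(pay_day_loan, key, value) - Manually add a single key/pid to the cache. Fails if the key is already in cache with a different pid
  • get(pay_day_loan, key) - Synchronously get the value for a key, attempting to load it if it is not already loaded
  • get_pid(pay_day_loan, key) - Synchronously get the pid for a key, attempting to load it if it is not already loaded
  • keys(pay_day_loan) - Returns a list of all keys in the given cache
  • peek(pay_day_loan, key) - Check for a cached value, but do not request a load
  • peek_load_state(pay_day_loan, key) - Check load state, but do not request a load
  • peek_pid(pay_day_loan, key) - Check for cached pid, but do not request a load
  • pids(pay_day_loan) - Returns a list of all pids in the given cache
  • query_load_state(pay_day_loan, key) - Check load state, request load if not loaded or loading
  • reduce(pay_day_loan, acc0, reducer) - Perform Enum.reduce/3 over all {key, pid} pairs in the given cache
  • request_load(pay_day_loan, key_or_keys) - Request a load of one or more keys
  • size(pay_day_loan) - Returns the number of keys in the given cache
  • supervisor_specification(pay_day_loan) - Returns a supervisor specification for the given pdl
  • uncache_key(pay_day_loan, key) - Remove a key without killing the underlying process
  • values(pay_day_loan) - Return all of the values stored in the backend
  • with_pid(pay_day_loan, key, found_callback, not_found_callback \\ fn -> {:error, :not_found} end) - Execute a callback with a pid if it is found
  • with_value(pdl, key, found_callback, not_found_callback \\ fn -> {:error, :not_found} end) - Execute a callback with a value if it is found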

Types

error()
error() :: :not_found | :timed_out | :failed

Error values that may be returned from get/2

  • :not_found - The key does not exist, according to the key_exists? loader callback.
  • :timed_out - Timed out waiting for the value to load.
  • :failed - Either the new or refresh callback failed or returned :ignore.

Note: the failure state is cleared when get returns, so further calls to get will retry the load.
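A caller usually matches on each documented result of get/2. The sketch below is a hypothetical helper (`GetResultSketch.handle_result/1`) that takes a result tuple of the shape listed above; in real code the tuple would come from a call such as `MyCache.get(key)`. Because a failure is cleared when get returns, treating `:timed_out` and `:failed` as retryable is one reasonable policy, while `:not_found` will not change on retry.

```elixir
defmodule GetResultSketch do
  # A hit: the value was loaded (or already cached).
  def handle_result({:ok, value}), do: {:hit, value}

  # key_exists? said the key does not exist; retrying will not help.
  def handle_result({:error, :not_found}), do: :missing

  # The load timed out or new/refresh failed. The failure state is cleared
  # once get returns, so a later get starts a fresh load attempt.
  def handle_result({:error, reason}) when reason in [:timed_out, :failed],
    do: {:retry_later, reason}
end
```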

key()
key() :: term

A key in the cache.

This could be any Erlang/Elixir term. In practice, for example, it may be an integer representing the primary key in a database table.

load_datum()
load_datum() :: term

Datum returned by the load callback corresponding to a single key.

For example, this could be a tuple of database column values or a struct encapsulating such values. Your new and refresh loader callbacks should know how to turn these values into cache entries (for example, processes, callbacks, or bare values).

t()
t() :: %PayDayLoan{backend: atom, backend_payload: atom, batch_size: pos_integer, cache_monitor: atom | false, callback_module: module, key_cache: atom, load_num_tries: pos_integer, load_state_manager: atom, load_wait_msec: pos_integer, load_worker: atom, supervisor_name: atom}

Struct encapsulating a PDL cache.

  • backend - Implementation of the Backend behaviour - defaults to PayDayLoan.EtsBackend.
  • backend_payload - Arbitrary payload for the backend - defaults to the ETS table id for the ETS backend.
  • load_state_manager - ETS table id for load state table.
  • cache_monitor - Registration name for the monitor process, or false if no monitor should be started.
  • key_cache - ETS table id for key cache table.
  • load_worker - Registration name for the load worker GenServer.
  • callback_module - Module implementing the PayDayLoan.Loader behaviour.
  • batch_size - Maximum number of keys to load at once. Defaults to 1000.
  • load_num_tries - Maximum number of times to wait for a cache load to complete. Defaults to 10.
  • load_wait_msec - Time to wait between load-state checks, in milliseconds. Defaults to 500.
  • supervisor_name - Registration name for the supervisor.
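Read together, load_num_tries and load_wait_msec suggest a bounded wait: with the defaults, a caller waiting on a load would give up after roughly 10 * 500 ms = 5000 ms (5 seconds). The sketch below shows such a bounded poll loop in plain Elixir; `PollSketch.wait_for/3` and its `:pending` convention are hypothetical, and PDL's actual loop may differ in detail.

```elixir
defmodule PollSketch do
  # Out of tries: report a timeout, like get/2's :timed_out.
  def wait_for(_check_fun, 0, _wait_msec), do: {:error, :timed_out}

  # Call check_fun up to `tries` times, sleeping wait_msec between attempts.
  # check_fun returns {:ok, value} when loaded or :pending while loading.
  def wait_for(check_fun, tries, wait_msec) when tries > 0 do
    case check_fun.() do
      {:ok, value} ->
        {:ok, value}

      :pending ->
        Process.sleep(wait_msec)
        wait_for(check_fun, tries - 1, wait_msec)
    end
  end
end
```

Raising either setting lengthens the worst-case wait linearly; lowering load_wait_msec makes callers notice a completed load sooner at the cost of more frequent checks.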

Functions

__using__(opts) (macro)

Mixin support for generating a cache.

Example:

defmodule MyCache do
  use PayDayLoan, callback_module: MyCacheLoader
end

The above would define MyCache.pay_day_loan/0, which returns a PDL struct that is configured for this cache and has callback module MyCacheLoader. Other keys of the %PayDayLoan{} struct can be passed in as options to override the defaults.

Also defines pass-through convenience functions for every function in PayDayLoan.
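A rough sketch of how a `use` mixin can generate a per-cache configuration function plus pass-through functions. `TinyPDL`, its `get/2`, and `MyTinyCache` are hypothetical; per the description above, the real mixin returns a full `%PayDayLoan{}` struct from `pay_day_loan/0` and defines pass-throughs for every PayDayLoan function.

```elixir
defmodule TinyPDL do
  # batch_size default mirrors the documented PDL default of 1000.
  defstruct callback_module: nil, batch_size: 1000

  # Stand-in for a real lookup: just reports which config it received.
  def get(%__MODULE__{} = pdl, key), do: {:ok, {pdl.callback_module, key}}

  defmacro __using__(opts) do
    quote do
      # Build the config struct once, at compile time, from the `use` options.
      @tiny_pdl struct(TinyPDL, unquote(opts))

      def pay_day_loan, do: @tiny_pdl
      def pdl, do: pay_day_loan()

      # Pass-through: MyTinyCache.get(key) -> TinyPDL.get(MyTinyCache.pdl(), key)
      def get(key), do: TinyPDL.get(pdl(), key)
    end
  end
end

defmodule MyTinyCache do
  use TinyPDL, callback_module: MyCacheLoader, batch_size: 100
end
```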

cache(pay_day_loan, key, value)
cache(t, key, pid) :: :ok | {:error, pid}

Manually add a single key/pid to the cache. Fails if the key is already in cache with a different pid.
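The "fails if the key is already cached with a different pid" rule maps naturally onto `:ets.insert_new/2`. A hypothetical sketch of that check over a plain ETS table; `CacheSketch.cache/3` only mirrors the documented return shape `:ok | {:error, pid}` and is not the library's implementation.

```elixir
defmodule CacheSketch do
  # Cache key -> pid unless the key is already held by a different pid.
  def cache(table, key, pid) do
    if :ets.insert_new(table, {key, pid}) do
      :ok
    else
      case :ets.lookup(table, key) do
        # The same pid is already cached under this key: nothing to do.
        [{^key, ^pid}] -> :ok
        # A different pid holds the key: report it.
        [{^key, existing}] -> {:error, existing}
        # The key was removed after insert_new failed: try again.
        [] -> cache(table, key, pid)
      end
    end
  end
end
```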

get(pay_day_loan, key)
get(pdl :: t, key) :: {:ok, term} | {:error, error}

Synchronously get the value for a key, attempting to load it if it is not already loaded.

get_pid(pay_day_loan, key)
get_pid(pdl :: t, key) :: {:ok, pid} | {:error, error}

Synchronously get the pid for a key, attempting to load it if it is not already loaded.

This is a legacy API function and is deprecated as of 0.3.0. Use get/2 instead.

keys(pay_day_loan)
keys(pdl :: t) :: [PayDayLoan.key]

Returns a list of all keys in the given cache

peek(pay_day_loan, key)
peek(t, key) :: {:ok, term} | {:error, :not_found}

Check for a cached value, but do not request a load

peek_load_state(pay_day_loan, key)
peek_load_state(pdl :: t, key) :: nil | PayDayLoan.LoadState.t

Check load state, but do not request a load

peek_pid(pay_day_loan, key)
peek_pid(pdl :: t, key :: PayDayLoan.key) ::
  {:ok, pid} |
  {:error, :not_found}

Check for cached pid, but do not request a load

This is a legacy API function and is deprecated as of 0.3.0. Use peek/2 instead.

pids(pay_day_loan)
pids(pdl :: t) :: [pid]

Returns a list of all pids in the given cache

query_load_state(pay_day_loan, key)
query_load_state(pdl :: t, key) :: PayDayLoan.LoadState.t

Check load state, request load if not loaded or loading

Does not ping the load worker, so the load will not happen until the next ping. Use request_load/2 to request a load and trigger a ping.
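The difference between query_load_state/2 and request_load/2 comes down to whether the load worker is pinged. A hypothetical sketch over an ETS load-state table; `LoadStateSketch`, the `:requested`/`:loaded` atoms, and the `:ping` message are illustrative assumptions, and PDL's real states are described by PayDayLoan.LoadState.

```elixir
defmodule LoadStateSketch do
  # Record :requested unless the key already has a load state, then return
  # the current state. No message is sent, so nothing loads until a ping.
  def query_load_state(table, key) do
    :ets.insert_new(table, {key, :requested})
    [{^key, state}] = :ets.lookup(table, key)
    state
  end

  # Same bookkeeping for one or more keys, then wake the worker.
  def request_load(table, worker, keys) do
    keys |> List.wrap() |> Enum.each(&query_load_state(table, &1))
    send(worker, :ping)
    :ok
  end
end
```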

reduce(pay_day_loan, acc0, reducer)
reduce(pdl :: t, acc0 :: term, reducer :: ({key, pid}, term -> term)) :: term

Perform Enum.reduce/3 over all {key, pid} pairs in the given cache
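The reducer takes one `{key, pid}` pair at a time (or `{key, value}` for a backend that stores plain values, as in the Agent backend example above) plus the accumulator, in Enum.reduce/3's argument order. A plain map stands in for the cache contents here; with PDL the documented call would be `PayDayLoan.reduce(pdl, acc0, reducer)`.

```elixir
# Stand-in for cache contents: key => cached value.
cache_contents = %{1 => 10, 2 => 20, 3 => 30}

# Same reducer shape that reduce/3 expects: ({key, value}, acc) -> acc
reducer = fn {_key, value}, acc -> acc + value end

total = Enum.reduce(cache_contents, 0, reducer)
```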

request_load(pay_day_loan, key_or_keys)
request_load(pdl :: t, key | [key]) :: :ok

Request a load of one or more keys.

Loading is asynchronous; this function returns immediately.

size(pay_day_loan)
size(pdl :: t) :: non_neg_integer

Returns the number of keys in the given cache

supervisor_specification(pay_day_loan)
supervisor_specification(pdl :: PayDayLoan.t) :: Supervisor.Spec.spec

Returns a supervisor specification for the given pdl

uncache_key(pay_day_loan, key)
uncache_key(t, key) :: :ok

Remove a key without killing the underlying process.

If you want to remove an element from the cache entirely, kill the underlying process.

values(pay_day_loan)
values(pdl :: t) :: [term]

Return all of the values stored in the backend

with_pid(pay_day_loan, key, found_callback, not_found_callback \\ fn -> {:error, :not_found} end)
with_pid(t, PayDayLoan.key, (pid -> term), (() -> term)) :: term

Execute a callback with a pid if it is found.

If no pid is found, not_found_callback is executed. By default, not_found_callback returns {:error, :not_found}.

This is a legacy API function and is deprecated as of 0.3.0. Use with_value/4 instead.

with_value(pdl, key, found_callback, not_found_callback \\ fn -> {:error, :not_found} end)
with_value(t, PayDayLoan.key, (term -> term), (() -> term)) :: term

Execute a callback with a value if it is found.

If no value is found, not_found_callback is executed. By default, the not_found_callback is a function that returns {:error, :not_found}.
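The with_value/4 contract (run one callback on a hit, another on a miss, with a default miss callback returning `{:error, :not_found}`) can be sketched over a plain map. `WithValueSketch` is hypothetical; it mirrors only the documented behavior and does not call PayDayLoan.

```elixir
defmodule WithValueSketch do
  # Run found_callback with the value on a hit, not_found_callback on a miss.
  def with_value(store, key, found_callback, not_found_callback \\ fn -> {:error, :not_found} end) do
    case Map.fetch(store, key) do
      {:ok, value} -> found_callback.(value)
      :error -> not_found_callback.()
    end
  end
end
```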