pay_day_loan v0.5.1 PayDayLoan
PayDayLoan
Fast cache now!
This project provides a framework for building on-demand caching in Elixir. It provides a synchronous API to a cache that is loaded asynchronously. The cache itself may be backed in any way that you choose, though the default is to use an ETS table backend that has several built-in features for managing the mapping of keys to process ids (e.g., a process registry). You have the option of implementing your own backend using Redis, mnesia, a single process, etc.
PDL is designed for low-latency access to cache elements after they are initially loaded and gives you a framework to minimize load time by performing batch loads. This works very well with data streaming applications that have multiple workers processing events in parallel and are sharing cache state across workers.
Think of PDL as a cache “frontend”. In a typical application, we may want to load data from a database and cache it for fast lookup later. PDL provides a “frontend” so that MyCache.get(some_id) will automatically make sure that the data corresponding to some_id is loaded into the cache and will return the value once it is available (or time out if the load takes too long). It batches the loading of data so that you can take advantage of, e.g., database queries that fetch multiple records in one call.
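As a rough illustration of the batching idea, the sketch below loads many keys with one call per batch instead of one call per key. It does not use PDL itself: the module name BatchLoadSketch, the in-memory @fake_db map, and load_in_batches/2 are all hypothetical stand-ins, and only the [{key, load_datum}] return shape is taken from the loader callbacks described in this document.

```elixir
defmodule BatchLoadSketch do
  # Hypothetical stand-in for a database table: id => row.
  @fake_db %{1 => "alice", 2 => "bob", 3 => "carol", 4 => "dave", 5 => "erin"}

  # Mirrors the shape of a bulk_load callback: one "query" for many keys,
  # returning a list of {key, load_datum} tuples for the keys that exist.
  def bulk_load(keys) do
    @fake_db
    |> Map.take(keys)
    |> Enum.to_list()
  end

  # Splits the requested keys into batches and loads each batch with a
  # single bulk_load/1 call instead of one call per key.
  def load_in_batches(keys, batch_size) do
    keys
    |> Enum.uniq()
    |> Enum.chunk_every(batch_size)
    |> Enum.flat_map(&bulk_load/1)
  end
end
```

Calling BatchLoadSketch.load_in_batches([1, 2, 3, 4, 99], 2) issues three bulk loads rather than five single lookups, and the missing key 99 simply produces no tuple.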
The actual storage of the data is done by a cache “backend”. PDL provides a default backend via PayDayLoan.EtsBackend that is quite flexible. You can, however, implement your own backend using the PayDayLoan.Backend behaviour. This is useful for using an external service (e.g., Redis) as a cache backend. See the examples below.
NOTE: The _pid functions (e.g., PayDayLoan.get_pid/2) were deprecated and have been removed. These functions can be replaced with their non-_pid equivalents: get_pid is replaced with get, peek_pid is replaced with peek, and with_pid is replaced with with_value. 0.3.0 was the last release that included the _pid functions.
Key ideas
- Presents a synchronous API for asynchronous cache loading
- The cache consists of key-value pairs
- Provides a default backend for storing values in an ETS table but allows arbitrary backend implementations
- Tries very hard not to use process messaging in the main lookup API because that can be a bottleneck. Uses ETS tables for state management.
- Encourages bulk queries for cache loading.
- Provides hooks for instrumentation
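To make the "no process messaging on lookup" idea concrete, here is a minimal sketch of reading from a public ETS table directly in the calling process. It is not PDL's implementation; the module EtsLookupSketch and its function names are hypothetical, and only standard :ets calls are used.

```elixir
defmodule EtsLookupSketch do
  # Creates a named, public table. read_concurrency: true tunes the table
  # for many concurrent readers.
  def init(table) do
    :ets.new(table, [:set, :public, :named_table, read_concurrency: true])
  end

  def put(table, key, value) do
    true = :ets.insert(table, {key, value})
    :ok
  end

  # Runs entirely in the calling process: no GenServer.call and no
  # mailbox for requests to queue up in.
  def get(table, key) do
    case :ets.lookup(table, key) do
      [{^key, value}] -> {:ok, value}
      [] -> {:error, :not_found}
    end
  end
end
```

Because every worker reads the table itself, lookups from many processes do not serialize behind a single owner process.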
Example usage: Default backend
# cache wrapper module - this wraps the PDL functions so that they
# make sense within the context of your application
defmodule MyCache do
  # defines MyCache.pay_day_loan/0 (and alias pdl/0),
  # which is set up with defaults and the supplied callback module
  use PayDayLoan, callback_module: MyCacheLoader

  # optionally pass in other arguments to override defaults, e.g.,
  # use PayDayLoan, callback_module: MyCacheLoader, batch_size: 100

  # also defines pass-through functions for the PayDayLoan module -
  # e.g., `MyCache.get(key)` is a pass-through to
  # `PayDayLoan.get(MyCache.pdl(), key)`
end
# cache loader callback module - this will, for example, execute database
# queries and turn the results into cache elements (e.g., Agent or
# GenServer processes)
defmodule MyCacheLoader do
  @behaviour PayDayLoan.Loader

  # needed for the Logger.info call in refresh/3
  require Logger

  def key_exists?(key) do
    # should return true if the key exists -
    # e.g., if "SELECT count(1) FROM some_table WHERE id = #{key}" returns > 0
  end

  def bulk_load(keys) do
    # code to look up records for keys in database (or whatever)
    # should return a list of tuples of the format
    # [{key, load_datum}]
  end

  def new(key, load_datum) do
    # note these are three separate examples - your callback will not do
    # all three

    # if we are using processes:
    Agent.start_link(fn -> load_datum end)

    # if we want to store a callback:
    {:ok, fn -> {:ok, load_datum} end}

    # if we want to store the bare value
    {:ok, load_datum}
  end

  def refresh(existing_value, key, load_datum) do
    # note these are separate examples - your callback will only do
    # one of them

    # if we are using processes, the existing_value is the pid of the
    # already-started process
    pid = existing_value
    Agent.update(pid, fn(_cached_datum) -> load_datum end)
    # we need to return the pid back
    {:ok, pid}

    # or we could stop the existing pid and replace it with a new one
    Agent.stop(pid)
    Agent.start_link(fn -> load_datum end)

    # or if we stored a callback
    {:ok, cached_datum} = existing_value.()
    Logger.info("Replacing #{inspect cached_datum} with #{inspect load_datum}")
    {:ok, fn -> {:ok, load_datum} end}

    # or to store the new datum as a bare value
    {:ok, load_datum}
  end
end
# Add PDL to your existing supervision tree so that everything initializes properly
defmodule MyOTPApp do
  use Application

  # existing Application.start callback
  def start(_type, _args) do
    my_supervisor_children = [
      # ... existing children specs
      PayDayLoan.supervisor_specification(MyCache.pdl)
    ]

    # for example
    Supervisor.start_link(my_supervisor_children, supervisor_opts)
  end
end
# synchronous API - behind the scenes will add the key (1) to the
# load state table and the asynchronous loader will include that
# in its next load cycle - this call does not return until either
# the cache is loaded (via new above) or the request times out
{:ok, value} = MyCache.get(1)
Example usage: Process backend (e.g., Redis connection)
# cache wrapper module - this wraps the PDL functions so that they
# make sense within the context of your application
defmodule MyCache do
  # same as above but we specify a `backend` module and disable the
  # cache monitor, we also specify a `backend_payload` so that we can
  # specify a unique identifier for the backend process
  use(
    PayDayLoan,
    callback_module: MyCacheLoader,
    backend: MyCacheBackend,
    backend_payload: :my_cache,
    cache_monitor: false # we won't be storing pids
  )
end
# same ideas as above but the new/refresh callbacks are different
defmodule MyCacheLoader do
  @behaviour PayDayLoan.Loader

  def key_exists?(key) do
    # should return true if the key exists -
    # e.g., if "SELECT count(1) FROM some_table WHERE id = #{key}" returns > 0
  end

  def bulk_load(keys) do
    # code to look up records for keys in database (or whatever)
    # should return a list of tuples of the format
    # [{key, load_datum}]
  end

  def new(key, load_datum) do
    # we could modify the data here, but we are just going to store it raw
    {:ok, load_datum}
  end

  def refresh(_existing_value, key, load_datum) do
    # we could merge the existing value and the load_datum or we could modify
    # before we store, but we're just going to replace
    {:ok, load_datum}
  end
end
# backend behaviour implementation
defmodule MyCacheBackend do
  @behaviour PayDayLoan.Backend

  # this shows an example of how we might use a single process backend, using
  # Redis is very similar - the process would be a Redis connection and the
  # various callbacks would use Redis commands
  def start_link(name), do: Agent.start_link(fn -> %{} end, name: name)

  # nothing to do for setup
  def setup(_pdl), do: :ok

  # this would be a little more involved with redis - you could use the KEYS
  # command and then MGET but with a large cache, that approach is not
  # advised. SCAN can be used with larger caches.
  def reduce(pdl, acc0, reducer) do
    Agent.get(pdl.backend_payload, fn(m) -> Enum.reduce(m, acc0, reducer) end)
  end

  # with redis this could be a call to DBSIZE
  def size(pdl), do: Agent.get(pdl.backend_payload, &map_size/1)

  # with redis this could be a call to the KEYS command
  def keys(pdl), do: Agent.get(pdl.backend_payload, &Map.keys/1)

  # see comments on the reduce command
  def values(pdl), do: Agent.get(pdl.backend_payload, &Map.values/1)

  # this should be a simple GET command in redis
  def get(pdl, key) do
    case Agent.get(pdl.backend_payload, fn(m) -> Map.get(m, key) end) do
      nil -> {:error, :not_found}
      v -> {:ok, v}
    end
  end

  # with redis you could use SET here
  def put(pdl, key, val) do
    Agent.update(pdl.backend_payload, fn(m) -> Map.put(m, key, val) end)
  end

  # corresponds to redis DEL
  def delete(pdl, key) do
    Agent.update(pdl.backend_payload, fn(m) -> Map.delete(m, key) end)
  end
end
# Add PDL to your existing supervision tree so that everything initializes properly
defmodule MyOTPApp do
  use Application

  # worker/2 below is provided by Supervisor.Spec
  import Supervisor.Spec

  # existing Application.start callback
  def start(_type, _args) do
    my_supervisor_children = [
      # start the backend with the payload as its name
      worker(MyCacheBackend, [MyCache.pdl().backend_payload]),
      # ... existing children specs
      PayDayLoan.supervisor_specification(MyCache.pdl)
    ]

    # for example
    Supervisor.start_link(my_supervisor_children, supervisor_opts)
  end
end
# synchronous API - behind the scenes will add the key (1) to the
# load state table and the asynchronous loader will include that
# in its next load cycle - this call does not return until either
# the cache is loaded (via new above) or the request times out
{:ok, value} = MyCache.get(1)
Logging & Instrumentation
The use macro accepts an event_loggers option, which should be a list of functions that take two arguments. When certain events occur, each of these functions will be called with an event atom and the key requested. The events are:
- :timed_out - Timed out while loading cache.
- :disappeared - Key was marked as :loaded but the backend did not return a value.
- :failed - The loader failed to load a value for the key.
- :cache_miss - A requested value was not already cached.
- :no_key - The loader says this key does not exist.
Example usage:
defmodule CacheEventLogger do
  require Logger

  def log(event, key) do
    Logger.debug("Requesting key #{inspect key} caused event #{inspect event}")
  end
end

defmodule CacheEventStats do
  def log(event, key) do
    # update a statsd counter, etc.
  end
end

defmodule MyCache do
  use PayDayLoan, event_loggers: [&CacheEventLogger.log/2, &CacheEventStats.log/2]
end
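The CacheEventStats stub above leaves the counting to the reader. One possible way to fill it in, sketched here with a plain ETS counter table instead of a real statsd client, is shown below. The module EventCounterSketch, its table name, and the init/0 and count/1 helpers are hypothetical; only the two-argument log(event, key) shape comes from the description of event_loggers.

```elixir
defmodule EventCounterSketch do
  # Hypothetical table name; each row is {event, count}.
  @table :event_counter_sketch

  # Must be called once (e.g., at application start) before log/2 is used.
  def init do
    :ets.new(@table, [:set, :public, :named_table, write_concurrency: true])
    :ok
  end

  # Matches the two-argument shape expected of an event logger.
  # The key is ignored here; only the event atom is counted.
  # update_counter/4 inserts {event, 0} first if the event has no row yet.
  def log(event, _key) do
    :ets.update_counter(@table, event, 1, {event, 0})
  end

  def count(event) do
    case :ets.lookup(@table, event) do
      [{^event, n}] -> n
      [] -> 0
    end
  end
end
```

Such a function would be passed as &EventCounterSketch.log/2 in the event_loggers list. Using ETS here keeps the logger from adding a process-messaging bottleneck to the lookup path.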
The PayDayLoan.load_state_stats/1 function returns the count of keys in each load state and is also useful for instrumentation.
Development & Contributing
The usual Elixir and GitHub contribution workflows apply. Pull requests are welcome!
mix deps.get
mix compile
mix test
License
See LICENSE.txt
Types
Error values that may be returned from get/2
- :not_found - The key is not found as per the key_exists? loader callback.
- :timed_out - Timed out waiting for the value to load.
- :failed - Either the new or refresh callback failed or returned :ignore.
Note - failure state clears when the get function returns. Further calls to get will retry a load.
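A caller typically needs to branch on these outcomes. The sketch below assumes that get returns either {:ok, value} or {:error, reason} with one of the reasons listed above; the module GetResultSketch and the actions it returns (:skip, :retry_later, {:process, value}) are hypothetical application-level choices, not part of PDL.

```elixir
defmodule GetResultSketch do
  # Maps each assumed result shape to a hypothetical application action.
  def handle({:ok, value}), do: {:process, value}

  # The loader says the key does not exist, so retrying will not help.
  def handle({:error, :not_found}), do: :skip

  # The load may still complete, or the failure state has been cleared,
  # so a later call to get may succeed.
  def handle({:error, :timed_out}), do: :retry_later
  def handle({:error, :failed}), do: :retry_later
end
```

In an application this would be used as GetResultSketch.handle(MyCache.get(some_id)).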
An event that can happen on cache request.
- :timed_out - Timed out while loading cache.
- :disappeared - Key was marked as :loaded but the backend did not return a value.
- :failed - The loader failed to load a value for the key.
- :cache_miss - A requested value was not already cached.
- :no_key - The loader says this key does not exist.
A function that takes an event and a key and performs some logging action. The return value is ignored.
A key in the cache.
This could be any Erlang/Elixir term. In practice, for example, it may be an integer representing the primary key in a database table.
Datum returned by the load callback corresponding to a single key.
For example, this could be a tuple of database column values or a struct encapsulating such values. Your new and refresh loader callbacks should know how to ingest these values to generate new cache entry processes.
t() :: %PayDayLoan{backend: atom, backend_payload: atom, batch_size: pos_integer, cache_monitor: atom | false, callback_module: module, event_loggers: [event_logger], key_cache: atom, load_num_tries: pos_integer, load_state_manager: atom, load_wait_msec: pos_integer, load_worker: atom, supervisor_name: atom}
Struct encapsulating a PDL cache.
- backend - Implementation of the Backend behaviour - defaults to PayDayLoan.EtsBackend.
- backend_payload - Arbitrary payload for the backend - defaults to the ETS table id for the ETS backend.
- load_state_manager - ETS table id for load state table.
- cache_monitor - Registration name for the monitor process, or false if no monitor should be started.
- key_cache - ETS table id for key cache table.
- load_worker - Registration name for the load worker GenServer.
- callback_module - Module implementing the PayDayLoan.Loader behaviour.
- batch_size - Maximum number of keys to load at once. Default 1000.
- load_num_tries - Maximum number of times to wait for cache load. Default 10.
- load_wait_msec - Amount of time to wait between checking load state. Default 500.
- supervisor_name - Registration name for the supervisor.
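If load_num_tries and load_wait_msec combine the way their descriptions suggest, a synchronous get waits at most roughly load_num_tries × load_wait_msec, which is 10 × 500 = 5000 ms with the defaults. The sketch below illustrates that kind of bounded polling loop; it is an assumption about the general technique, not PDL's actual code, and the module PollSketch and the :not_ready return value are hypothetical.

```elixir
defmodule PollSketch do
  # Calls check_fun up to num_tries times, sleeping wait_msec after each
  # unsuccessful attempt. check_fun returns {:ok, value} or :not_ready.
  def wait_for(_check_fun, 0, _wait_msec), do: {:error, :timed_out}

  def wait_for(check_fun, num_tries, wait_msec) when num_tries > 0 do
    case check_fun.() do
      {:ok, value} ->
        {:ok, value}

      :not_ready ->
        Process.sleep(wait_msec)
        wait_for(check_fun, num_tries - 1, wait_msec)
    end
  end

  # Approximate upper bound on the wait, ignoring time spent in check_fun.
  def max_wait_msec(num_tries, wait_msec), do: num_tries * wait_msec
end
```

Lowering either setting shortens the worst-case wait for callers, at the cost of timing out on loads that would have finished slightly later.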
Functions
Mixin support for generating a cache.
Example:
defmodule MyCache do
use PayDayLoan, callback_module: MyCacheLoader
end
The above would define MyCache.pay_day_loan/0, which returns a PDL struct that is configured for this cache and has callback module MyCacheLoader.
Other keys of the %PayDayLoan{} struct can be passed in as options to override the defaults.
Also defines pass-through convenience functions for every function in PayDayLoan.
Manually add a single key/pid to the cache. Fails if the key is already in cache with a different pid.
Synchronously get the value for a key, attempting to load it if it is not already loaded.
Returns a list of all keys in the given cache
Returns a map of load states and the number of keys in each state
Useful for instrumentation
Check for a cached value, but do not request a load
Check load state, but do not request a load
Returns a list of all pids in the given cache
Check load state, request load if not loaded or loading
Does not ping the load worker. A load will not happen until the next ping. Use request_load/2 to request load and trigger a load ping.
Perform Enum.reduce/3 over all {key, pid} pairs in the given cache
Request a load of one or more keys.
Load is asynchronous - this function returns immediately
Returns the number of keys in the given cache
supervisor_specification(pdl :: PayDayLoan.t) :: Supervisor.Spec.spec
Returns a supervisor specification for the given pdl
Remove a key without killing the underlying process.
If you want to remove an element from cache, just kill the underlying process.
Return all of the values stored in the backend
with_value(t, PayDayLoan.key, (term -> term), (() -> term)) :: term
Execute a callback with a value if it is found.
If no value is found, not_found_callback is executed. By default, the not_found_callback is a function that returns {:error, :not_found}.
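The callback pattern described here can be sketched without PDL by using a plain map in place of the cache. The module WithValueSketch and its store argument are hypothetical; the default not-found callback returning {:error, :not_found} follows the description above.

```elixir
defmodule WithValueSketch do
  # store is a plain map standing in for the cache.
  # found_callback receives the value; not_found_callback takes no arguments.
  def with_value(
        store,
        key,
        found_callback,
        not_found_callback \\ fn -> {:error, :not_found} end
      ) do
    case Map.fetch(store, key) do
      {:ok, value} -> found_callback.(value)
      :error -> not_found_callback.()
    end
  end
end
```

For example, WithValueSketch.with_value(%{1 => "one"}, 1, &String.upcase/1) runs the found callback on "one", while a lookup of a missing key falls through to the not-found callback.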