Nebulex.Adapters.Local (nebulex_local v3.0.0-rc.1)
A Local Generation Cache adapter for Nebulex; inspired by epocxy cache.
Generational caching using an ETS table (or multiple ones when used with
:shards) for each generation of cached data. Accesses hit the newer
generation first, and migrate from the older generation to the newer
generation when retrieved from the stale table. When a new generation
is started, the oldest one is deleted. This is a form of mass garbage
collection which avoids using timers and the expiration of individual
cached elements.
This implementation of the generational cache uses only two generations,
referred to as the new and the old generation.
See Nebulex.Adapters.Local.Generation to learn more about generation
management and garbage collection.
Overall features
- Configurable backend (:ets or :shards).
- Expiration - A status based on the TTL (Time To Live) option. To maintain cache performance, expired entries may not be removed or evicted immediately; they are expired or evicted on demand, when the key is read.
- Eviction - Generational garbage collection.
- Sharding - For intensive workloads, the cache may also be partitioned (by using the :shards backend and specifying the :partitions option).
- Support for transactions via the Erlang global name registration facility. See Nebulex.Adapter.Transaction.
- Support for stats.
Configuration options
The following options can be used to configure the adapter:
- :cache (atom/0) - Required. The defined cache module.
- :stats (boolean/0) - A flag to determine whether to collect cache stats. The default value is true.
- :backend (backend/0) - The backend or storage to be used for the adapter. The default value is :ets.
- :read_concurrency (boolean/0) - Since the adapter uses ETS tables internally, this option is used when creating a new table or generation. See :ets.new/2 options. The default value is true.
- :write_concurrency (boolean/0) - Since the adapter uses ETS tables internally, this option is used when creating a new table or generation. See :ets.new/2 options. The default value is true.
- :compressed (boolean/0) - Since the adapter uses ETS tables internally, this option is used when creating a new table or generation. See :ets.new/2 options. The default value is false.
- :backend_type - Since the adapter uses ETS tables internally, this option is used when creating a new table or generation. See :ets.new/2 options. The default value is :set.
- :partitions (pos_integer/0) - The number of ETS partitions when using the :shards backend. See :shards.new/2. The default value is System.schedulers_online().
- :purge_chunk_size (pos_integer/0) - This option limits the max nested match specs based on the number of keys when purging the older cache generation. The default value is 100.
- :gc_interval (pos_integer/0) - The interval time in milliseconds for garbage collection to run: create a new generation, make it the newer one, make the previous new generation the old one, and finally remove the previous old one. If not provided (or nil), garbage collection never runs, so new generations must be created explicitly, e.g., MyCache.new_generation(opts) (the default); however, this is not recommended. Always provide the :gc_interval option so the garbage collector can work appropriately out of the box, unless you explicitly want to turn off garbage collection or handle it yourself.
- :max_size (pos_integer/0) - The maximum number of entries to store in the cache. If not provided (or nil), the health check to validate and release memory is not performed (the default).
- :allocated_memory (pos_integer/0) - The maximum size in bytes for the cache storage. If not provided (or nil), the health check to validate and release memory is not performed (the default).
- :gc_memory_check_interval (mem_check_interval/0) - The interval time in milliseconds for garbage collection to run the size and memory checks. Beware: for the :gc_memory_check_interval option to work, you must configure :max_size or :allocated_memory (or both). The default value is 10000.
- :gc_flush_delay (pos_integer/0) - The delay in milliseconds before objects from the oldest generation are flushed. The default value is 10000.
Usage
Nebulex.Cache
is the wrapper around the cache. We can define a
local cache as follows:
defmodule MyApp.LocalCache do
use Nebulex.Cache,
otp_app: :my_app,
adapter: Nebulex.Adapters.Local
end
Where the configuration for the cache must be in your application
environment, usually defined in your config/config.exs:
config :my_app, MyApp.LocalCache,
gc_interval: :timer.hours(12),
max_size: 1_000_000,
allocated_memory: 2_000_000_000,
gc_memory_check_interval: :timer.seconds(10)
For intensive workloads, the cache may also be partitioned using :shards
as the cache backend (backend: :shards) and configuring the desired number
of partitions via the :partitions option. Defaults to
System.schedulers_online().
config :my_app, MyApp.LocalCache,
backend: :shards,
gc_interval: :timer.hours(12),
max_size: 1_000_000,
allocated_memory: 2_000_000_000,
  gc_memory_check_interval: :timer.seconds(10),
partitions: System.schedulers_online() * 2
If your application was generated with a supervisor (by passing --sup
to mix new), you will have a lib/my_app/application.ex file containing
the application start callback that defines and starts your supervisor.
You just need to edit the start/2 function to start the cache under
your application's supervisor:
def start(_type, _args) do
  children = [
    {MyApp.LocalCache, []},
    ...
  ]

  Supervisor.start_link(children, strategy: :one_for_one, name: MyApp.Supervisor)
end
See Nebulex.Cache
for more information.
The :ttl option
The :ttl is a runtime option meant to set a key's expiration time. It is
evaluated on demand when a key is retrieved; if the key has expired, it is
removed from the cache. Hence, it cannot be used as an eviction method; it
is more for maintaining the cache's integrity and consistency. For this
reason, you should always configure the eviction or GC options. See the
"Eviction policy" section for more information.
Caveats when using the :ttl option:
- When using the :ttl option, ensure it is less than :gc_interval. Otherwise, the key may be evicted before its :ttl has elapsed, because the garbage collector may run before a fetch operation has evaluated the :ttl and expired the key.
- Consider the following scenario based on the previous caveat. You have :gc_interval set to 1 hr. Then you put a new key with :ttl set to 2 hrs. One minute later, the GC runs, creating a new generation, and the key ends up in the older generation. Therefore, if the next GC cycle occurs (1 hr later) before the key is fetched (moving it to the newer generation), it is evicted from the cache when the GC removes the older generation, so it won't be retrievable anymore.
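For example, the following sketch stores a key with a TTL shorter than the configured :gc_interval, so it can expire on read before its generation is dropped. It assumes a cache module MyCache defined with this adapter and a 12-hour :gc_interval as in the "Usage" section; the key and value are hypothetical:

# Store a key whose TTL (1 hour) is well below the 12-hour :gc_interval.
MyCache.put("session:123", %{user_id: 1}, ttl: :timer.hours(1))

# Reads within the TTL succeed and migrate the entry to the newer generation;
# once the TTL elapses, the next read expires and removes the key on demand.
MyCache.get("session:123")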
Eviction policy
This adapter implements a generational cache, which means its primary eviction mechanism pushes a new cache generation and removes the oldest one. When the garbage collector runs and deletes the oldest generation, it removes the least recently used keys, while the most recently used keys remain available in the newer generation. In other words, the generational cache also enforces an LRU (Least Recently Used) eviction policy.
The following conditions trigger the garbage collector to run:
- When the time interval defined by :gc_interval elapses. This makes the garbage-collector process run, creating a new generation and deleting the oldest one. This interval defines how often you want to evict the least recently used entries, or the retention period for the cached entries. The retention period for the least recently used entries is equivalent to two garbage collection cycles (since we keep two generations), which means the GC removes all entries not accessed in the cache during that time.
- When the time interval defined by :gc_memory_check_interval elapses. Beware: this option works alongside the :max_size and :allocated_memory options. The interval defines when the GC must run to validate the cache size and memory and release space if any of the limits are exceeded. It is mainly for keeping the cached data under the configured size and memory limits and avoiding running out of memory at some point.
Configuring the GC options
This section helps you understand how the different configuration options work and gives you an idea of what values to set, especially if this is your first time using Nebulex with the local adapter.
To configure the cache with appropriate values, it helps to understand a few things in advance: for example, the average size of an entry (so you can pick a reasonable value for the max size or allocated memory) and the expected read and write load. The problem is that this information is sometimes hard to know in advance, especially when it is a new app or you are using the cache for the first time. The following tips can help you configure the cache (especially if it is your first time):
- To configure the GC, consider the retention period you want for the least recently used entries. For example, if the :gc_interval is 1 hr, you will keep only those entries accessed periodically during the last 2 hrs (two GC cycles, as outlined above). If it is your first time using the local adapter, you may start by setting :gc_interval to 12 hrs to ensure daily data retention. Then, you can analyze the data and change the value based on your findings.
- Configure the :max_size or :allocated_memory option (or both) to keep memory healthy under the given limits (and avoid running out of memory). Configuring these options ensures the GC releases memory space whenever a limit is reached or exceeded. For example, you might assign 50% of the total memory to :allocated_memory; it depends on how much memory you need for the cache and how much your app needs to run. For :max_size, consider how many entries you expect to keep in the cache; you could start with something between 100_000 and 1_000_000.
- Finally, when configuring :max_size or :allocated_memory (or both), you must also configure :gc_memory_check_interval (defaults to 10 sec). By default, the GC will run every 10 seconds to validate the cache size and memory. A configuration putting these tips together is sketched after this list.
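For instance, a starting configuration following these tips might look like the sketch below (the specific values are illustrative assumptions, not recommendations for every workload):

config :my_app, MyApp.LocalCache,
  # A new generation every 12 hours gives roughly one day of retention
  # (two GC cycles, since two generations are kept).
  gc_interval: :timer.hours(12),
  # A first guess for the entry limit, somewhere between 100_000 and 1_000_000.
  max_size: 500_000,
  # Cap the cache memory, e.g. about half of a 4 GB node.
  allocated_memory: 2_000_000_000,
  # How often the GC validates the size/memory limits above.
  gc_memory_check_interval: :timer.seconds(10)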
Queryable API
Since the adapter implementation uses ETS tables underneath, the query must be a valid ETS Match Spec. However, there are some predefined or shorthand queries you can use. See the "Predefined queries" section for information.
The adapter defines an entry as a tuple {:entry, key, value, touched, ttl}
,
meaning the match pattern within the ETS Match Spec must be like
{:entry, :"$1", :"$2", :"$3", :"$4"}
. To make query building easier,
you can use the Ex2ms
library.
iex> MyCache.put_all(a: 1, b: 2, c: 3)
iex> match_spec = [
...>   {
...>     {:entry, :"$1", :"$2", :_, :_},
...>     [{:>, :"$2", 1}],
...>     [{{:"$1", :"$2"}}]
...>   }
...> ]
iex> MyCache.get_all(query: match_spec)
{:ok, [b: 2, c: 3]}
You can use the Ex2ms or MatchSpec library to build the queries more easily, as sketched below.
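For example, a sketch using Ex2ms (assuming the ex2ms dependency is available; the entry layout is the {:entry, key, value, touched, ttl} tuple described above):

import Ex2ms

# Equivalent to the raw match spec above: match entries whose value is
# greater than 1 and return {key, value} pairs.
match_spec =
  fun do
    {:entry, key, value, _, _} when value > 1 -> {key, value}
  end

MyCache.get_all(query: match_spec)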
Transaction API
This adapter inherits the default implementation provided by
Nebulex.Adapter.Transaction. Therefore, the transaction command accepts
the following options:
- :keys (list of term/0) - The list of keys the transaction will lock. Since the lock ID is generated based on the key, the transaction uses a fixed lock ID if the option is not provided or is an empty list. Then, all subsequent transactions without this option (or set to an empty list) are serialized, and performance is significantly affected. For that reason, it is recommended to pass the list of keys involved in the transaction (see the sketch after this list). The default value is [].
- :nodes (list of atom/0) - The list of the nodes where to set the lock. The default value is [node()].
- :retries (:infinity | non_neg_integer/0) - If the key has already been locked by another process and retries are not equal to 0, the process sleeps for a while and tries to execute the action later. When :retries attempts have been made, an exception is raised. If :retries is :infinity (the default), the function will eventually be executed (unless the lock is never released). The default value is :infinity.
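A minimal sketch of a transaction locking only the keys it touches is shown below. The keys, values, and bang-function calls are illustrative, and it assumes the transaction/2 wrapper takes the function as the first argument and the options as the second:

# Lock only :alice and :bob so unrelated transactions are not serialized.
MyCache.transaction(
  fn ->
    alice = MyCache.get!(:alice)
    bob = MyCache.get!(:bob)

    MyCache.put!(:alice, alice + 100)
    MyCache.put!(:bob, bob - 100)
  end,
  keys: [:alice, :bob]
)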
Extended API (extra functions)
This adapter provides some additional convenience functions to the
Nebulex.Cache
API.
Creating new generations:
MyCache.new_generation()
MyCache.new_generation(gc_interval_reset: false)
Retrieving the current generations:
MyCache.generations()
Retrieving the newer generation:
MyCache.newer_generation()
Summary
Types
@type backend() :: :ets | :shards
Adapter's backend type
@type mem_check_interval() :: pos_integer() | (limit :: :size | :memory, current :: non_neg_integer(), max :: non_neg_integer() -> timeout :: pos_integer())
The type for the :gc_memory_check_interval option value.
The :gc_memory_check_interval value can be:
- A positive integer with the time in milliseconds.
- An anonymous function called at runtime, which must return the next interval in milliseconds. The function receives three arguments:
  - The first argument is an atom indicating the limit, either :size or :memory.
  - The second argument is the current value for that limit. For example, if the limit in the first argument is :size, the second argument is the current cache size (the number of entries in the cache). If the limit is :memory, it is the current cache memory in bytes.
  - The third argument is the maximum limit provided in the configuration. When the limit in the first argument is :size, it is the :max_size. On the other hand, if the limit is :memory, it is the :allocated_memory.
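For example, the check interval could be adapted at runtime by passing a function, as in the sketch below. The options are passed via the child spec here, and the 90% threshold and the intervals are illustrative assumptions:

# In the supervision tree (anonymous functions are awkward in config files):
{MyApp.LocalCache,
 [
   gc_interval: :timer.hours(12),
   max_size: 1_000_000,
   allocated_memory: 2_000_000_000,
   gc_memory_check_interval: fn
     # Check more often once usage reaches 90% of the configured limit.
     _limit, current, max when current >= max * 0.9 -> :timer.seconds(1)
     # Otherwise, use a relaxed check interval.
     _limit, _current, _max -> :timer.seconds(30)
   end
 ]}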