Nebulex.Adapters.Local (nebulex_local v3.0.0-rc.1)

A Local Generation Cache adapter for Nebulex; inspired by epocxy cache.

Generational caching using an ETS table (or multiple ones when used with :shards) for each generation of cached data. Accesses hit the newer generation first, and migrate from the older generation to the newer generation when retrieved from the stale table. When a new generation is started, the oldest one is deleted. This is a form of mass garbage collection which avoids using timers and expiration of individual cached elements.

This implementation of generation cache uses only two generations, referred to as the new and the old generation.

See Nebulex.Adapters.Local.Generation to learn more about generation management and garbage collection.
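
The generation mechanics described above can be illustrated with a toy sketch built on plain ETS tables. GenCacheSketch and its functions are hypothetical names introduced here for illustration; this is not the adapter's implementation, which also deals with TTLs, sharding, and a dedicated garbage-collector process.

```elixir
defmodule GenCacheSketch do
  # Toy two-generation cache (hypothetical, not part of Nebulex).
  # The state is a map holding the ETS table ids of both generations.

  def new, do: %{newer: new_table(), older: nil}

  # Writes always go to the newer generation.
  def put(%{newer: newer}, key, value) do
    true = :ets.insert(newer, {key, value})
    :ok
  end

  # Reads hit the newer generation first and fall back to the older one.
  def get(%{newer: newer, older: older}, key) do
    case :ets.lookup(newer, key) do
      [{^key, value}] -> {:ok, value}
      [] -> get_from_older(newer, older, key)
    end
  end

  # The previous newer table becomes the older one, and the previous
  # older table (if any) is deleted in one shot.
  def new_generation(%{newer: newer, older: older}) do
    if older, do: :ets.delete(older)
    %{newer: new_table(), older: newer}
  end

  defp get_from_older(_newer, nil, _key), do: :error

  defp get_from_older(newer, older, key) do
    case :ets.take(older, key) do
      [{^key, value}] ->
        # Migrate the entry so it survives the next generation swap.
        true = :ets.insert(newer, {key, value})
        {:ok, value}

      [] ->
        :error
    end
  end

  defp new_table, do: :ets.new(:generation, [:set, :public])
end
```

In this sketch, an entry that is read after one call to new_generation/1 is copied into the newer table and survives; an entry that is never read disappears after two calls.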

Overall features

  • Configurable backend (:ets or :shards).
  • Expiration - A status based on the TTL (Time To Live) option. To maintain cache performance, expired entries may not be removed immediately; they are expired or evicted on demand, when the key is read.
  • Eviction - Generational Garbage Collection.
  • Sharding - For intensive workloads, the Cache may also be partitioned (by using :shards backend and specifying the :partitions option).
  • Support for transactions via the Erlang global name registration facility. See Nebulex.Adapter.Transaction.
  • Support for stats.

Configuration options

The following options can be used to configure the adapter:

  • :cache (atom/0) - Required. The defined cache module.

  • :stats (boolean/0) - A flag to determine whether to collect cache stats. The default value is true.

  • :backend (backend/0) - The backend or storage to be used for the adapter. The default value is :ets.

  • :read_concurrency (boolean/0) - Since the adapter uses ETS tables internally, this option is used when creating a new table or generation. See :ets.new/2 options. The default value is true.

  • :write_concurrency (boolean/0) - Since the adapter uses ETS tables internally, this option is used when creating a new table or generation. See :ets.new/2 options. The default value is true.

  • :compressed (boolean/0) - Since the adapter uses ETS tables internally, this option is used when creating a new table or generation. See :ets.new/2 options. The default value is false.

  • :backend_type - Since the adapter uses ETS tables internally, this option is used when creating a new table or generation. See :ets.new/2 options. The default value is :set.

  • :partitions (pos_integer/0) - The number of ETS partitions when using the :shards backend. See :shards.new/2.

    The default value is System.schedulers_online().

  • :purge_chunk_size (pos_integer/0) - This option limits the max nested match specs based on the number of keys when purging the older cache generation. The default value is 100.

  • :gc_interval (pos_integer/0) - The interval time in milliseconds for the garbage collector to run. Each run creates a new generation and makes it the newer one, turns the previous newer generation into the older one, and finally removes the previous older one. If not provided (or nil, the default), the garbage collector never runs, so new generations must be created explicitly, e.g., MyCache.new_generation(opts); however, this is not recommended.

    Usage

    Always provide the :gc_interval option so the garbage collector works appropriately out of the box, unless you explicitly want to turn off garbage collection or handle it yourself.

  • :max_size (pos_integer/0) - The maximum number of entries to store in the cache. If not provided (or nil), the health check to validate and release memory is not performed (the default).

  • :allocated_memory (pos_integer/0) - The maximum size in bytes for the cache storage. If not provided (or nil), the health check to validate and release memory is not performed (the default).

  • :gc_memory_check_interval (mem_check_interval/0) - The interval time in milliseconds for garbage collection to run the size and memory checks.

    Usage

    Beware: For the :gc_memory_check_interval option to work, you must configure one of :max_size or :allocated_memory (or both).

    The default value is 10000.

  • :gc_flush_delay (pos_integer/0) - The delay in milliseconds before objects from the oldest generation are flushed. The default value is 10000.
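
As a quick reference, the fragment below sets several of the options listed above in one place. The option names come from the list; the values are illustrative only (most simply restate the documented defaults) and are not recommendations.

```elixir
# config/config.exs (illustrative values, not recommendations)
import Config

config :my_app, MyApp.LocalCache,
  stats: true,
  backend: :ets,
  read_concurrency: true,
  write_concurrency: true,
  compressed: false,
  gc_interval: :timer.hours(12),
  gc_flush_delay: :timer.seconds(10),
  purge_chunk_size: 100
```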

Usage

Nebulex.Cache is the wrapper around the cache. We can define a local cache as follows:

defmodule MyApp.LocalCache do
  use Nebulex.Cache,
    otp_app: :my_app,
    adapter: Nebulex.Adapters.Local
end

Where the configuration for the cache must be in your application environment, usually defined in your config/config.exs:

config :my_app, MyApp.LocalCache,
  gc_interval: :timer.hours(12),
  max_size: 1_000_000,
  allocated_memory: 2_000_000_000,
  gc_memory_check_interval: :timer.seconds(10)

For intensive workloads, the Cache may also be partitioned using :shards as cache backend (backend: :shards) and configuring the desired number of partitions via the :partitions option, which defaults to System.schedulers_online().

config :my_app, MyApp.LocalCache,
  backend: :shards,
  gc_interval: :timer.hours(12),
  max_size: 1_000_000,
  allocated_memory: 2_000_000_000,
  gc_memory_check_interval: :timer.seconds(10),
  partitions: System.schedulers_online() * 2

If your application was generated with a supervisor (by passing --sup to mix new), you will have a lib/my_app/application.ex file containing the application start callback that defines and starts your supervisor. You just need to edit the start/2 function to start the cache as a child of your application's supervisor:

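def start(_type, _args) do
  children = [
    {MyApp.LocalCache, []},
    ...
  ]

  # Assumed names: the default supervisor options generated by mix new --sup.
  opts = [strategy: :one_for_one, name: MyApp.Supervisor]
  Supervisor.start_link(children, opts)
end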

See Nebulex.Cache for more information.

The :ttl option

The :ttl is a runtime option meant to set a key's expiration time. It is evaluated on demand when a key is retrieved, and if the key has expired, it is removed from the cache. Hence, it cannot be used as an eviction method; it is more for maintaining the cache's integrity and consistency. For this reason, you should always configure the eviction or GC options. See the "Eviction policy" section for more information.

Caveats when using :ttl option:

  • When using the :ttl option, ensure it is less than :gc_interval. Otherwise, the key may be evicted before its :ttl elapses, because the garbage collector may run before a fetch operation has evaluated the :ttl and expired the key.
  • Consider the following scenario based on the previous caveat. You have :gc_interval set to 1 hr. Then you put a new key with :ttl set to 2 hrs. One minute later, the GC runs, creating a new generation, and the key ends up in the older generation. Therefore, if the next GC cycle occurs (1 hr later) before the key is fetched (which would move it to the newer generation), the key is evicted when the GC removes the older generation, so it is no longer retrievable.
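
The scenario above can be worked through with a little arithmetic. The sketch below assumes the GC runs strictly every :gc_interval and that no size or memory check triggers an earlier eviction; RetentionSketch is a hypothetical helper, not part of Nebulex.

```elixir
defmodule RetentionSketch do
  # Hypothetical helper (not part of Nebulex): how long, in milliseconds,
  # an entry that is never read stays in the cache, given the GC interval
  # and how long after the last GC run the entry was written.
  def unread_lifetime(gc_interval, written_after_gc)
      when written_after_gc >= 0 and written_after_gc < gc_interval do
    # The first GC run moves the entry to the older generation;
    # the second GC run deletes that generation.
    2 * gc_interval - written_after_gc
  end
end

gc_interval = :timer.hours(1)
ttl = :timer.hours(2)

# The key is written one minute before the next GC run.
lifetime = RetentionSketch.unread_lifetime(gc_interval, :timer.minutes(59))

# The unread lifetime is 61 minutes, well short of the 2-hour TTL.
IO.puts("evicted by GC before the TTL elapses? #{lifetime < ttl}")
```

Under these assumptions an unread entry lives between one and two GC intervals, so a :ttl shorter than :gc_interval always elapses before the GC can drop the entry.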

Eviction policy

This adapter implements a generational cache, which means its primary eviction mechanism pushes a new cache generation and removes the oldest one. This mechanism ensures the garbage collector removes the least recently used keys when it runs and deletes the oldest generation, while the most recently used keys remain available in the newer generation. In other words, the generation cache also enforces an LRU (Least Recently Used) eviction policy.

The following conditions trigger the garbage collector to run:

  • When the time interval defined by :gc_interval elapses. The garbage-collector process then runs, creating a new generation and deleting the oldest one. This interval defines how often you want to evict the least recently used entries, or the retention period for the cached entries. The retention period for the least recently used entries is equivalent to two garbage collection cycles (since we keep two generations), which means the GC removes all entries not accessed in the cache during that time.

  • When the time interval defined by :gc_memory_check_interval elapses. Beware: This option works alongside the :max_size and :allocated_memory options. The interval defines when the GC must run to validate the cache size and memory and release space if any of the limits are exceeded. It is mainly for keeping the cached data under the configured memory size limits and avoiding running out of memory at some point.

Configuring the GC options

This section helps you understand how the different configuration options work and gives you an idea of what values to set, especially if this is your first time using Nebulex with the local adapter.

Understanding a few things in advance is essential to configure the cache with appropriate values: for example, the average size of an entry, so we can configure a reasonable value for the max size or allocated memory, and the read and write load. The problem is that it is sometimes challenging to have this information in advance, especially for a new app or when using the cache for the first time. The following tips can help you configure the cache (especially if it is your first time):

  • To configure the GC, consider the retention period you want for the least recently used entries. For example, if :gc_interval is 1 hr, you will keep only those entries accessed periodically during the last 2 hrs (two GC cycles, as outlined above). If it is your first time using the local adapter, you may start by setting :gc_interval to 12 hrs to ensure daily data retention. Then, you can analyze the data and change the value based on your findings.

  • Configure the :max_size or :allocated_memory option (or both) to keep memory healthy under the given limits (avoid running out of memory). Configuring these options will ensure the GC releases memory space whenever a limit is reached or exceeded. For example, one may assign 50% of the total memory to the :allocated_memory. It depends on how much memory you need and how much your app needs to run. For the :max_size, consider how many entries you expect to keep in the cache; you could start with something between 100_000 and 1_000_000.

  • Finally, when configuring :max_size or :allocated_memory (or both), also review :gc_memory_check_interval. By default, the GC runs every 10 seconds to validate the cache size and memory.
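
The sizing tips above boil down to simple arithmetic. In the sketch below, the host memory and the average entry size are assumed figures chosen purely for illustration; measure your own workload before settling on values.

```elixir
# All figures are assumptions for illustration only.
total_memory = 4_000_000_000   # bytes available on the host (assumed)
avg_entry_size = 2_000         # average bytes per cached entry (assumed)

# Give the cache 50% of the total memory, as in the tip above.
allocated_memory = div(total_memory, 2)

# Rough upper bound on the number of entries that fit in that budget.
max_size = div(allocated_memory, avg_entry_size)

IO.inspect(allocated_memory: allocated_memory, max_size: max_size)
```

The resulting numbers could then be used for the :allocated_memory and :max_size options.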

Queryable API

Since the adapter implementation uses ETS tables underneath, the query must be a valid ETS Match Spec. However, there are some predefined or shorthand queries you can use. See the "Predefined queries" section for information.

The adapter defines an entry as a tuple {:entry, key, value, touched, ttl}, meaning the match pattern within the ETS Match Spec must be like {:entry, :"$1", :"$2", :"$3", :"$4"}. To make query building easier, you can use the Ex2ms library.

iex> match_spec = [
...>   {
...>     {:entry, :"$1", :"$2", :_, :_},
...>     [{:>, :"$2", 1}],
...>     [{{:"$1", :"$2"}}]
...>   }
...> ]
iex> MyCache.get_all(query: match_spec)
{:ok, [b: 1, c: 3]}

You can use the Ex2ms or MatchSpec library to build queries more easily.
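
To see how such a match spec behaves, it can be run against a plain ETS table holding tuples of the same shape. This is only a sketch: the table name, the keypos setting, and the touched and ttl values are assumptions made for illustration and may not reflect how the adapter lays out its own tables.

```elixir
# Stand-in table for illustration; keypos: 2 makes the second tuple
# element the key. The touched and ttl values below are made up.
table = :ets.new(:entries_sketch, [:set, keypos: 2])

:ets.insert(table, [
  {:entry, :a, 1, 0, :infinity},
  {:entry, :b, 2, 0, :infinity},
  {:entry, :c, 3, 0, :infinity}
])

# Select {key, value} pairs whose value is greater than 1.
match_spec = [
  {
    {:entry, :"$1", :"$2", :_, :_},
    [{:>, :"$2", 1}],
    [{{:"$1", :"$2"}}]
  }
]

# A :set table has no defined order, so sort before inspecting.
result = table |> :ets.select(match_spec) |> Enum.sort()
IO.inspect(result)
```

Only the entries for :b and :c satisfy the guard, so the sorted result is [b: 2, c: 3].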

Transaction API

This adapter inherits the default implementation provided by Nebulex.Adapter.Transaction. Therefore, the transaction command accepts the following options:

  • :keys (list of term/0) - The list of keys the transaction will lock. Since the lock ID is generated based on the key, the transaction uses a fixed lock ID if the option is not provided or is an empty list. Then, all subsequent transactions without this option (or set to an empty list) are serialized, and performance is significantly affected. For that reason, it is recommended to pass the list of keys involved in the transaction. The default value is [].

  • :nodes (list of atom/0) - The list of the nodes where to set the lock.

    The default value is [node()].

  • :retries (:infinity | non_neg_integer/0) - If the key has already been locked by another process and retries are not equal to 0, the process sleeps for a while and tries to execute the action later. When :retries attempts have been made, an exception is raised. If :retries is :infinity (the default), the function will eventually be executed (unless the lock is never released). The default value is :infinity.
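
The sketch below is not the Nebulex API. It calls Erlang's :global.trans/4 directly to show how options such as :nodes and :retries plausibly map onto the underlying global locking facility. The lock ID term is hypothetical, and how Nebulex derives lock IDs from :keys is not shown here.

```elixir
# Lock ID in the {resource, requester} form expected by :global.
# The resource term is hypothetical.
lock_id = {{:my_cache_lock, :alice}, self()}

# Arguments: lock ID, function to run, nodes to lock on, number of retries.
result =
  :global.trans(
    lock_id,
    fn -> :committed end,
    [node()],
    3
  )

# :global.trans/4 returns the function's result, or :aborted if the lock
# could not be acquired within the given retries.
IO.inspect(result)
```

Using a distinct resource term per key is what allows unrelated transactions to proceed concurrently, which is the reason for passing the list of keys involved.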

Extended API (extra functions)

This adapter provides some additional convenience functions to the Nebulex.Cache API.

Creating new generations:

MyCache.new_generation()
MyCache.new_generation(gc_interval_reset: false)

Retrieving the current generations:

MyCache.generations()

Retrieving the newer generation:

MyCache.newer_generation()

Summary

Types

  • backend() - Adapter's backend type.
  • mem_check_interval() - The type for the :gc_memory_check_interval option value.

Types

backend()

@type backend() :: :ets | :shards

Adapter's backend type

mem_check_interval()

@type mem_check_interval() ::
  pos_integer()
  | (limit :: :size | :memory,
     current :: non_neg_integer(),
     max :: non_neg_integer() ->
       timeout :: pos_integer())

The type for the :gc_memory_check_interval option value.

The :gc_memory_check_interval value can be:

  • A positive integer with the time in milliseconds.
  • An anonymous function that is called at runtime and must return the next interval in milliseconds. The function receives three arguments:
    • The first argument is an atom indicating the limit, whether it is :size or :memory.
    • The second argument is the current value for the limit. For example, if the limit in the first argument is :size, the second argument is the current cache size (number of entries in the cache). If the limit is :memory, it is the current cache memory in bytes.
    • The third argument is the maximum limit provided in the configuration. When the limit in the first argument is :size, it is the :max_size. On the other hand, if the limit is :memory, it is the :allocated_memory.
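
A function of this shape might look like the following sketch. MemCheckSketch, the 80% threshold, and the returned intervals are assumptions chosen for illustration, not values taken from the adapter.

```elixir
defmodule MemCheckSketch do
  # Hypothetical policy: check every second once usage reaches 80% of the
  # configured limit, otherwise check every 10 seconds.
  # Arguments: the limit kind (:size or :memory), the current value,
  # and the configured maximum.
  def next_interval(_limit, current, max) when current >= max * 0.8 do
    :timer.seconds(1)
  end

  def next_interval(_limit, _current, _max) do
    :timer.seconds(10)
  end
end

IO.inspect(MemCheckSketch.next_interval(:size, 900, 1_000))
IO.inspect(MemCheckSketch.next_interval(:memory, 100, 1_000))
```

Such a function could then be passed as the option value, e.g. gc_memory_check_interval: &MemCheckSketch.next_interval/3.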

Functions

entry(args \\ [])

(macro)

entry(record, args)

(macro)