Retry configuration and error classification for the command path.
The retry driver consumes a t/0 value per command and decides, based on
the classification helpers below, whether to re-dispatch an attempt against
the next replica (on a rebalance-class error), re-dispatch against a fresh
pool worker (on a transport-class error), or return the error verbatim (on
anything else).
Writer discipline
The retry policy is cluster-scoped, not per-node, and is established
once at Aerospike.start_link/1 time. The Tender computes the
effective policy and publishes it to the :meta ETS table under the key
:retry_opts; the command path reads it lock-free via load/1. Runtime
mutation still stays behind the single-writer boundary that governs every
other published :meta entry.
Per-command overrides (:timeout, :max_retries,
:sleep_between_retries_ms, :replica_policy) may be passed through
Aerospike.get/3's opts and are merged on top of the cluster default
by merge/2.
Classification
One canonical classifier drives the retry loop and the pool-side failure accounting. It returns:
- :bucket - one of :ok, :rebalance, :transport, :routing_refusal, :server_fatal
- :retry_classification - the retry telemetry label or nil
- :close_connection? - whether the current worker should be discarded after the outcome
- :node_failure? - whether the outcome should increment the node's :failed counter
The buckets stay disjoint:
- rebalance - the server replied with a result code that says "this partition is not mine right now" (currently :partition_unavailable). The retry driver re-picks on a different replica and asynchronously asks the Tender for a fresh partition map.
- transport - the command did not reach a server that answered cleanly: :network_error, :timeout, :connection_error (socket), :pool_timeout, :invalid_node (pool checkout), and :circuit_open (circuit-breaker refusal). These are not ownership signals; the retry driver re-dispatches without asking for a map refresh.
- routing_refusal - the router refused to select a replica (:cluster_not_ready, :no_master). The driver returns the atom verbatim; no retry.
- server_fatal - everything else: server logical errors (:key_not_found, :generation_error, …) and client-local fatal errors like :parse_error. The driver returns these verbatim.
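The bucket rules above can be sketched as a small pattern-matching module. This is an illustration only, not the library's classifier: the module name ClassifySketch, the action atoms, and the use of bare {:error, code} tuples with atom codes are all assumptions made to keep the example self-contained. Giving :circuit_open its own telemetry label is likewise an assumption, inferred from the retry_classification() type.

```elixir
defmodule ClassifySketch do
  # Hypothetical sketch of the four error buckets; not the library's code.
  @rebalance [:partition_unavailable]
  @transport [:network_error, :timeout, :connection_error,
              :pool_timeout, :invalid_node, :circuit_open]
  @routing_refusal [:cluster_not_ready, :no_master]

  # Assumed success shapes: :ok or {:ok, value}.
  def bucket(:ok), do: :ok
  def bucket({:ok, _value}), do: :ok
  # Assumed error shape: {:error, code} where code is an atom.
  def bucket({:error, code}) when code in @rebalance, do: :rebalance
  def bucket({:error, code}) when code in @transport, do: :transport
  def bucket({:error, code}) when code in @routing_refusal, do: :routing_refusal
  def bucket({:error, _code}), do: :server_fatal

  # Telemetry label; nil means the outcome is not retryable.
  # Assumption: :circuit_open is labelled separately from other transport errors.
  def retry_label({:error, :circuit_open}), do: :circuit_open

  def retry_label(outcome) do
    case bucket(outcome) do
      :rebalance -> :rebalance
      :transport -> :transport
      _other -> nil
    end
  end

  # What a retry driver might do next (action names are hypothetical).
  def next_action(outcome) do
    case bucket(outcome) do
      :ok -> :return_result
      :rebalance -> :retry_next_replica_and_refresh_map
      :transport -> :retry_without_map_refresh
      _other -> :return_error_verbatim
    end
  end
end
```

Because every clause matches on a single code list, an outcome can land in only one bucket, which is the disjointness property the retry driver relies on.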
Types
@type bucket() :: :ok | :rebalance | :transport | :routing_refusal | :server_fatal
High-level outcome bucket used by retry and pool-failure logic.
Buckets are intentionally disjoint so the retry driver can decide whether to retry, refresh cluster state, close a socket, or return the error as-is.
@type classification() :: %{ bucket: bucket(), retry_classification: retry_classification(), close_connection?: boolean(), node_failure?: boolean() }
Complete retry classification for one command outcome.
@type option() :: {:max_retries, non_neg_integer()} | {:sleep_between_retries_ms, non_neg_integer()} | {:replica_policy, replica_policy()} | {atom(), term()}
Retry option accepted by from_opts/1 and merge/2.
- :max_retries - retries after the initial attempt. 0 disables retry.
- :sleep_between_retries_ms - fixed delay between attempts.
- :replica_policy - :master or :sequence.
- any other atom key - accepted and ignored.
Unknown keys are ignored so retry options can be merged from broader command/startup option lists.
@type options() :: [option()]
Keyword list of retry options.
@type replica_policy() :: :master | :sequence
Replica selection policy used when retrying read commands.
@type retry_classification() :: :rebalance | :transport | :circuit_open | nil
Telemetry retry label derived from a command outcome.
nil means the outcome is not retryable.
@type t() :: %{ max_retries: non_neg_integer(), sleep_between_retries_ms: non_neg_integer(), replica_policy: replica_policy() }
Effective retry policy for one command.
- :max_retries - number of retries after the initial attempt (so a :max_retries of 2 means up to 3 attempts total). Must be a non-negative integer. 0 disables retry entirely.
- :sleep_between_retries_ms - fixed delay between attempts; no jitter or exponential backoff.
- :replica_policy - :master dispatches every attempt against the master replica (transport failures retry the same node); :sequence walks the replica list via rem(attempt, length(replicas)) on each retry.
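The attempt arithmetic and the replica walk can be illustrated with a short sketch. The module ReplicaWalkSketch and its function names are hypothetical; it also assumes that attempt is zero-based (0 is the initial attempt) and that the master is the head of the replica list, neither of which is stated in this reference.

```elixir
defmodule ReplicaWalkSketch do
  # Hypothetical sketch; not the library's routing code.

  # Total attempts = the initial attempt plus max_retries retries.
  def total_attempts(%{max_retries: n}) when is_integer(n) and n >= 0, do: n + 1

  # :master always targets the master replica.
  # Assumption: the master is the first element of the replica list.
  def pick(:master, [master | _rest], _attempt), do: master

  # :sequence walks the list, wrapping around with rem/2.
  # Assumption: attempt is zero-based.
  def pick(:sequence, replicas, attempt) when replicas != [] do
    Enum.at(replicas, rem(attempt, length(replicas)))
  end
end

policy = %{max_retries: 2, sleep_between_retries_ms: 0, replica_policy: :sequence}
replicas = [:node_a, :node_b, :node_c]

# Attempts 0, 1 and 2 visit each replica once under :sequence.
for attempt <- 0..(ReplicaWalkSketch.total_attempts(policy) - 1) do
  ReplicaWalkSketch.pick(policy.replica_policy, replicas, attempt)
end
```

With more attempts than replicas, the walk wraps back to the start of the list, so a fourth attempt in this example would target :node_a again.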
Functions
@spec classify(term()) :: classification()
Classifies one command outcome into retry buckets and the metadata the retry and pool layers consume.
@spec defaults() :: t()
Returns the default retry policy. Used by the Tender at init.
Builds an effective retry policy by overlaying the keyword opts on
top of defaults/0.
Intended for the Tender's init path: validate the caller's start opts
once and store the resulting map in :meta. Unknown keys are ignored
so the retry policy can live alongside future policy knobs without a
config migration.
Reads the cluster-default retry policy from the :meta ETS table.
Falls back to defaults/0 when the slot is absent so readers never
crash against a Tender that was started without the retry plumbing
(a cluster-state-only test harness, for example, that skips the
retry-opts init).
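A minimal sketch of the read-with-fallback behaviour, using a plain ETS table. The module MetaSketch, the table name, and the default values shown are assumptions for illustration; only the :retry_opts key and the fall-back-to-defaults behaviour come from this reference.

```elixir
defmodule MetaSketch do
  # Hypothetical sketch; not the library's implementation.
  @key :retry_opts

  # Illustrative defaults only; the real default values are not stated here.
  def defaults do
    %{max_retries: 2, sleep_between_retries_ms: 0, replica_policy: :sequence}
  end

  # Publishes a policy under the key that load/1 reads.
  def put(meta_tab, policy) when is_map(policy) do
    :ets.insert(meta_tab, {@key, policy})
  end

  # Reads the policy; falls back to defaults/0 when the slot is absent.
  def load(meta_tab) do
    case :ets.lookup(meta_tab, @key) do
      [{@key, policy}] -> policy
      [] -> defaults()
    end
  end
end

tab = :ets.new(:meta_sketch, [:set, :public, read_concurrency: true])

# Slot absent: the reader gets the defaults instead of crashing.
MetaSketch.load(tab)

# After publication, readers see the stored policy.
MetaSketch.put(tab, %{max_retries: 5, sleep_between_retries_ms: 10, replica_policy: :master})
MetaSketch.load(tab)
```

An :ets.lookup/2 on a :set table returns either an empty list or a single-element list, so the two case clauses cover every result.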
Overlays per-command opts on top of base. Only the three retry
fields are recognised; other keys are ignored.
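The overlay can be sketched with Keyword.take/2 and Map.merge/2. The module name MergeSketch and the base values are hypothetical, and the sketch does no validation of the override values, which the real function may well perform.

```elixir
defmodule MergeSketch do
  # Hypothetical sketch; not the library's implementation.
  @retry_keys [:max_retries, :sleep_between_retries_ms, :replica_policy]

  # Overlays the recognised per-command opts on the base policy map.
  # Keys outside @retry_keys are dropped, so a broader option list is safe to pass.
  def merge(base, opts) when is_map(base) and is_list(opts) do
    overrides = opts |> Keyword.take(@retry_keys) |> Map.new()
    Map.merge(base, overrides)
  end
end

# Illustrative base policy (values are assumptions).
base = %{max_retries: 2, sleep_between_retries_ms: 0, replica_policy: :sequence}

# :max_retries is overridden; the unrecognised :some_other_opt key is ignored.
MergeSketch.merge(base, max_retries: 0, some_other_opt: true)
```

Filtering before merging keeps the resulting map limited to the three fields of t(), whatever else the caller's option list carries.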
Returns true when term should increment the node's :failed
counter.
Writes policy to meta_tab under the ETS key used by load/1.
Runtime publication flows through the cluster-state writer; table creation also uses this helper once to seed the default row before the tend-cycle worker starts.
Returns true when term is an error the retry driver should treat
as a cluster-rebalance signal. Accepts either a bare %Aerospike.Error{}
or the {:error, _} tuple form the command path produces; delegates to
the canonical classifier above.
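Accepting both input shapes usually comes down to unwrapping the tuple before matching. The sketch below assumes a stand-in struct, ErrorSketch, with a :code field; the real %Aerospike.Error{} fields are not shown in this reference, and the real predicate delegates to the classifier rather than matching the code directly.

```elixir
defmodule ErrorSketch do
  # Stand-in for %Aerospike.Error{}; the :code field name is an assumption.
  defstruct [:code]
end

defmodule ShapeSketch do
  # Hypothetical sketch; not the library's implementation.

  # The {:error, _} tuple form is unwrapped and re-checked.
  def rebalance?({:error, err}), do: rebalance?(err)
  # Bare error struct carrying the rebalance-class code.
  def rebalance?(%ErrorSketch{code: :partition_unavailable}), do: true
  # Anything else, including success values, is not a rebalance signal.
  def rebalance?(_other), do: false
end

ShapeSketch.rebalance?(%ErrorSketch{code: :partition_unavailable})
ShapeSketch.rebalance?({:error, %ErrorSketch{code: :partition_unavailable}})
ShapeSketch.rebalance?({:error, %ErrorSketch{code: :timeout}})
```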
@spec retry_classification(term()) :: retry_classification()
Returns the retry telemetry label for term, or nil when the
outcome is fatal / non-retryable.
Returns true when term is an error the retry driver should treat
as a transport-class failure: the driver re-dispatches along the replica
walk without asking for a partition-map refresh.
Examples of transport-class codes: :network_error, :timeout,
:connection_error, :pool_timeout, :invalid_node, :circuit_open.