Evaluator and validator for the gettext/CLDR plural rules used by erli18n.
Compiles the C expression from a .po Plural-Forms: header
(nplurals=N; plural=EXPR;) into a small AST and evaluates it to choose
the plural form for a given N — this is what backs ngettext/npgettext.
The problem it solves
gettext selects the correct plural translation by evaluating a C
EXPRESSION embedded in the .po header (e.g. the Russian 3-form rule
n%10==1 && n%100!=11 ? 0 : ...). Each locale ships its own. This module
replaces the legacy gettexter's Yecc/Leex/erl_eval pipeline with a
hand-written recursive-descent parser + AST interpreter (no dynamic
generation of Erlang code, so dialyzer/eqwalizer can reason about
everything). It turns EXPR into an ast/0 and evaluates it to a form
index in [0, NPlurals).
Mental model
- Two phases. compile/1 (load-time, cold) parses + validates + packs into a plural_compiled/0. evaluate/2 (lookup-time, HOT PATH) interprets that bundle per call. The catalog loader compiles ONCE and keeps the bundle; each ngettext/npgettext calls only evaluate/2.
- Runtime source-of-truth is the .po header (PSD-004). The embedded CLDR table (~49 locales, cldr_rule/1) does NOT take part in the hot path: it is consulted only at load time to emit divergence warnings (validate_against_cldr/2) and as a fallback when the header is missing (fallback_rule/0).
- Trusted vs untrusted. The header expression comes from a tenant's .po, which is UNTRUSTED input (ADR-0003, see SECURITY.md). The cldr_data/0 table is a static module literal, which is TRUSTED. That is why compile/1 is fail-closed and hardened, while cldr_compiled_table/0 assumes every row compiles.
- Pure function, no per-process state. Unlike the catalog server, this module has no gen_server, no ETS and no process dictionary. The only side effect is a global read-once cache in persistent_term (cldr_compiled_table/0), memoising the compiled CLDR ASTs. It is a module-scoped singleton under a fixed key, built once per node and never invalidated (cldr_data/0 is constant).
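The read-once cache described in the last bullet can be sketched as follows. This is a minimal illustration, not the module's actual code: the module name, the key and the `build/0` stand-in are assumptions; only `persistent_term:get/2` and `persistent_term:put/2` are real OTP calls.

```erlang
-module(readonce_cache_sketch).
-export([table/0]).

%% Hypothetical fixed key; the real module's key is not shown in this page.
-define(KEY, {?MODULE, compiled_table}).

%% Returns the cached table, building and storing it on first use.
table() ->
    case persistent_term:get(?KEY, undefined) of
        undefined ->
            Table = build(),
            %% Two concurrent first callers may both build; they store equal
            %% terms, and put/2 does nothing when the stored value is equal.
            ok = persistent_term:put(?KEY, Table),
            Table;
        Table ->
            Table
    end.

%% Stand-in for compiling every row of a static, trusted table.
build() ->
    #{<<"fr">> => {2, {binop, '>', n, 1}},
      <<"de">> => {2, {binop, '!=', n, 1}}}.
```

Because the source table is constant, the sketch never erases or updates the key, which avoids the global garbage collection that changing a persistent term triggers.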
Anti-DoS hardening (ADR-0003)
The attack surface is the .po expression. The defenses ALL live in
compile/1 (cold), so that evaluate/2 (hot) stays O(1)-bounded by
construction:
- ?PLURAL_EXPR_MAX_BYTES (2048): rejects a long expression before parse.
- ?PLURAL_EXPR_MAX_DEPTH (64): bounds nesting (and the walker's stack).
- ?AST_MAX_NODES (256): bounds the node count (a wide flat chain n*n*...*n passes both caps above but would grow an n^k bignum per lookup).
- ?MAX_INT_DIGITS (7): bounds the digits of nplurals= before binary_to_integer materialises the bignum.
- Static rejection (validate_safe/2): refuses rules provably faulty for EVERY N (div/mod by a constant divisor of 0; constant outside [0, NPlurals)).
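To illustrate why a node cap is needed on top of the byte cap, here is a hedged sketch of a node counter over the ast/0 shape documented below. The module name and the `chain/1` helper are hypothetical, and the real module may well count nodes while parsing rather than in a separate pass.

```erlang
-module(ast_nodes_sketch).
-export([nodes/1, chain/1, within_cap/1]).

%% Same value as the documented cap; redefined here so the sketch stands alone.
-define(AST_MAX_NODES, 256).

%% Counts every node of an ast(): leaves count 1, operators count 1 plus children.
nodes(I) when is_integer(I) -> 1;
nodes(n) -> 1;
nodes({unop, '!', A}) -> 1 + nodes(A);
nodes({binop, _Op, L, R}) -> 1 + nodes(L) + nodes(R);
nodes({ternary, C, T, E}) -> 1 + nodes(C) + nodes(T) + nodes(E).

%% Builds the left-nested product n*n*...*n with K multiplications
%% (2K + 1 nodes), the "wide flat chain" mentioned above.
chain(0) -> n;
chain(K) when is_integer(K), K > 0 -> {binop, '*', chain(K - 1), n}.

within_cap(Ast) -> nodes(Ast) =< ?AST_MAX_NODES.
```

A chain of 128 multiplications is only 257 bytes of source text, well under the byte cap, yet it has 257 nodes and is refused by `within_cap/1`.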
evaluate/2 is TOTAL: it never raises. Mirroring the GNU libintl runtime
(dcigettext.c), division/modulo by zero is coerced to 0 and a form
outside [0, NPlurals) is clamped to 0. Anyone who needs to OBSERVE the
anomaly (log/alert) uses evaluate_checked/2, which returns it as data.
When you touch this module
- Consumer: almost never directly. You call erli18n:ngettext/5 and the catalog server takes care of compile/1 and evaluate/2. For a quick test outside the server, plural_by_po_header/2 compiles and evaluates in one step.
- Loader maintainer: calls compile/1 at load, keeps the bundle, and on the hot path calls evaluate/2. For CLDR divergence at load use validate_against_cldr_ast/2 (reuses the already-compiled AST).
- CLDR table maintainer: edits cldr_data/0 when syncing a CLDR release.
Quickstart
%% Compile the Russian 3-form (one/few/many) rule once...
1> Hdr = <<"nplurals=3; plural=n%10==1 && n%100!=11 ? 0 : "
1> "n%10>=2 && n%10<=4 && (n%100<12 || n%100>14) ? 1 : 2;">>.
2> {ok, C} = erli18n_plural:compile(Hdr).
{ok,#{raw => Hdr,expr => {ternary,_,_,_},nplurals => 3}}
%% ...and select the form for various N (hot path).
3> erli18n_plural:evaluate(C, 1).
0
4> erli18n_plural:evaluate(C, 2).
1
5> erli18n_plural:evaluate(C, 5).
2
%% One-off use: compile and evaluate at once.
6> erli18n_plural:plural_by_po_header(<<"nplurals=2; plural=n != 1;">>, 1).
{ok,0}
Key functions
- compile/1: parse + fail-closed validation → plural_compiled/0.
- evaluate/2: hot path, total, returns the form index.
- evaluate_checked/2: structured sibling that reports anomalies as data.
- plural_by_po_header/2: compile+evaluate shortcut for one-off use.
- cldr_rule/1 / validate_against_cldr/2 / validate_against_cldr_ast/2: CLDR observability (off the hot path).
- fallback_rule/0: Germanic default when the header is missing.
Types
-type ast() :: integer() | n | {binop, op(), ast(), ast()} | {unop, '!', ast()} | {ternary, ast(), ast(), ast()}.
AST of the plural expression: a literal integer, the variable n, a
binop ({binop, op(), Left, Right}), the negation unop
({unop, '!', _}) or a ternary ({ternary, Cond, Then, Else}).
It is the tree that compile/1 builds and that evaluate/2/eval_ast/2
interpret. The depth is bounded by ?PLURAL_EXPR_MAX_DEPTH and the node
count by ?AST_MAX_NODES, so no valid instance is arbitrarily large.
-type compile_error() ::
    {syntax_error, Reason :: term(), Position :: non_neg_integer()}
    | {missing_nplurals, binary()}
    | {missing_plural_expr, binary()}
    | {nplurals_out_of_range, integer()}
    | {unsafe_plural_rule, plural_eval_error()}
    | {expr_too_long, Size :: non_neg_integer(), Max :: pos_integer()}
    | {expr_too_deep, Depth :: pos_integer(), Position :: non_neg_integer()}
    | {expr_too_complex, Nodes :: pos_integer(), Max :: pos_integer()}
    | {nplurals_too_many_digits, Digits :: pos_integer(), Max :: pos_integer()}.
Structural failure reason from compile/1 — always fail-closed, never an
exception.
Groups header defects (missing_nplurals, missing_plural_expr,
nplurals_out_of_range, syntax_error) and the anti-DoS hardening
rejections: expr_too_long/expr_too_deep/expr_too_complex
(byte/depth/node caps), nplurals_too_many_digits (digit cap before the
bignum) and unsafe_plural_rule (rule statically faulty for every N).
See compile/1 for what triggers each one.
-type op() :: '+' | '-' | '*' | '/' | '%' | '==' | '!=' | '<' | '>' | '<=' | '>=' | '&&' | '||'.
Binary operators accepted in an ast/0, with C precedence/associativity:
arithmetic (+ - * / %), relational (< > <= >=), equality (== !=) and
short-circuit logical (&& ||). % uses rem (truncates toward zero,
like C99); / and % by zero are coerced to 0 on the hot path.
-type plural_compiled() :: #{nplurals := pos_integer(), expr := ast(), raw := binary()}.
Compiled plural-rule bundle — the output of compile/1 and the input to
evaluate/2/evaluate_checked/2.
- nplurals: how many plural forms the locale has (validated in [1, ?NPLURALS_MAX]); every returned index stays in [0, nplurals).
- expr: the parsed ast/0 of the plural= expression, evaluated on the hot path.
- raw: the originating raw header, preserved for diagnostics and for the divergence payload of validate_against_cldr_ast/2.
Compile once at load and reuse this map on every lookup; there is no
result cache inside evaluate/2.
-type plural_eval_error() :: {division_by_zero, '/' | '%'} | {form_out_of_range, Form :: integer(), NPlurals :: pos_integer()}.
Anomaly observed while evaluating a compiled rule — returned as data, never raised.
{division_by_zero, '/' | '%'} when an evaluated divisor is 0;
{form_out_of_range, Form, NPlurals} when the index falls outside
[0, NPlurals). It appears as a return of evaluate_checked/2 and as the
payload of an {unsafe_plural_rule, _} rejected by compile/1. The total
evaluate/2 NEVER produces this — it clamps (parity with libintl).
Functions
Looks up the CLDR canonical plural expression for Locale in the embedded
table.
Returns {ok, Expr}, where Expr is the binary of the C plural
expression equivalent to that locale's CLDR rule, or undefined if
neither the locale nor its base language is in the table. The match is
case-sensitive; region tags fall back to the base language when the region
itself is not listed (e.g. fr_BE -> fr, since fr_BE has no row of
its own in the table).
A lookup/observability function — NOT on the hot path (PSD-004: the .po
header is the runtime source-of-truth). The embedded table (cldr_data/0)
covers ~49 locales. Both _ and - separators are accepted in the
fallback to the base language.
%% Direct hit: the entry exists in the table.
1> erli18n_plural:cldr_rule(<<"ru">>).
{ok,<<"n%10==1 && n%100!=11 ? 0 : n%10>=2 && n%10<=4 && (n%100<12 || n%100>14) ? 1 : 2">>}
%% Direct hit: `fr_CA` HAS its own row, so it resolves without falling back.
2> erli18n_plural:cldr_rule(<<"fr_CA">>).
{ok,<<"n > 1">>}
%% Fallback to the base language: `fr_BE` is not in the table, falls back to `fr`.
3> erli18n_plural:cldr_rule(<<"fr_BE">>).
{ok,<<"n > 1">>}
%% Neither the locale nor the base exists.
4> erli18n_plural:cldr_rule(<<"xx">>).
undefined
Edge cases: pt_PT (n != 1) diverges from the base pt (n > 1), so
the region entry exists separately. See also validate_against_cldr/2
(compare a header against the CLDR rule) and fallback_rule/0.
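The base-language fallback described for cldr_rule/1 can be sketched like this. It is an illustration under stated assumptions: the table is passed in as a map (the real table lives in cldr_data/0), and the module and function names are hypothetical. `binary:split/2` with a list of patterns splits at the first `_` or `-`.

```erlang
-module(locale_fallback_sketch).
-export([lookup/2]).

%% Looks up Locale in Table (a map keyed by binary locale tags). When the
%% full tag is absent, retries with the part before the first `_` or `-`.
%% The match is case-sensitive, as in the text above.
lookup(Locale, Table) when is_binary(Locale), is_map(Table) ->
    case maps:find(Locale, Table) of
        {ok, Rule} ->
            {ok, Rule};
        error ->
            case binary:split(Locale, [<<"_">>, <<"-">>]) of
                [Base, _Region] -> lookup_base(Base, Table);
                [_NoSeparator] -> undefined
            end
    end.

lookup_base(Base, Table) ->
    case maps:find(Base, Table) of
        {ok, BaseRule} -> {ok, BaseRule};
        error -> undefined
    end.
```

Trying the full tag first is what lets a region such as pt_PT carry its own rule while other regions fall back to the base language.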
-spec compile(binary()) -> {ok, plural_compiled()} | {error, compile_error()}.
Compiles a .po Plural-Forms: header expression into a
plural_compiled() bundle (an nplurals/expr/raw map) reused by each
evaluate/2.
Header is the header string (nplurals=N; plural=EXPR;); the fields are
located in a whitespace-tolerant way. Returns {ok, Compiled} or
{error, compile_error()}, always fail-closed (never raises), since it
runs over untrusted .po inside the gen_server's handle_call.
Relevant structural rejections:
- {expr_too_long, Size, Max}: expression above ?PLURAL_EXPR_MAX_BYTES (2048), refused before parsing;
- {expr_too_deep, Depth, Pos}: nesting above ?PLURAL_EXPR_MAX_DEPTH (64);
- {expr_too_complex, Nodes, Max}: AST with more nodes than ?AST_MAX_NODES (256), barring wide flat chains (n*n*...*n) that would grow a bignum per lookup;
- {unsafe_plural_rule, Reason}: STATICALLY faulty rule: division/modulo by a constant divisor of 0, or a constant rule whose form falls outside [0, NPlurals). Cases that fail only for a specific N are left to the dynamic clamp of evaluate/2;
- {nplurals_too_many_digits, _, _}, {nplurals_out_of_range, _}, {missing_nplurals, _}, {missing_plural_expr, _} and {syntax_error, Reason, Pos} for the remaining header defects.
Edge cases: redundant parentheses and whitespace are absorbed by the
parser; n is the ONLY allowed identifier (nx or m become a
syntax_error); degenerate rules plural=0 (ja/zh/ko/vi/th) compile as
an integer literal (PSD-008). A rule that fails only for a specific N
(e.g. n/(n-5)) is NOT rejected here — that is left to the dynamic clamp
of evaluate/2.
1> erli18n_plural:compile(<<"nplurals=2; plural=n != 1;">>).
{ok,#{raw => <<"nplurals=2; plural=n != 1;">>,expr => {binop,'!=',n,1},nplurals => 2}}
2> erli18n_plural:compile(<<"nplurals=1; plural=0;">>).
{ok,#{raw => <<"nplurals=1; plural=0;">>,expr => 0,nplurals => 1}}
3> erli18n_plural:compile(<<"nplurals=2; plural=n/0;">>).
{error,{unsafe_plural_rule,{division_by_zero,'/'}}}
4> erli18n_plural:compile(<<"nplurals=2; plural=nx;">>).
{error,{syntax_error,{unknown_identifier_after_n,$x},1}}
5> erli18n_plural:compile(<<"nplurals=2;">>).
{error,{missing_plural_expr,<<"nplurals=2;">>}}
See also evaluate/2 (consume the bundle), plural_by_po_header/2
(compile+evaluate at once) and compile_error/0.
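The nplurals_too_many_digits rejection relies on counting digits before converting them. A minimal sketch of that idea, assuming the input starts at the first character after `nplurals=`; the module name, the function name and the `no_digits` reason are hypothetical, not the module's API.

```erlang
-module(nplurals_digits_sketch).
-export([parse_nplurals/1]).

%% Same value as the documented cap; redefined so the sketch stands alone.
-define(MAX_INT_DIGITS, 7).

%% Parses the leading decimal digits of Bin. Runs longer than the cap are
%% refused BEFORE binary_to_integer/1 could materialise a bignum.
parse_nplurals(Bin) when is_binary(Bin) ->
    case count_digits(Bin, 0) of
        0 ->
            {error, no_digits};
        D when D > ?MAX_INT_DIGITS ->
            {error, {nplurals_too_many_digits, D, ?MAX_INT_DIGITS}};
        D ->
            {ok, binary_to_integer(binary:part(Bin, 0, D))}
    end.

%% Counts leading ASCII digits without building any integer.
count_digits(<<C, Rest/binary>>, Acc) when C >= $0, C =< $9 ->
    count_digits(Rest, Acc + 1);
count_digits(_, Acc) ->
    Acc.
```

The count is linear in the digit run, but the work stays on bytes; the conversion only ever sees at most seven digits.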
-spec evaluate(plural_compiled(), integer()) -> non_neg_integer().
Evaluates a compiled plural rule for a given N and returns the plural
form index — the TOTAL hot-path function, used by every
ngettext/npgettext.
Compiled is the bundle from compile/1; N is the count (an integer,
may be negative — the rule decides the semantics). The return is always a
non_neg_integer() in [0, NPlurals): the rule is interpreted and the
result coerced to an integer.
Never raises, even on a malformed rule (parity with GNU libintl):
division/modulo by zero is coerced to 0 (eval_div/2/eval_rem/2 instead
of letting div/rem raise badarith) and a form outside
[0, NPlurals) is clamped to 0 (if index >= nplurals -> index = 0).
There is no result cache and nothing is allocated beyond the return
value: every call re-interprets the AST, which is why the compile/1 caps
keep the AST small. A negative N is passed through without abs(); the
rule decides the semantics and the clamp protects the result.
1> {ok, C} = erli18n_plural:compile(<<"nplurals=2; plural=n != 1;">>).
2> erli18n_plural:evaluate(C, 1).
0
3> erli18n_plural:evaluate(C, 5).
1
%% Divisor DEPENDS on n (passes compile/1's static check),
%% but evaluates to zero at runtime for N=7: clamp to 0, no crash.
4> {ok, Bad} = erli18n_plural:compile(<<"nplurals=2; plural=1/(n-7);">>).
5> erli18n_plural:evaluate(Bad, 7).
0
Edge cases: the short-circuit of &&/|| is honoured, so a zero divisor
behind a false branch is never reached. To OBSERVE the anomaly (instead of
silent clamping) use evaluate_checked/2. See also compile/1 and
plural_by_po_header/2.
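The behaviour described for evaluate/2 (zero-coerced division, short-circuit logic, clamp to 0) can be sketched as a small tree-walking interpreter over ast/0. This is an illustration written from the description above, not the module's source; the module name and the helpers `eval/2`, `apply_op/3` and `bool/1` are assumptions.

```erlang
-module(plural_eval_sketch).
-export([evaluate/2]).

%% Total evaluation: a form outside [0, NPlurals) is clamped to 0.
evaluate(#{nplurals := NPlurals, expr := Ast}, N) when is_integer(N) ->
    case eval(Ast, N) of
        Form when Form >= 0, Form < NPlurals -> Form;
        _OutOfRange -> 0
    end.

eval(I, _N) when is_integer(I) -> I;
eval(n, N) -> N;
eval({unop, '!', A}, N) -> bool(eval(A, N) =:= 0);
eval({ternary, C, T, E}, N) ->
    case eval(C, N) of
        0 -> eval(E, N);
        _ -> eval(T, N)
    end;
%% Short-circuit: the right operand is evaluated only when it can matter.
eval({binop, '&&', L, R}, N) ->
    case eval(L, N) of
        0 -> 0;
        _ -> bool(eval(R, N) =/= 0)
    end;
eval({binop, '||', L, R}, N) ->
    case eval(L, N) of
        0 -> bool(eval(R, N) =/= 0);
        _ -> 1
    end;
eval({binop, Op, L, R}, N) ->
    apply_op(Op, eval(L, N), eval(R, N)).

apply_op('+', A, B) -> A + B;
apply_op('-', A, B) -> A - B;
apply_op('*', A, B) -> A * B;
%% A zero divisor is coerced to 0 instead of letting div/rem raise badarith.
apply_op('/', _A, 0) -> 0;
apply_op('/', A, B) -> A div B;
apply_op('%', _A, 0) -> 0;
apply_op('%', A, B) -> A rem B;
apply_op('==', A, B) -> bool(A =:= B);
apply_op('!=', A, B) -> bool(A =/= B);
apply_op('<', A, B) -> bool(A < B);
apply_op('>', A, B) -> bool(A > B);
apply_op('<=', A, B) -> bool(A =< B);
apply_op('>=', A, B) -> bool(A >= B).

%% C-style truth values.
bool(true) -> 1;
bool(false) -> 0.
```

Every clause either returns an integer or recurses on a strictly smaller subtree, so with the compile-time caps on the AST the walk always terminates without raising.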
-spec evaluate_checked(plural_compiled(), integer()) -> {ok, non_neg_integer()} | {error, plural_eval_error()}.
Structured sibling of evaluate/2: instead of clamping silently, it
reports a malformed rule as data so the consumer can log/alert.
Compiled and N are as in evaluate/2. Returns {ok, Form} with the
form in [0, NPlurals), or {error, plural_eval_error()}:
{division_by_zero, '/' | '%'} when the evaluated divisor is 0, or
{form_out_of_range, Form, NPlurals} when the form leaves the range. It
keeps the short-circuit of &&/|| (a zero divisor behind a false branch
is not reported) and, like evaluate/2, is total — never raises.
Use this off the hot path, when you want to log/alert the malformed rule;
on the hot path stay with evaluate/2, whose clamp is cheaper. Where
evaluate/2 would return 0 by clamping, this function returns the
corresponding {error, _}.
1> {ok, C} = erli18n_plural:compile(<<"nplurals=2; plural=n != 1;">>).
2> erli18n_plural:evaluate_checked(C, 5).
{ok,1}
%% Same rule as the evaluate/2 example (divisor depends on n).
%% Where evaluate/2 would clamp to 0, here the anomaly comes back as data.
3> {ok, Bad} = erli18n_plural:compile(<<"nplurals=2; plural=1/(n-7);">>).
4> erli18n_plural:evaluate_checked(Bad, 7).
{error,{division_by_zero,'/'}}
5> erli18n_plural:evaluate_checked(Bad, 8).
{ok,1}
Edge cases: a form outside [0, NPlurals) (where evaluate/2 would
clamp) becomes {error, {form_out_of_range, Form, NPlurals}}. See also
evaluate/2 (the sibling that clamps) and plural_eval_error/0.
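One way a consumer might use the structured result off the hot path is to log the anomaly and then fall back to the same 0 that evaluate/2 would have produced. The wrapper below is a hypothetical sketch: `form_or_zero/3` is not part of the module, and the checked evaluator is passed in as a fun (for example `fun erli18n_plural:evaluate_checked/2`) so the sketch stands alone.

```erlang
-module(checked_logging_sketch).
-export([form_or_zero/3]).

%% CheckedFun is expected to behave like evaluate_checked/2:
%% it returns {ok, Form} or {error, Reason} and never raises.
form_or_zero(CheckedFun, Compiled, N) when is_function(CheckedFun, 2) ->
    case CheckedFun(Compiled, N) of
        {ok, Form} ->
            Form;
        {error, Reason} ->
            %% Report the malformed rule, then mirror the clamp of evaluate/2.
            logger:warning("malformed plural rule: ~p (n=~p)", [Reason, N]),
            0
    end.
```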
-spec fallback_rule() -> binary().
Fallback plural rule used when a .po catalog ships no Plural-Forms:
header at all (a degenerate but tolerated input).
Returns <<"nplurals=2; plural=n != 1;">> — the Germanic C/English
default cited by the GNU gettext manual (§"Plural forms").
A pure constant, no side effects. The result is a raw header ready for
compile/1, so the loader's fallback path reuses exactly the same
pipeline as a legitimate header.
1> erli18n_plural:fallback_rule().
<<"nplurals=2; plural=n != 1;">>
2> {ok, C} = erli18n_plural:compile(erli18n_plural:fallback_rule()),
2> erli18n_plural:evaluate(C, 1).
0
See also compile/1 and cldr_rule/1.
-spec plural_by_po_header(binary(), integer()) -> {ok, non_neg_integer()} | {error, compile_error()}.
Convenience that compiles and evaluates in a single step: given the raw
header Header and the count N, returns {ok, Form} or propagates the
{error, compile_error()} from compile/1.
Recompiles on every call, so it is for one-off use; on the hot path, call
compile/1 once at load and reuse the bundle with evaluate/2.
The internal evaluation uses evaluate/2 (total), so an {ok, _} never
embeds an evaluation anomaly — the only part that can fail is compile/1,
whose error is propagated as-is.
1> erli18n_plural:plural_by_po_header(<<"nplurals=2; plural=n != 1;">>, 1).
{ok,0}
2> erli18n_plural:plural_by_po_header(<<"nplurals=2; plural=n != 1;">>, 3).
{ok,1}
3> erli18n_plural:plural_by_po_header(<<"nplurals=2; plural=nx;">>, 1).
{error,{syntax_error,{unknown_identifier_after_n,$x},1}}
See also compile/1 and evaluate/2.
-spec validate_against_cldr(binary(), binary()) -> ok | {warning, {plural_divergence, binary(), binary(), binary()}}.
Compares the plural expression of header HeaderRule (raw form) against
the CLDR canonical rule of Locale, producing only observability — at
runtime the header always wins (PSD-004).
Compiles HeaderRule ONCE and delegates to validate_against_cldr_ast/2.
Returns ok when the (nplurals, expr) ASTs are structurally equal
(whitespace/paren-insensitive) or when the locale has no CLDR entry;
returns {warning, {plural_divergence, Locale, HeaderRule, CldrRaw}} when
they diverge — including when the header is invalid but the locale is
listed in CLDR.
A convenience entry point for callers that only have the raw header. The
catalog loader, which already keeps the compiled bundle, should use
validate_against_cldr_ast/2 to avoid recompiling the header at load.
The comparison is STRUCTURAL over the (nplurals, expr-AST) pair, so it
is insensitive to whitespace and redundant parentheses: (n != 1) matches
n != 1. Nothing changes at runtime — the warning exists only for
telemetry.
%% Header agrees with fr's CLDR (n > 1): no warning.
1> erli18n_plural:validate_against_cldr(<<"fr">>, <<"nplurals=2; plural=(n > 1);">>).
ok
%% Header diverges from fr's CLDR: warning (but the header would win at runtime).
2> erli18n_plural:validate_against_cldr(<<"fr">>, <<"nplurals=2; plural=n != 1;">>).
{warning,{plural_divergence,<<"fr">>,<<"nplurals=2; plural=n != 1;">>,<<"n > 1">>}}
%% Locale with no CLDR entry: nothing to validate.
3> erli18n_plural:validate_against_cldr(<<"xx">>, <<"nplurals=2; plural=n != 1;">>).
ok
Edge cases: an INVALID header against a locale that IS listed in CLDR
still produces {warning, _} (it cannot match the canonical rule);
against a locale with no CLDR entry it becomes ok. See also
validate_against_cldr_ast/2 (variant without recompiling) and
cldr_rule/1.
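The structural comparison can be sketched as plain term equality over the (nplurals, expr) pair of two bundles. This is an assumption-laden illustration: both sides are given as maps shaped like plural_compiled(), and `compare/3` is a hypothetical name, not the module's function.

```erlang
-module(cldr_compare_sketch).
-export([compare/3]).

%% Header and Cldr are bundles shaped like plural_compiled(). Parentheses
%% and whitespace never reach the AST, so term equality ignores them.
compare(Locale, Header, Cldr) when is_binary(Locale) ->
    #{nplurals := HN, expr := HAst, raw := HRaw} = Header,
    #{nplurals := CN, expr := CAst, raw := CRaw} = Cldr,
    case {HN, HAst} =:= {CN, CAst} of
        true -> ok;
        false -> {warning, {plural_divergence, Locale, HRaw, CRaw}}
    end.
```

The raw fields take no part in the comparison; they only travel in the warning payload so that telemetry can show both rules as written.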
-spec validate_against_cldr_ast(binary(), plural_compiled()) -> ok | {warning, {plural_divergence, binary(), binary(), binary()}}.
AST-based variant of validate_against_cldr/2: takes the ALREADY compiled
bundle (plural_compiled()) and compares it against the CLDR rule of
Locale without recompiling anything (finding #17).
Reuses the header AST as-is and takes the CLDR side from a memoised table
of compiled bundles, so no rule is re-parsed at load. Returns ok if the
(nplurals, expr) pairs match or if the locale has no CLDR entry;
otherwise {warning, {plural_divergence, Locale, HeaderRaw, CldrRaw}},
with the raw header (the bundle's raw field) and the raw CLDR
expression.
This is the PREFERRED form in the loader (finding #17): since the bundle
was already compiled by compile/1 at load, it avoids the second
compile/1 that validate_against_cldr/2 would do, and the CLDR side
comes from the persistent_term cache (cldr_compiled_table/0), not
re-synthesised per load.
1> {ok, C} = erli18n_plural:compile(<<"nplurals=2; plural=n != 1;">>).
2> erli18n_plural:validate_against_cldr_ast(<<"fr">>, C).
{warning,{plural_divergence,<<"fr">>,<<"nplurals=2; plural=n != 1;">>,<<"n > 1">>}}
3> {ok, Cde} = erli18n_plural:compile(<<"nplurals=2; plural=n != 1;">>).
4> erli18n_plural:validate_against_cldr_ast(<<"de">>, Cde).
ok
Edge cases: a locale with no CLDR entry becomes ok (nothing to log). See
also validate_against_cldr/2 (from the raw header) and cldr_compiled/1
(the memoisation of the CLDR side).