erli18n_plural (erli18n v0.1.0)


Evaluator and validator for the gettext/CLDR plural rules used by erli18n.

Compiles the C expression from a .po Plural-Forms: header (nplurals=N; plural=EXPR;) into a small AST and evaluates it to choose the plural form for a given N — this is what backs ngettext/npgettext.

The problem it solves

gettext selects the correct plural translation by evaluating a C EXPRESSION embedded in the .po header (e.g. the Russian 3-form rule n%10==1 && n%100!=11 ? 0 : ...). Each locale ships its own. This module replaces the legacy gettexter's Yecc/Leex/erl_eval pipeline with a hand-written recursive-descent parser + AST interpreter (no dynamic generation of Erlang code, so dialyzer/eqwalizer can reason about everything). It turns EXPR into an ast/0 and evaluates it to a form index in [0, NPlurals).

Mental model

  • Two phases. compile/1 (load-time, cold) parses + validates + packs into a plural_compiled/0. evaluate/2 (lookup-time, HOT PATH) interprets that bundle per call. The catalog loader compiles ONCE and keeps the bundle; each ngettext/npgettext calls only evaluate/2.
  • Runtime source-of-truth is the .po header (PSD-004). The embedded CLDR table (~49 locales, cldr_rule/1) does NOT take part in the hot path: it is consulted only at load time to emit divergence warnings (validate_against_cldr/2) and as a fallback when the header is missing (fallback_rule/0).
  • Trusted vs untrusted. The header expression comes from a tenant's .po — UNTRUSTED input (ADR-0003, see SECURITY.md). The cldr_data/0 table is a static module literal — TRUSTED. That is why compile/1 is fail-closed and hardened, while cldr_compiled_table/0 assumes every row compiles.
  • Pure function, no per-process state. Unlike the catalog server, this module has no gen_server, no ETS and no process dictionary. The only side effect is a global read-once cache in persistent_term (cldr_compiled_table/0), memoising the compiled CLDR ASTs — a module-scoped singleton under a fixed key, built once per node and never invalidated (cldr_data/0 is constant).
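The read-once cache in the last bullet can be pictured roughly as follows. This is a sketch under stated assumptions, not the module's code: the module name cldr_cache_sketch, the key shape and build_table/0 are hypothetical, and the table stores raw expressions where the real one would hold compiled bundles.

```erlang
-module(cldr_cache_sketch).
-export([compiled_table/0]).

%% Hypothetical key: a module-scoped tuple avoids clashing with other users.
-define(KEY, {?MODULE, compiled_table}).

%% Returns the memoised table, building and storing it on first use.
-spec compiled_table() -> #{binary() => term()}.
compiled_table() ->
    case persistent_term:get(?KEY, undefined) of
        undefined ->
            Table = build_table(),
            %% Racing builders store equal terms, and persistent_term:put/2
            %% does nothing when the new value equals the stored one.
            ok = persistent_term:put(?KEY, Table),
            Table;
        Table ->
            Table
    end.

%% Stand-in for compiling every row of the static table (assumption:
%% two rows of raw expressions are enough to show the shape).
build_table() ->
    maps:from_list([{<<"fr">>, <<"n > 1">>}, {<<"de">>, <<"n != 1">>}]).
```

Because the source table is constant, the sketch never erases or updates the key, which is the usage pattern persistent_term is designed for.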

Anti-DoS hardening (ADR-0003)

The attack surface is the .po expression. The defenses ALL live in compile/1 (cold), so that evaluate/2 (hot) stays O(1)-bounded by construction:

  • ?PLURAL_EXPR_MAX_BYTES (2048) — rejects a long expression before parse.
  • ?PLURAL_EXPR_MAX_DEPTH (64) — bounds nesting (and the walker's stack).
  • ?AST_MAX_NODES (256) — bounds the node count (a wide flat chain n*n*...*n passes both caps above but would grow an n^k bignum per lookup).
  • ?MAX_INT_DIGITS (7) — bounds the digits of nplurals= before binary_to_integer materialises the bignum.
  • Static rejection (validate_safe/2) — refuses rules provably faulty for EVERY N (div/mod by a constant divisor of 0; constant outside [0, NPlurals)).
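Three of the checks above can be sketched as follows, assuming nothing about the real implementation beyond what the list states. The module name plural_caps_sketch and its function names are hypothetical, and the macros only mirror the documented limits. parse_nplurals/1 assumes its argument is a non-empty run of ASCII digits already extracted from the header, and the zero-divisor walk catches a literal 0 divisor only (the real validate_safe/2 may do more, such as folding constants).

```erlang
-module(plural_caps_sketch).
-export([check_size/1, parse_nplurals/1, find_zero_divisor/1]).

-define(MAX_BYTES, 2048).   %% mirrors ?PLURAL_EXPR_MAX_BYTES
-define(MAX_DIGITS, 7).     %% mirrors ?MAX_INT_DIGITS

%% Reject an oversized expression before any parsing work is done.
check_size(Expr) when byte_size(Expr) > ?MAX_BYTES ->
    {error, {expr_too_long, byte_size(Expr), ?MAX_BYTES}};
check_size(_Expr) ->
    ok.

%% Check the digit count first, so binary_to_integer/1 never sees a huge input.
parse_nplurals(Digits) when byte_size(Digits) > ?MAX_DIGITS ->
    {error, {nplurals_too_many_digits, byte_size(Digits), ?MAX_DIGITS}};
parse_nplurals(Digits) ->
    {ok, binary_to_integer(Digits)}.

%% Walk the AST and report the first `/` or `%` whose divisor is the literal 0.
find_zero_divisor({binop, Op, _L, 0}) when Op =:= '/'; Op =:= '%' ->
    {error, {division_by_zero, Op}};
find_zero_divisor({binop, _Op, L, R}) ->
    first_error([L, R]);
find_zero_divisor({unop, '!', E}) ->
    find_zero_divisor(E);
find_zero_divisor({ternary, C, T, E}) ->
    first_error([C, T, E]);
find_zero_divisor(_Leaf) ->
    ok.

%% Return the first error among the subtrees, or ok when all are clean.
first_error([]) ->
    ok;
first_error([H | T]) ->
    case find_zero_divisor(H) of
        ok -> first_error(T);
        Error -> Error
    end.
```

All three run at compile time only, which is what lets the per-lookup interpreter skip these checks.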

evaluate/2 is TOTAL: it never raises. Mirroring the GNU libintl runtime (dcigettext.c), division/modulo by zero is coerced to 0 and a form outside [0, NPlurals) is clamped to 0. Anyone who needs to OBSERVE the anomaly (log/alert) uses evaluate_checked/2, which returns it as data.

When you touch this module

Quickstart

%% Compile the Russian 3-form (one/few/many) rule once...
1> Hdr = <<"nplurals=3; plural=n%10==1 && n%100!=11 ? 0 : "
1>        "n%10>=2 && n%10<=4 && (n%100<12 || n%100>14) ? 1 : 2;">>.
2> {ok, C} = erli18n_plural:compile(Hdr).
{ok,#{raw => Hdr,expr => {ternary,_,_,_},nplurals => 3}}
%% ...and select the form for various N (hot path).
3> erli18n_plural:evaluate(C, 1).
0
4> erli18n_plural:evaluate(C, 2).
1
5> erli18n_plural:evaluate(C, 5).
2
%% One-off use: compile and evaluate at once.
6> erli18n_plural:plural_by_po_header(<<"nplurals=2; plural=n != 1;">>, 1).
{ok,0}

Key functions

Summary

Types

ast()

AST of the plural expression: a literal integer, the variable n, a binop ({binop, op(), Left, Right}), the negation unop ({unop, '!', _}) or a ternary ({ternary, Cond, Then, Else}).

compile_error()

Structural failure reason from compile/1: always fail-closed, never an exception.

op()

Binary operators accepted in an ast/0, with C precedence/associativity: arithmetic (+ - * / %), relational (< > <= >=), equality (== !=) and short-circuit logical (&& ||). % uses rem (truncates toward zero, like C99); / and % by zero are coerced to 0 on the hot path.

plural_compiled()

Compiled plural-rule bundle: the output of compile/1 and the input to evaluate/2/evaluate_checked/2.

plural_eval_error()

Anomaly observed while evaluating a compiled rule, returned as data and never raised.

Functions

cldr_rule(Locale)

Looks up the CLDR canonical plural expression for Locale in the embedded table.

compile(Header)

Compiles a .po Plural-Forms: header expression into a plural_compiled() bundle (a nplurals/expr/raw map) reused by each evaluate/2.

evaluate(Compiled, N)

Evaluates a compiled plural rule for a given N and returns the plural form index. This is the TOTAL hot-path function, used by every ngettext/npgettext.

evaluate_checked(Compiled, N)

Structured sibling of evaluate/2: instead of clamping silently, it reports a malformed rule as data so the consumer can log/alert.

fallback_rule()

Fallback plural rule used when a .po catalog ships no Plural-Forms: header at all (a degenerate but tolerated input).

plural_by_po_header(Header, N)

Convenience that compiles and evaluates in a single step: given the raw header Header and the count N, returns {ok, Form} or propagates the {error, compile_error()} from compile/1.

validate_against_cldr(Locale, HeaderRule)

Compares the plural expression of header HeaderRule (raw form) against the CLDR canonical rule of Locale, producing only observability; at runtime the header always wins (PSD-004).

validate_against_cldr_ast(Locale, Compiled)

AST-based variant of validate_against_cldr/2: takes the ALREADY compiled bundle (plural_compiled()) and compares it against the CLDR rule of Locale without recompiling anything (finding #17).

Types

ast()

-type ast() ::
          integer() |
          n |
          {binop, op(), ast(), ast()} |
          {unop, '!', ast()} |
          {ternary, ast(), ast(), ast()}.

AST of the plural expression: a literal integer, the variable n, a binop ({binop, op(), Left, Right}), the negation unop ({unop, '!', _}) or a ternary ({ternary, Cond, Then, Else}).

It is the tree that compile/1 builds and that evaluate/2/eval_ast/2 interpret. The depth is bounded by ?PLURAL_EXPR_MAX_DEPTH and the node count by ?AST_MAX_NODES, so no valid instance is arbitrarily large.
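To make the shape concrete, here are two rules written out by hand as ast() terms, with a node counter and a depth measure over them. This is an illustrative sketch: the module plural_ast_sketch and its functions are hypothetical, and the real caps may count nodes or nesting differently (for instance, depth may be tracked by the parser rather than measured on the finished tree).

```erlang
-module(plural_ast_sketch).
-export([english/0, russian/0, node_count/1, depth/1]).

%% n != 1
english() ->
    {binop, '!=', n, 1}.

%% n%10==1 && n%100!=11 ? 0 :
%%   n%10>=2 && n%10<=4 && (n%100<12 || n%100>14) ? 1 : 2
russian() ->
    Mod10 = {binop, '%', n, 10},
    Mod100 = {binop, '%', n, 100},
    One = {binop, '&&', {binop, '==', Mod10, 1}, {binop, '!=', Mod100, 11}},
    %% && is left-associative, so the first two tests group together.
    Few = {binop, '&&',
           {binop, '&&', {binop, '>=', Mod10, 2}, {binop, '<=', Mod10, 4}},
           {binop, '||', {binop, '<', Mod100, 12}, {binop, '>', Mod100, 14}}},
    %% ?: is right-associative, so the second ternary nests in the else branch.
    {ternary, One, 0, {ternary, Few, 1, 2}}.

%% Every leaf and every operator counts as one node.
node_count({binop, _Op, L, R}) -> 1 + node_count(L) + node_count(R);
node_count({unop, '!', E}) -> 1 + node_count(E);
node_count({ternary, C, T, E}) -> 1 + node_count(C) + node_count(T) + node_count(E);
node_count(Leaf) when is_integer(Leaf); Leaf =:= n -> 1.

%% Height of the tree, with a lone leaf at depth 1.
depth({binop, _Op, L, R}) -> 1 + max(depth(L), depth(R));
depth({unop, '!', E}) -> 1 + depth(E);
depth({ternary, C, T, E}) -> 1 + lists:max([depth(C), depth(T), depth(E)]);
depth(Leaf) when is_integer(Leaf); Leaf =:= n -> 1.
```

Even the three-form Russian rule stays far below the documented limits of 256 nodes and 64 levels under this way of counting.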

compile_error()

-type compile_error() ::
          {syntax_error, Reason :: term(), Position :: non_neg_integer()} |
          {missing_nplurals, binary()} |
          {missing_plural_expr, binary()} |
          {nplurals_out_of_range, integer()} |
          {unsafe_plural_rule, plural_eval_error()} |
          {expr_too_long, Size :: non_neg_integer(), Max :: pos_integer()} |
          {expr_too_deep, Depth :: pos_integer(), Position :: non_neg_integer()} |
          {expr_too_complex, Nodes :: pos_integer(), Max :: pos_integer()} |
          {nplurals_too_many_digits, Digits :: pos_integer(), Max :: pos_integer()}.

Structural failure reason from compile/1 — always fail-closed, never an exception.

Groups header defects (missing_nplurals, missing_plural_expr, nplurals_out_of_range, syntax_error) and the anti-DoS hardening rejections: expr_too_long/expr_too_deep/expr_too_complex (byte/depth/node caps), nplurals_too_many_digits (digit cap before the bignum) and unsafe_plural_rule (rule statically faulty for every N). See compile/1 for what triggers each one.

op()

-type op() :: '+' | '-' | '*' | '/' | '%' | '==' | '!=' | '<' | '>' | '<=' | '>=' | '&&' | '||'.

Binary operators accepted in an ast/0, with C precedence/associativity: arithmetic (+ - * / %), relational (< > <= >=), equality (== !=) and short-circuit logical (&& ||). % uses rem (truncates toward zero, like C99); / and % by zero are coerced to 0 on the hot path.

plural_compiled()

-type plural_compiled() :: #{nplurals := pos_integer(), expr := ast(), raw := binary()}.

Compiled plural-rule bundle — the output of compile/1 and the input to evaluate/2/evaluate_checked/2.

  • nplurals — how many plural forms the locale has (validated in [1, ?NPLURALS_MAX]); every returned index stays in [0, nplurals).
  • expr — the parsed ast/0 of the plural= expression, evaluated on the hot path.
  • raw — the originating raw header, preserved for diagnostics and for the divergence payload of validate_against_cldr_ast/2.

Compile once at load and reuse this map on every lookup; there is no result cache inside evaluate/2.

plural_eval_error()

-type plural_eval_error() ::
          {division_by_zero, '/' | '%'} |
          {form_out_of_range, Form :: integer(), NPlurals :: pos_integer()}.

Anomaly observed while evaluating a compiled rule — returned as data, never raised.

The reason is {division_by_zero, '/' | '%'} when an evaluated divisor is 0, and {form_out_of_range, Form, NPlurals} when the index falls outside [0, NPlurals). It appears as a return of evaluate_checked/2 and as the payload of an {unsafe_plural_rule, _} rejected by compile/1. The total evaluate/2 NEVER produces this: it clamps instead (parity with libintl).

Functions

cldr_rule(Locale)

-spec cldr_rule(binary()) -> {ok, binary()} | undefined.

Looks up the CLDR canonical plural expression for Locale in the embedded table.

Returns {ok, Expr}, where Expr is the binary of the C plural expression equivalent to that locale's CLDR rule, or undefined if neither the locale nor its base language is in the table. The match is case-sensitive; region tags fall back to the base language when the region itself is not listed (e.g. fr_BE -> fr, since fr_BE has no row of its own in the table).

A lookup/observability function — NOT on the hot path (PSD-004: the .po header is the runtime source-of-truth). The embedded table (cldr_data/0) covers ~49 locales. Both _ and - separators are accepted in the fallback to the base language.

%% Direct hit: the entry exists in the table.
1> erli18n_plural:cldr_rule(<<"ru">>).
{ok,<<"n%10==1 && n%100!=11 ? 0 : n%10>=2 && n%10<=4 && (n%100<12 || n%100>14) ? 1 : 2">>}
%% Direct hit: `fr_CA` HAS its own row, so it resolves without falling back.
2> erli18n_plural:cldr_rule(<<"fr_CA">>).
{ok,<<"n > 1">>}
%% Fallback to the base language: `fr_BE` is not in the table, falls back to `fr`.
3> erli18n_plural:cldr_rule(<<"fr_BE">>).
{ok,<<"n > 1">>}
%% Neither the locale nor the base exists.
4> erli18n_plural:cldr_rule(<<"xx">>).
undefined

Edge cases: pt_PT (n != 1) diverges from the base pt (n > 1), so the region entry exists separately. See also validate_against_cldr/2 (compare a header against the CLDR rule) and fallback_rule/0.
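The lookup order described above (exact row first, then the base language) can be sketched like this. The module cldr_lookup_sketch and its three-row table are hypothetical stand-ins for the embedded table, and the sketch does not normalise separators before the exact match, which the real function may or may not do.

```erlang
-module(cldr_lookup_sketch).
-export([rule/1]).

%% Hypothetical three-row excerpt; the real table covers ~49 locales.
table() ->
    #{<<"fr">> => <<"n > 1">>,
      <<"pt">> => <<"n > 1">>,
      <<"pt_PT">> => <<"n != 1">>}.

-spec rule(binary()) -> {ok, binary()} | undefined.
rule(Locale) ->
    Table = table(),
    case maps:find(Locale, Table) of
        {ok, _} = Direct ->
            %% The locale has its own row (case-sensitive match).
            Direct;
        error ->
            %% Everything before the first `_` or `-` is the base language.
            [Base | _] = binary:split(Locale, [<<"_">>, <<"-">>]),
            case maps:find(Base, Table) of
                {ok, _} = Fallback -> Fallback;
                error -> undefined
            end
    end.
```

Trying the exact row first is what lets pt_PT keep its own rule while fr_BE inherits the rule of fr.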

compile(Header)

-spec compile(binary()) -> {ok, plural_compiled()} | {error, compile_error()}.

Compiles a .po Plural-Forms: header expression into a plural_compiled() bundle (a nplurals/expr/raw map) reused by each evaluate/2.

Header is the header string (nplurals=N; plural=EXPR;); the fields are located in a whitespace-tolerant way. Returns {ok, Compiled} or {error, compile_error()}, always fail-closed (never raises), since it runs over untrusted .po inside the gen_server's handle_call.

Relevant structural rejections:

  • {expr_too_long, Size, Max} — expression above ?PLURAL_EXPR_MAX_BYTES (2048), refused before parsing;
  • {expr_too_deep, Depth, Pos} — nesting above ?PLURAL_EXPR_MAX_DEPTH (64);
  • {expr_too_complex, Nodes, Max} — AST with more nodes than ?AST_MAX_NODES (256), barring wide flat chains (n*n*...*n) that would grow a bignum per lookup;
  • {unsafe_plural_rule, Reason} — STATICALLY faulty rule: division/modulo by a constant divisor of 0, or a constant rule whose form falls outside [0, NPlurals). Cases that fail only for a specific N are left to the dynamic clamp of evaluate/2;
  • {nplurals_too_many_digits, _, _}, {nplurals_out_of_range, _}, {missing_nplurals, _}, {missing_plural_expr, _} and {syntax_error, Reason, Pos} for the remaining header defects.

Edge cases: redundant parentheses and whitespace are absorbed by the parser; n is the ONLY allowed identifier (nx or m become a syntax_error); degenerate rules plural=0 (ja/zh/ko/vi/th) compile as an integer literal (PSD-008). For example, n/(n-5) compiles successfully even though it divides by zero at N=5, because its divisor is not a constant.

1> erli18n_plural:compile(<<"nplurals=2; plural=n != 1;">>).
{ok,#{raw => <<"nplurals=2; plural=n != 1;">>,expr => {binop,'!=',n,1},nplurals => 2}}
2> erli18n_plural:compile(<<"nplurals=1; plural=0;">>).
{ok,#{raw => <<"nplurals=1; plural=0;">>,expr => 0,nplurals => 1}}
3> erli18n_plural:compile(<<"nplurals=2; plural=n/0;">>).
{error,{unsafe_plural_rule,{division_by_zero,'/'}}}
4> erli18n_plural:compile(<<"nplurals=2; plural=nx;">>).
{error,{syntax_error,{unknown_identifier_after_n,$x},1}}
5> erli18n_plural:compile(<<"nplurals=2;">>).
{error,{missing_plural_expr,<<"nplurals=2;">>}}

See also evaluate/2 (consume the bundle), plural_by_po_header/2 (compile+evaluate at once) and compile_error/0.
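One plausible way to locate the two header fields in a whitespace-tolerant manner is a pair of regular expressions. This is a sketch only: the module plural_header_sketch is hypothetical, the real scanner may not use re at all, and the digit cap and the expression parser would run on the returned binaries afterwards.

```erlang
-module(plural_header_sketch).
-export([fields/1]).

%% Returns the raw nplurals digits and the raw plural expression, untouched.
-spec fields(binary()) ->
          {ok, {Digits :: binary(), Expr :: binary()}} |
          {error, {missing_nplurals | missing_plural_expr, binary()}}.
fields(Header) ->
    Opts = [{capture, all_but_first, binary}],
    case re:run(Header, <<"nplurals\\s*=\\s*([0-9]+)">>, Opts) of
        {match, [Digits]} ->
            %% `plural` followed by `=` cannot match inside `nplurals=`,
            %% because the `s` sits between the word and the equals sign.
            case re:run(Header, <<"plural\\s*=\\s*([^;]+)">>, Opts) of
                {match, [Expr]} ->
                    {ok, {Digits, string:trim(Expr)}};
                nomatch ->
                    {error, {missing_plural_expr, Header}}
            end;
        nomatch ->
            {error, {missing_nplurals, Header}}
    end.
```

Keeping the digits as a binary at this stage leaves room to check their length before any integer is built.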

evaluate(Compiled, N)

-spec evaluate(plural_compiled(), integer()) -> non_neg_integer().

Evaluates a compiled plural rule for a given N and returns the plural form index — the TOTAL hot-path function, used by every ngettext/npgettext.

Compiled is the bundle from compile/1; N is the count (an integer, may be negative — the rule decides the semantics). The return is always a non_neg_integer() in [0, NPlurals): the rule is interpreted and the result coerced to an integer.

Never raises, even on a malformed rule (parity with GNU libintl): division/modulo by zero is coerced to 0 (eval_div/2/eval_rem/2 instead of letting div/rem raise badarith) and a form outside [0, NPlurals) is clamped to 0 (if index >= nplurals -> index = 0).

No allocations beyond the return value and no result cache: every call re-pays the AST interpretation, which is why the compile/1 caps keep the AST small. A negative N is passed through without abs(), and the clamp protects the result.

1> {ok, C} = erli18n_plural:compile(<<"nplurals=2; plural=n != 1;">>).
2> erli18n_plural:evaluate(C, 1).
0
3> erli18n_plural:evaluate(C, 5).
1
%% Divisor DEPENDS on n (passes compile/1's static check),
%% but evaluates to zero at runtime for N=7: clamp to 0, no crash.
4> {ok, Bad} = erli18n_plural:compile(<<"nplurals=2; plural=1/(n-7);">>).
5> erli18n_plural:evaluate(Bad, 7).
0

Edge cases: the short-circuit of &&/|| is honoured, so a zero divisor behind a false branch is never reached. To OBSERVE the anomaly (instead of silent clamping) use evaluate_checked/2. See also compile/1 and plural_by_po_header/2.
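The behaviour described in this section can be illustrated with a small standalone interpreter over the ast() shape. It is a sketch, not the module's code: plural_eval_sketch and its helpers are hypothetical names, and it only assumes the documented semantics (C-style truth values, div/rem truncation, zero divisors coerced to 0, out-of-range forms clamped to 0).

```erlang
-module(plural_eval_sketch).
-export([evaluate/2]).

-type op() :: '+' | '-' | '*' | '/' | '%' | '==' | '!=' |
              '<' | '>' | '<=' | '>=' | '&&' | '||'.
-type ast() :: integer() | n | {binop, op(), ast(), ast()} |
               {unop, '!', ast()} | {ternary, ast(), ast(), ast()}.

%% Total: every well-formed bundle yields a form in [0, NPlurals).
-spec evaluate(#{nplurals := pos_integer(), expr := ast(), term() => term()},
               integer()) -> non_neg_integer().
evaluate(#{nplurals := NPlurals, expr := Expr}, N) ->
    case eval(Expr, N) of
        Form when Form >= 0, Form < NPlurals -> Form;
        _OutOfRange -> 0
    end.

eval(n, N) -> N;
eval(I, _N) when is_integer(I) -> I;
eval({unop, '!', E}, N) -> bool(eval(E, N) =:= 0);
eval({ternary, C, T, E}, N) ->
    case eval(C, N) of
        0 -> eval(E, N);
        _ -> eval(T, N)
    end;
%% Short-circuit: the right operand is evaluated only when it can matter.
eval({binop, '&&', L, R}, N) ->
    case eval(L, N) of
        0 -> 0;
        _ -> bool(eval(R, N) =/= 0)
    end;
eval({binop, '||', L, R}, N) ->
    case eval(L, N) of
        0 -> bool(eval(R, N) =/= 0);
        _ -> 1
    end;
eval({binop, Op, L, R}, N) ->
    apply_op(Op, eval(L, N), eval(R, N)).

apply_op('+', A, B) -> A + B;
apply_op('-', A, B) -> A - B;
apply_op('*', A, B) -> A * B;
apply_op('/', _A, 0) -> 0;              %% coerced instead of raising badarith
apply_op('/', A, B) -> A div B;
apply_op('%', _A, 0) -> 0;              %% coerced instead of raising badarith
apply_op('%', A, B) -> A rem B;
apply_op('==', A, B) -> bool(A =:= B);
apply_op('!=', A, B) -> bool(A =/= B);
apply_op('<', A, B) -> bool(A < B);
apply_op('>', A, B) -> bool(A > B);
apply_op('<=', A, B) -> bool(A =< B);
apply_op('>=', A, B) -> bool(A >= B).

%% C-style truth values.
bool(true) -> 1;
bool(false) -> 0.
```

The sketch relies on the compile-time caps for its cost bound: it recurses once per AST node and keeps no state between calls.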

evaluate_checked(Compiled, N)

-spec evaluate_checked(plural_compiled(), integer()) ->
                          {ok, non_neg_integer()} | {error, plural_eval_error()}.

Structured sibling of evaluate/2: instead of clamping silently, it reports a malformed rule as data so the consumer can log/alert.

Compiled and N are as in evaluate/2. Returns {ok, Form} with the form in [0, NPlurals), or {error, plural_eval_error()}: {division_by_zero, '/' | '%'} when the evaluated divisor is 0, or {form_out_of_range, Form, NPlurals} when the form leaves the range. It keeps the short-circuit of &&/|| (a zero divisor behind a false branch is not reported) and, like evaluate/2, is total — never raises.

Use this off the hot path, when you want to log/alert the malformed rule; on the hot path stay with evaluate/2, whose clamp is cheaper. Where evaluate/2 would return 0 by clamping, this function returns the corresponding {error, _}.

1> {ok, C} = erli18n_plural:compile(<<"nplurals=2; plural=n != 1;">>).
2> erli18n_plural:evaluate_checked(C, 5).
{ok,1}
%% Same rule as the evaluate/2 example (divisor depends on n).
%% Where evaluate/2 would clamp to 0, here the anomaly comes back as data.
3> {ok, Bad} = erli18n_plural:compile(<<"nplurals=2; plural=1/(n-7);">>).
4> erli18n_plural:evaluate_checked(Bad, 7).
{error,{division_by_zero,'/'}}
5> erli18n_plural:evaluate_checked(Bad, 8).
{ok,1}

Edge cases: a form outside [0, NPlurals) (where evaluate/2 would clamp) becomes {error, {form_out_of_range, Form, NPlurals}}. See also evaluate/2 (the sibling that clamps) and plural_eval_error/0.
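A checked evaluator can be sketched in the same style as a plain interpreter, reporting anomalies instead of clamping. The module plural_checked_sketch is hypothetical, and the internal throw/catch is merely one convenient way to surface the zero divisor; the real function may thread results through the recursion instead.

```erlang
-module(plural_checked_sketch).
-export([evaluate_checked/2]).

%% Total for well-formed bundles: anomalies come back as {error, _}.
evaluate_checked(#{nplurals := NPlurals, expr := Expr}, N) ->
    try eval(Expr, N) of
        Form when Form >= 0, Form < NPlurals -> {ok, Form};
        Form -> {error, {form_out_of_range, Form, NPlurals}}
    catch
        %% Raised only by apply_op/3 below, never visible to the caller.
        throw:{division_by_zero, Op} -> {error, {division_by_zero, Op}}
    end.

eval(n, N) -> N;
eval(I, _N) when is_integer(I) -> I;
eval({unop, '!', E}, N) -> bool(eval(E, N) =:= 0);
eval({ternary, C, T, E}, N) ->
    case eval(C, N) of 0 -> eval(E, N); _ -> eval(T, N) end;
%% Short-circuit kept: a zero divisor behind a dead branch is never reached.
eval({binop, '&&', L, R}, N) ->
    case eval(L, N) of 0 -> 0; _ -> bool(eval(R, N) =/= 0) end;
eval({binop, '||', L, R}, N) ->
    case eval(L, N) of 0 -> bool(eval(R, N) =/= 0); _ -> 1 end;
eval({binop, Op, L, R}, N) ->
    apply_op(Op, eval(L, N), eval(R, N)).

apply_op(Op, _A, 0) when Op =:= '/'; Op =:= '%' ->
    throw({division_by_zero, Op});
apply_op('/', A, B) -> A div B;
apply_op('%', A, B) -> A rem B;
apply_op('+', A, B) -> A + B;
apply_op('-', A, B) -> A - B;
apply_op('*', A, B) -> A * B;
apply_op('==', A, B) -> bool(A =:= B);
apply_op('!=', A, B) -> bool(A =/= B);
apply_op('<', A, B) -> bool(A < B);
apply_op('>', A, B) -> bool(A > B);
apply_op('<=', A, B) -> bool(A =< B);
apply_op('>=', A, B) -> bool(A >= B).

bool(true) -> 1;
bool(false) -> 0.
```

With the rule 1/(n-7) and two forms, N=6 yields a form of -1, which this sketch reports as out of range rather than folding it to 0.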

fallback_rule()

-spec fallback_rule() -> binary().

Fallback plural rule used when a .po catalog ships no Plural-Forms: header at all (a degenerate but tolerated input).

Returns <<"nplurals=2; plural=n != 1;">> — the Germanic C/English default cited by the GNU gettext manual (§"Plural forms").

A pure constant, no side effects. The result is a raw header ready for compile/1, so the loader's fallback path reuses exactly the same pipeline as a legitimate header.

1> erli18n_plural:fallback_rule().
<<"nplurals=2; plural=n != 1;">>
2> {ok, C} = erli18n_plural:compile(erli18n_plural:fallback_rule()),
2> erli18n_plural:evaluate(C, 1).
0

See also compile/1 and cldr_rule/1.

plural_by_po_header(Header, N)

-spec plural_by_po_header(binary(), integer()) -> {ok, non_neg_integer()} | {error, compile_error()}.

Convenience that compiles and evaluates in a single step: given the raw header Header and the count N, returns {ok, Form} or propagates the {error, compile_error()} from compile/1.

Recompiles on every call, so it is for one-off use; on the hot path, call compile/1 once at load and reuse the bundle with evaluate/2.

The internal evaluation uses evaluate/2 (total), so an {ok, _} never embeds an evaluation anomaly — the only part that can fail is compile/1, whose error is propagated as-is.

1> erli18n_plural:plural_by_po_header(<<"nplurals=2; plural=n != 1;">>, 1).
{ok,0}
2> erli18n_plural:plural_by_po_header(<<"nplurals=2; plural=n != 1;">>, 3).
{ok,1}
3> erli18n_plural:plural_by_po_header(<<"nplurals=2; plural=nx;">>, 1).
{error,{syntax_error,{unknown_identifier_after_n,$x},1}}

See also compile/1 and evaluate/2.

validate_against_cldr(Locale, HeaderRule)

-spec validate_against_cldr(binary(), binary()) ->
                               ok | {warning, {plural_divergence, binary(), binary(), binary()}}.

Compares the plural expression of header HeaderRule (raw form) against the CLDR canonical rule of Locale, producing only observability — at runtime the header always wins (PSD-004).

Compiles HeaderRule ONCE and delegates to validate_against_cldr_ast/2. Returns ok when the (nplurals, expr) ASTs are structurally equal (whitespace/paren-insensitive) or when the locale has no CLDR entry; returns {warning, {plural_divergence, Locale, HeaderRule, CldrRaw}} when they diverge — including when the header is invalid but the locale is listed in CLDR.

A convenience entry point for callers that only have the raw header. The catalog loader, which already keeps the compiled bundle, should use validate_against_cldr_ast/2 to avoid recompiling the header at load.

The comparison is STRUCTURAL over the (nplurals, expr-AST) pair, so it is insensitive to whitespace and redundant parentheses: (n != 1) matches n != 1. Nothing changes at runtime — the warning exists only for telemetry.

%% Header agrees with fr's CLDR (n > 1): no warning.
1> erli18n_plural:validate_against_cldr(<<"fr">>, <<"nplurals=2; plural=(n > 1);">>).
ok
%% Header diverges from fr's CLDR: warning (but the header would win at runtime).
2> erli18n_plural:validate_against_cldr(<<"fr">>, <<"nplurals=2; plural=n != 1;">>).
{warning,{plural_divergence,<<"fr">>,<<"nplurals=2; plural=n != 1;">>,<<"n > 1">>}}
%% Locale with no CLDR entry: nothing to validate.
3> erli18n_plural:validate_against_cldr(<<"xx">>, <<"nplurals=2; plural=n != 1;">>).
ok

Edge cases: an INVALID header against a locale that IS listed in CLDR still produces {warning, _} (it cannot match the canonical rule); against a locale with no CLDR entry it becomes ok. See also validate_against_cldr_ast/2 (variant without recompiling) and cldr_rule/1.

validate_against_cldr_ast(Locale, Compiled)

-spec validate_against_cldr_ast(binary(), plural_compiled()) ->
                                   ok | {warning, {plural_divergence, binary(), binary(), binary()}}.

AST-based variant of validate_against_cldr/2: takes the ALREADY compiled bundle (plural_compiled()) and compares it against the CLDR rule of Locale without recompiling anything (finding #17).

Reuses the header AST as-is and takes the CLDR side from a memoised table of compiled bundles, so no rule is re-parsed at load. Returns ok if the (nplurals, expr) pairs match or if the locale has no CLDR entry; otherwise {warning, {plural_divergence, Locale, HeaderRaw, CldrRaw}}, with the raw header (the bundle's raw field) and the raw CLDR expression.

This is the PREFERRED form in the loader (finding #17): since the bundle was already compiled by compile/1 at load, it avoids the second compile/1 that validate_against_cldr/2 would do, and the CLDR side comes from the persistent_term cache (cldr_compiled_table/0), not re-synthesised per load.

1> {ok, C} = erli18n_plural:compile(<<"nplurals=2; plural=n != 1;">>).
2> erli18n_plural:validate_against_cldr_ast(<<"fr">>, C).
{warning,{plural_divergence,<<"fr">>,<<"nplurals=2; plural=n != 1;">>,<<"n > 1">>}}
3> {ok, Cde} = erli18n_plural:compile(<<"nplurals=2; plural=n != 1;">>).
4> erli18n_plural:validate_against_cldr_ast(<<"de">>, Cde).
ok

Edge cases: a locale with no CLDR entry becomes ok (nothing to log). See also validate_against_cldr/2 (from the raw header) and cldr_compiled_table/0 (the memoisation of the CLDR side).