erli18n_interp (erli18n v0.2.0)


Pure, total, fail-soft substituter for named %{name} placeholders.

This is the Phase 1 interpolation engine behind the f-suffix functions in erli18n (gettextf, ngettextf, pgettextf, npgettextf, and their d/dc variants). It takes an already-resolved translation (the msgstr) plus a map of Bindings and returns the final binary, with each %{name} replaced by its bound value.

The problem it solves

A translated string frequently needs runtime values spliced in (<<"Hello, %{name}!">>). gettext itself has no interpolation; consumers usually hand-roll io_lib:format/2 with positional ~s, which couples the translation to argument ORDER and breaks the moment a translator reorders words. Named placeholders (%{name}) decouple the wording from the call site: the translator can move %{name} anywhere in the sentence and the binding still resolves by name.

Mental model — totality on the hot path

format/2 runs on EVERY f-suffix lookup (gettextf, ngettextf, and the rest), so it carries the SAME totality bar as erli18n_plural:evaluate/2: it is TOTAL and fail-soft. For ANY msgstr bytes and ANY Bindings map it NEVER raises and ALWAYS returns a binary. There is exactly one opt-in path allowed to raise: format/3 with #{on_missing => strict}, for callers that want a missing binding to be a hard error rather than a silently retained literal.

The substitution is a single left-to-right pass over the input:

  • "%%" collapses to a literal "%" (both bytes consumed).
  • "%{<name>}", where <name> matches [A-Za-z_][A-Za-z0-9_]*, is replaced by the bound value, or handled per the on_missing policy if the name is unbound.
  • To emit a literal "%{name}" un-substituted, author "%%{name}": the "%%" collapses to "%", leaving the following "{name}" untouched.
  • A lone "%" that begins neither "%%" nor a valid "%{name}", and a "%{" that never closes into a valid placeholder, are emitted literally. Nothing crashes.
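The rules above fit a single recursive scan over the binary. The sketch below is hypothetical (interp_sketch and its helpers are not part of erli18n) and deliberately leaves out value coercion, the strict policy, and the anti-DoS caps, so it illustrates the grammar rather than reproducing erli18n_interp.

```erlang
-module(interp_sketch).
-export([format/2]).

%% Hypothetical sketch of the single left-to-right pass; not erli18n's code.
%% Simplifications: lenient policy only, binary binding values only,
%% and none of the anti-DoS caps.

-define(IS_NAME_START(C),
        (C =:= $_ orelse (C >= $a andalso C =< $z) orelse (C >= $A andalso C =< $Z))).
-define(IS_NAME_CHAR(C), (?IS_NAME_START(C) orelse (C >= $0 andalso C =< $9))).

format(Msgstr, Bindings) when is_binary(Msgstr), is_map(Bindings) ->
    scan(Msgstr, Bindings, <<>>).

%% "%%" collapses to one literal "%"; both bytes are consumed.
scan(<<"%%", Rest/binary>>, B, Acc) ->
    scan(Rest, B, <<Acc/binary, "%">>);
%% "%{" starts a placeholder only if a valid name and "}" follow.
scan(<<"%{", Rest/binary>>, B, Acc) ->
    case take_name(Rest) of
        {ok, Name, After} ->
            scan(After, B, <<Acc/binary, (lookup(Name, B))/binary>>);
        error ->
            %% Malformed or unclosed: keep "%{" literally and keep scanning.
            scan(Rest, B, <<Acc/binary, "%{">>)
    end;
%% Every other byte, including a lone "%", is copied through.
scan(<<C, Rest/binary>>, B, Acc) ->
    scan(Rest, B, <<Acc/binary, C>>);
scan(<<>>, _B, Acc) ->
    Acc.

%% Match [A-Za-z_][A-Za-z0-9_]* followed by "}".
take_name(<<C, _/binary>> = Bin) when ?IS_NAME_START(C) ->
    take_name(Bin, 1);
take_name(_) ->
    error.

take_name(Bin, N) ->
    case Bin of
        <<Name:N/binary, "}", After/binary>> -> {ok, Name, After};
        <<_:N/binary, C, _/binary>> when ?IS_NAME_CHAR(C) -> take_name(Bin, N + 1);
        _ -> error
    end.

%% binary_to_existing_atom/2 never creates atoms; any miss keeps the literal.
lookup(Name, Bindings) ->
    Literal = <<"%{", Name/binary, "}">>,
    try binary_to_existing_atom(Name, utf8) of
        Key ->
            case Bindings of
                #{Key := Value} when is_binary(Value) -> Value;
                _ -> Literal
            end
    catch
        error:badarg -> Literal
    end.
```

Tracing <<"%%{name}">> shows why the escape works: the first clause consumes "%%" and emits "%", after which "{name}" is ordinary text copied byte by byte.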

Binding values and atom safety

Binding keys are atoms (#{name => <<"World">>}). Values may be a binary, an iolist/string, an integer, a float, or an atom; every value is coerced to UTF-8 text TOTALLY — an unknown or malformed term renders via a bounded safe fallback rather than raising.
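A total coercion can be written as one clause per accepted type plus a catch-all. The module below is an assumption about how such a function might look: coerce_sketch, the print depth, and the character cap are illustrative values, not erli18n's.

```erlang
-module(coerce_sketch).
-export([to_text/1]).

%% Hypothetical total coercion of a binding value to UTF-8 text.
%% FALLBACK_DEPTH and FALLBACK_CHARS are illustrative limits, not erli18n's.
-define(FALLBACK_DEPTH, 20).
-define(FALLBACK_CHARS, 256).

to_text(V) when is_binary(V) -> V;
to_text(V) when is_integer(V) -> integer_to_binary(V);
%% ~w prints floats in a short, readable form (e.g. 1.5 -> "1.5").
to_text(V) when is_float(V) -> iolist_to_binary(io_lib:format("~w", [V]));
to_text(V) when is_atom(V) -> atom_to_binary(V, utf8);
to_text(V) when is_list(V) ->
    %% A string or iolist is accepted only if it converts cleanly to UTF-8.
    try unicode:characters_to_binary(V) of
        Bin when is_binary(Bin) -> Bin;
        _ -> fallback(V)
    catch
        _:_ -> fallback(V)
    end;
to_text(V) ->
    fallback(V).

%% Any other term: print with a depth limit, then keep at most
%% FALLBACK_CHARS grapheme clusters so the result stays bounded.
fallback(V) ->
    Text = unicode:characters_to_binary(io_lib:format("~tP", [V, ?FALLBACK_DEPTH])),
    string:slice(Text, 0, ?FALLBACK_CHARS).
```

Every clause returns a binary, and lists that are not valid chardata (such as [foo]) fall through to the printed form instead of raising.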

A placeholder name is resolved with binary_to_existing_atom/2 wrapped in try: a name that is not an already-existing atom is treated as a MISSING binding and NEVER creates a new atom. This closes the atom-table-exhaustion DoS that binary_to_atom/2 would open on untrusted msgstr.
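The lookup rule amounts to treating the badarg from binary_to_existing_atom/2 as a miss. A hypothetical resolve/2 (the module and function names are illustrative, not erli18n's API) could look like this:

```erlang
-module(lookup_sketch).
-export([resolve/2]).

%% Hypothetical atom-safe name resolution.
%% Returns {ok, Value} for a bound name, or missing otherwise. A name that is
%% not already an atom is reported as missing, and no atom is created.
resolve(NameBin, Bindings) when is_binary(NameBin), is_map(Bindings) ->
    try binary_to_existing_atom(NameBin, utf8) of
        Key ->
            case Bindings of
                #{Key := Value} -> {ok, Value};
                _ -> missing
            end
    catch
        error:badarg -> missing
    end.
```

Calling binary_to_atom/2 here instead would intern every distinct name an attacker places in a msgstr, and atoms are never garbage collected.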

Anti-DoS

Consistent with the project's plural caps (see erli18n_plural and include/erli18n.hrl), all work is bounded and fails closed. Because format/2 must stay total, hitting a cap on the lenient path CLAMPS the result rather than raising:

  • ?MAX_OUTPUT_BYTES (65536) — the accumulated output is truncated once it would exceed this size; the remaining input is dropped.
  • ?MAX_EXPANSIONS (1024) — once this many placeholders have been expanded, further %{name} references are emitted literally instead of substituted.
  • ?MAX_NAME_BYTES (256) — a %{ whose name run exceeds this many bytes before the closing } is treated as a malformed reference and emitted literally (this also bounds the binary_to_existing_atom/2 probe).
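The output and expansion caps can be enforced at the single point where text is appended. This is a hypothetical sketch (caps_sketch, expand/3, and append/2 are illustrative); it redefines the two limits locally so it compiles on its own, and it cuts at a byte boundary, which may not match how erli18n truncates.

```erlang
-module(caps_sketch).
-export([expand/3, append/2]).

%% Hypothetical sketch of the fail-closed clamps; not erli18n's code.
%% The limits mirror the documented ?MAX_OUTPUT_BYTES and ?MAX_EXPANSIONS
%% but are redefined here so the module is self-contained.
-define(MAX_OUTPUT_BYTES, 65536).
-define(MAX_EXPANSIONS, 1024).

%% State is {Acc, Expansions}. Once the expansion budget is spent, the
%% literal "%{name}" text is appended instead of the bound value.
expand(Literal, _Value, {Acc, N}) when N >= ?MAX_EXPANSIONS ->
    append(Literal, {Acc, N});
expand(_Literal, Value, {Acc, N}) ->
    append(Value, {Acc, N + 1}).

%% Append Piece unless that would pass the output cap. On overflow, keep
%% only the bytes that still fit and return {done, Acc}: the caller then
%% drops the rest of the input. This sketch cuts at a byte boundary.
append(Piece, {Acc, N}) ->
    Room = max(0, ?MAX_OUTPUT_BYTES - byte_size(Acc)),
    case byte_size(Piece) > Room of
        true -> {done, <<Acc/binary, (binary:part(Piece, 0, Room))/binary>>};
        false -> {cont, {<<Acc/binary, Piece/binary>>, N}}
    end.
```

Neither path raises: exceeding a limit only changes what gets appended, which keeps the lenient path total.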

Bidi (RTL) hazard

v1 does NOT auto-insert Unicode bidi isolation marks (U+2066..U+2069) around interpolated values. Splicing an RTL value (Arabic/Hebrew) into an LTR sentence — or vice versa — can therefore reorder neighbouring punctuation under the Unicode Bidirectional Algorithm. Callers that mix directions should isolate values themselves until a future version offers opt-in isolation.
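Until opt-in isolation exists, a caller can wrap each value in FIRST STRONG ISOLATE (U+2068) and POP DIRECTIONAL ISOLATE (U+2069) before binding it. The helper below is a caller-side sketch; bidi_sketch and isolate/1 are hypothetical names, and FSI is one reasonable choice among the isolates (LRI U+2066 or RLI U+2067 fit when the value's direction is known in advance).

```erlang
-module(bidi_sketch).
-export([isolate/1]).

%% Hypothetical caller-side helper, not part of erli18n.
%% Wraps a UTF-8 binary in FSI ... PDI so the Unicode Bidirectional
%% Algorithm treats the value as an isolated run.
isolate(Value) when is_binary(Value) ->
    <<16#2068/utf8, Value/binary, 16#2069/utf8>>.
```

A call site would then bind the wrapped value, for example #{name => bidi_sketch:isolate(UserName)}, so the surrounding punctuation keeps its position whatever the direction of UserName.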

Quickstart

1> erli18n_interp:format(<<"Hello, %{name}!">>, #{name => <<"World">>}).
<<"Hello, World!">>
2> erli18n_interp:format(<<"%{a} then %{b}">>, #{a => 1, b => two}).
<<"1 then two">>
3> erli18n_interp:format(<<"100%% sure about %{x}">>, #{}).
<<"100% sure about %{x}">>
4> erli18n_interp:format(<<"need %{x}">>, #{}, #{on_missing => strict}).
** exception error: {erli18n_interp,{missing_binding,x}}

Summary

Types

bindings()

Map of placeholder bindings: atom keys to coercible values.

on_missing()

Policy for a %{name} whose name resolves to no binding.

opts()

Options for format/3. Currently a single key, on_missing, which defaults to lenient (making format/3 equivalent to format/2).

Functions

format(Msgstr, Bindings)

Interpolate %{name} placeholders in Msgstr using Bindings, leniently.

format(Msgstr, Bindings, Opts)

Interpolate %{name} placeholders in Msgstr using Bindings, with Opts controlling the missing-binding policy.

Types

bindings()

-type bindings() :: #{atom() => term()}.

Map of placeholder bindings: atom keys to coercible values.

A key is the atom form of a %{name} placeholder. A value is coerced to UTF-8 text totally and may be a binary, an iolist/string, an integer, a float, or an atom. Any other term renders via a bounded safe fallback instead of raising.

on_missing()

-type on_missing() :: lenient | strict.

Policy for a %{name} whose name resolves to no binding.

  • lenient (default): the placeholder is emitted literally, unchanged (%{name} stays %{name}), and the pass continues. format/2 always uses this policy.
  • strict: the pass raises error({erli18n_interp, {missing_binding, Name}}). This is the ONLY path in the module allowed to raise and is opt-in via format/3.
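Both policies can be expressed as one hypothetical function that receives the raw placeholder name. The module name is illustrative; the error term follows the shape documented here, with Name as the existing atom when there is one and the raw binary otherwise.

```erlang
-module(missing_sketch).
-export([on_missing/2]).

%% Hypothetical handling of an unbound %{name} under each policy.
%% lenient: return the placeholder text unchanged so the pass can continue.
on_missing(lenient, NameBin) when is_binary(NameBin) ->
    <<"%{", NameBin/binary, "}">>;
%% strict: raise, reporting the name as an existing atom if possible,
%% otherwise as the raw binary (no atom is ever created).
on_missing(strict, NameBin) when is_binary(NameBin) ->
    Name = try binary_to_existing_atom(NameBin, utf8)
           catch error:badarg -> NameBin
           end,
    error({erli18n_interp, {missing_binding, Name}}).
```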

opts()

-type opts() :: #{on_missing => on_missing()}.

Options for format/3. Currently a single key, on_missing, which defaults to lenient (making format/3 equivalent to format/2).

Functions

format(Msgstr, Bindings)

-spec format(binary(), bindings()) -> binary().

Interpolate %{name} placeholders in Msgstr using Bindings, leniently.

TOTAL and fail-soft: for ANY Msgstr bytes and ANY Bindings map this never raises and always returns a binary. A missing binding leaves its %{name} literal in place. Equivalent to format(Msgstr, Bindings, #{on_missing => lenient}).

See the module doc for the substitution grammar, value coercion, and the anti-DoS caps.

Examples

1> erli18n_interp:format(<<"Hi %{who}">>, #{who => <<"Sam">>}).
<<"Hi Sam">>
2> erli18n_interp:format(<<"Hi %{who}">>, #{}).
<<"Hi %{who}">>
3> erli18n_interp:format(<<"50%% off">>, #{}).
<<"50% off">>

format(Msgstr, Bindings, Opts)

-spec format(binary(), bindings(), opts()) -> binary().

Interpolate %{name} placeholders in Msgstr using Bindings, with Opts controlling the missing-binding policy.

Opts supports #{on_missing => lenient | strict}. With lenient (the default) this is TOTAL and equals format/2. With strict, a %{name} whose name has no binding raises error({erli18n_interp, {missing_binding, Name}}) — the only raising path in this module.

Name in the error is the atom form of the placeholder when it already exists as an atom, otherwise the raw name binary (a non-existing atom name is never interned).

Examples

1> erli18n_interp:format(<<"Hi %{who}">>, #{who => <<"Sam">>},
1>                       #{on_missing => strict}).
<<"Hi Sam">>
2> erli18n_interp:format(<<"Hi %{who}">>, #{},
2>                       #{on_missing => strict}).
** exception error: {erli18n_interp,{missing_binding,who}}