Cldr.Collation.Table (Cldr Collation v1.1.0)

Persistent-term-backed collation element table.

Stores the collation table parsed from FractionalUCA.txt for fast concurrent lookups using :persistent_term, which provides zero-copy reads for data that is written once and never modified.
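
As a rough illustration of the write-once, read-many pattern described above, the sketch below stores a small map under a `:persistent_term` key and reads it back. The module name `PersistentTableSketch`, the key, and the placeholder element are assumptions made for this example; the real module's internal key and table layout are not shown in this documentation.

```elixir
defmodule PersistentTableSketch do
  # Hypothetical key; the key used by the real module is not documented here.
  @key {__MODULE__, :table}

  # Write once: store the whole table as a single term.
  def put(table) when is_map(table), do: :persistent_term.put(@key, table)

  # Read from any process; the stored map is not copied to the caller's heap.
  def fetch(codepoint) do
    @key
    |> :persistent_term.get(%{})
    |> Map.fetch(codepoint)
  end
end

# Toy data: a single entry with a placeholder instead of real collation elements.
PersistentTableSketch.put(%{0x0041 => [:placeholder_element]})
PersistentTableSketch.fetch(0x0041)
```

Updating or erasing a persistent term is expensive because it can trigger a global garbage collection, which is why this kind of storage suits data that is loaded once and then left alone.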

Handles both single codepoint mappings and contractions (multi-codepoint sequences).

Functions

child_spec(init_arg)

Returns a specification to start this module under a supervisor.

See Supervisor.

contraction_starters(codepoint)

@spec contraction_starters(non_neg_integer()) :: [pos_integer()]

Check if a codepoint begins any multi-codepoint contraction.

Arguments

  • codepoint - an integer codepoint to check.

Returns

  • A list of contraction lengths that start with this codepoint, or [] if this codepoint does not begin any contractions.

Examples

iex> Cldr.Collation.Table.ensure_loaded()
iex> lengths = Cldr.Collation.Table.contraction_starters(0x006C)
iex> is_list(lengths)
true
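
One way to picture the returned lengths is as an index derived from the contraction keys of the table. The sketch below is an assumption about how such an index could be built, not the library's implementation; `StartersSketch` and the toy table are hypothetical, and tuple keys stand in for contractions as described under lookup/1.

```elixir
defmodule StartersSketch do
  # Build %{first_codepoint => [contraction lengths, longest first]}
  # from a table whose contraction keys are tuples of codepoints.
  def build(table) when is_map(table) do
    for key <- Map.keys(table), is_tuple(key), reduce: %{} do
      acc ->
        first = elem(key, 0)
        len = tuple_size(key)

        Map.update(acc, first, [len], fn lens ->
          [len | lens] |> Enum.uniq() |> Enum.sort(:desc)
        end)
    end
  end
end

# Toy table with placeholder values; integer keys are ignored by the index.
toy_table = %{0x0063 => [], {0x0063, 0x0068} => [], {0x0063, 0x0068, 0x0069} => []}
StartersSketch.build(toy_table)
```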

ensure_loaded()

@spec ensure_loaded() :: :ok

Ensure the collation table is loaded.

Loads the FractionalUCA.txt data file on first call. Subsequent calls are no-ops.

Returns

  • :ok - the table is loaded and ready for lookups.

Examples

iex> Cldr.Collation.Table.ensure_loaded()
:ok
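
The load-once behaviour can be sketched as a check against `:persistent_term` before doing any work. This is a simplified illustration under stated assumptions: `LazyLoadSketch`, its key, and the `loader` argument are hypothetical, the loader returns toy data instead of parsing FractionalUCA.txt, and the sketch does not serialise concurrent first calls the way a supervised process could.

```elixir
defmodule LazyLoadSketch do
  # Hypothetical key, used only by this sketch.
  @key {__MODULE__, :table}

  # Runs the loader only when nothing has been stored yet.
  def ensure_loaded(loader \\ &default_loader/0) do
    case :persistent_term.get(@key, :not_loaded) do
      :not_loaded ->
        :persistent_term.put(@key, loader.())
        :ok

      _table ->
        :ok
    end
  end

  # Stand-in for parsing the data file; returns a toy table.
  defp default_loader, do: %{0x0041 => [:placeholder_element]}
end

# The first call loads; the second call finds the table and does nothing.
:ok = LazyLoadSketch.ensure_loaded()
:ok = LazyLoadSketch.ensure_loaded()
```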

longest_match(codepoints)

@spec longest_match([non_neg_integer()]) ::
  {[non_neg_integer()], [Cldr.Collation.Element.t()], [non_neg_integer()]}
  | {:unmapped, non_neg_integer(), [non_neg_integer()]}
  | :done

Find the longest matching entry for the given codepoint sequence.

Tries contractions from longest to shortest, falling back to a single codepoint lookup.

Arguments

  • codepoints - a list of integer codepoints to match against.

Returns

  • {matched_cps, elements, remaining_cps} - a successful match with the matched codepoints, their collation elements, and the remaining unprocessed tail.

  • {:unmapped, codepoint, remaining_cps} - the first codepoint has no table entry.

  • :done - the input list is empty.

Examples

iex> Cldr.Collation.Table.ensure_loaded()
iex> {matched, _elements, rest} = Cldr.Collation.Table.longest_match([0x0041, 0x0042])
iex> matched
[65]
iex> rest
[66]
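
The longest-to-shortest strategy and the three return shapes can be sketched against a toy table. Everything in this block is an assumption made for illustration: `LongestMatchSketch`, the `@table` and `@starters` data, and the atom placeholders used in place of real collation elements. It is not the library's implementation.

```elixir
defmodule LongestMatchSketch do
  # Toy table: integer keys for single codepoints, tuple keys for contractions.
  @table %{
    0x0063 => [:c],
    0x0068 => [:h],
    {0x0063, 0x0068} => [:ch]
  }

  # Toy index of contraction lengths per starting codepoint.
  @starters %{0x0063 => [2]}

  def longest_match([]), do: :done

  def longest_match([first | rest] = codepoints) do
    # Try the longest candidate contraction first.
    lengths = @starters |> Map.get(first, []) |> Enum.sort(:desc)

    case Enum.find_value(lengths, &try_contraction(codepoints, &1)) do
      nil -> single(first, rest)
      match -> match
    end
  end

  # Returns a match tuple, or nil when the input is too short or has no entry.
  defp try_contraction(codepoints, len) do
    {prefix, remaining} = Enum.split(codepoints, len)

    with true <- length(prefix) == len,
         {:ok, elements} <- Map.fetch(@table, List.to_tuple(prefix)) do
      {prefix, elements, remaining}
    else
      _ -> nil
    end
  end

  # Fallback: look up the first codepoint on its own.
  defp single(first, rest) do
    case Map.fetch(@table, first) do
      {:ok, elements} -> {[first], elements, rest}
      :error -> {:unmapped, first, rest}
    end
  end
end

LongestMatchSketch.longest_match([0x0063, 0x0068, 0x0041])
```

A caller would typically invoke such a function repeatedly on the remaining codepoints until it returns :done.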

longest_match_with_overlay(codepoints, overlay)

@spec longest_match_with_overlay([non_neg_integer()], map() | nil) ::
  {[non_neg_integer()], [Cldr.Collation.Element.t()], [non_neg_integer()]}
  | {:unmapped, non_neg_integer(), [non_neg_integer()]}
  | :done

Find the longest matching entry, checking a tailoring overlay first.

Arguments

  • codepoints - a list of integer codepoints to match.

  • overlay - a tailoring overlay map, or nil for root-only lookups.

Returns

Same as longest_match/1.

Examples

iex> Cldr.Collation.Table.ensure_loaded()
iex> {matched, _elems, rest} = Cldr.Collation.Table.longest_match_with_overlay([0x0041, 0x0042], nil)
iex> matched
[65]
iex> rest
[66]

lookup(codepoint)

@spec lookup(non_neg_integer() | [non_neg_integer()]) ::
  {:ok, [Cldr.Collation.Element.t()]} | :unmapped

Look up collation elements for a codepoint or codepoint sequence.

Arguments

  • codepoint - a single integer codepoint, or a list of integer codepoints (contraction). Lists are internally converted to the compact key format (integer for single, tuple for multi).

Returns

  • {:ok, [%Cldr.Collation.Element{}]} - the collation elements for the entry.

  • :unmapped - no entry found in the table.

Examples

iex> Cldr.Collation.Table.ensure_loaded()
iex> {:ok, elements} = Cldr.Collation.Table.lookup(0x0041)
iex> Cldr.Collation.Element.primary(hd(elements)) > 0
true

iex> Cldr.Collation.Table.ensure_loaded()
iex> Cldr.Collation.Table.lookup(0x10FFFF)
:unmapped
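
The compact key format mentioned under Arguments (an integer for a single codepoint, a tuple for a contraction) could be produced by a conversion along these lines. `KeySketch.to_key/1` is a hypothetical helper written for this example; the library's internal function is not shown in this documentation.

```elixir
defmodule KeySketch do
  # A bare integer codepoint is already a valid key.
  def to_key(codepoint) when is_integer(codepoint), do: codepoint

  # A one-element list collapses to its integer.
  def to_key([codepoint]) when is_integer(codepoint), do: codepoint

  # Two or more codepoints become a tuple key.
  def to_key([_, _ | _] = codepoints), do: List.to_tuple(codepoints)
end

KeySketch.to_key([0x0063, 0x0068])
```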

lookup_with_overlay(codepoint, overlay)

@spec lookup_with_overlay(non_neg_integer() | [non_neg_integer()], map() | nil) ::
  {:ok, [Cldr.Collation.Element.t()]} | :unmapped

Look up collation elements with a tailoring overlay checked first.

Arguments

  • codepoint - a single integer codepoint, or a list of integer codepoints.

  • overlay - a map of %{key => [%Cldr.Collation.Element{}]} tailoring entries, where keys are integers (single CP) or tuples (contractions).

Returns

  • Same as lookup/1, but checks the overlay map before falling back to the root table.

Examples

iex> Cldr.Collation.Table.ensure_loaded()
iex> overlay = %{0x0041 => [{0xFFFF, 0x0020, 0x0008, false}]}
iex> {:ok, [elem]} = Cldr.Collation.Table.lookup_with_overlay(0x0041, overlay)
iex> Cldr.Collation.Element.primary(elem)
0xFFFF
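
The overlay-first behaviour amounts to a two-level lookup: consult the tailoring map, then fall back to the root table. The sketch below illustrates that order with toy data. `OverlaySketch`, the `@root` map, and the atom placeholders are assumptions for this example, and treating a nil overlay as a root-only lookup is likewise assumed (the spec accepts nil).

```elixir
defmodule OverlaySketch do
  # Toy root table with placeholder values instead of real collation elements.
  @root %{0x0041 => [:root_a], 0x0042 => [:root_b]}

  # Assumed here: a nil overlay means a root-only lookup.
  def lookup_with_overlay(key, nil), do: lookup_root(key)

  def lookup_with_overlay(key, overlay) when is_map(overlay) do
    case Map.fetch(overlay, key) do
      # A tailored entry shadows the root entry.
      {:ok, elements} -> {:ok, elements}
      # No tailoring for this key: fall back to the root table.
      :error -> lookup_root(key)
    end
  end

  defp lookup_root(key) do
    case Map.fetch(@root, key) do
      {:ok, elements} -> {:ok, elements}
      :error -> :unmapped
    end
  end
end

OverlaySketch.lookup_with_overlay(0x0041, %{0x0041 => [:tailored_a]})
```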

start_link(options \\ [])