Cldr.Collation.Reorder (Cldr Collation v1.1.0)


Script reordering for collation (the kr Unicode locale extension key, or the reorder option).

Remaps primary weights to change the relative order of scripts. For example, reorder: [:Grek, :Latn] would sort Greek characters before Latin characters.

Script boundaries are determined from the fractional lead bytes in FractionalUCA.txt, which cleanly partition scripts. The integer primary weights in the CLDR allkeys table do not: they interleave scripts within the same lead byte, so an integer weight's lead byte does not identify its script. The module therefore uses a per-primary-weight lookup to find each weight's script before applying the reorder permutation.
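The lookup described above can be sketched in plain Elixir. This is a hypothetical illustration, not the module's implementation: the names ReorderSketch, lead_byte_permutation/2 and remap/3 are invented here, the byte ranges and weights are made-up values rather than FractionalUCA.txt data, and recombining the new lead byte with the sub-byte as new_lead * 256 + sub is an assumed encoding. It also assumes the script ranges are disjoint, inclusive and contiguous, so that laying them out again from the lowest lead byte yields a permutation.

```elixir
defmodule ReorderSketch do
  # Hypothetical sketch; not the Cldr.Collation.Reorder implementation.

  # Builds %{old_lead_byte => new_lead_byte}. `ranges` maps lowercase script
  # names to inclusive {first, last} lead-byte ranges; scripts named in
  # `order` are laid out first, in that order, and all others follow in
  # their original (ascending) order.
  @spec lead_byte_permutation(%{String.t() => {byte(), byte()}}, [String.t()]) ::
          %{byte() => byte()}
  def lead_byte_permutation(ranges, order) do
    listed = Enum.map(order, &Map.fetch!(ranges, &1))

    rest =
      ranges
      |> Enum.reject(fn {name, _range} -> name in order end)
      |> Enum.map(fn {_name, range} -> range end)
      |> Enum.sort()

    # New positions are assigned from the lowest lead byte upward.
    base = ranges |> Map.values() |> Enum.map(&elem(&1, 0)) |> Enum.min()

    (listed ++ rest)
    |> Enum.flat_map(fn {first, last} -> Enum.to_list(first..last) end)
    |> Enum.with_index(base)
    |> Map.new()
  end

  # Remaps one allkeys primary: find its lead byte (and so its script),
  # permute the lead byte, and keep the sub-byte so within-script order survives.
  @spec remap(non_neg_integer(), map(), %{byte() => byte()}) :: non_neg_integer()
  def remap(primary, lead_of, perm) do
    case Map.fetch(lead_of, primary) do
      {:ok, lead} ->
        sub = Map.get(lead_of, {:sub, primary}, 0)
        # Assumed encoding: new lead byte in the high byte, sub-byte below it.
        Map.get(perm, lead, lead) * 256 + sub

      :error ->
        # Primaries with no known lead byte pass through unchanged here.
        primary
    end
  end
end
```

With ranges = %{"latn" => {0x10, 0x11}, "grek" => {0x12, 0x12}} and order ["grek", "latn"], the Greek lead byte 0x12 moves to 0x10 and both Latin lead bytes shift up by one, so a Greek primary now compares below a Latin one while each script keeps its internal order through the sub-byte.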

Summary

Functions

apply_mapping(mapping_fn, primary)

Apply a reorder mapping to a primary weight.

build_mapping(reorder_codes)

Build a reorder mapping function from the given script codes.

load_primary_to_fractional_lead()

Load a mapping from allkeys integer primary weights to their fractional lead bytes and sub-bytes.

load_script_ranges()

Load the script-to-lead-byte-range mapping from FractionalUCA.txt.

Functions

apply_mapping(mapping_fn, primary)

@spec apply_mapping((non_neg_integer() -> non_neg_integer()) | nil, non_neg_integer()) ::
  non_neg_integer()

Apply a reorder mapping to a primary weight.

Arguments

  • mapping_fn - a reorder mapping function returned by build_mapping/1, or nil.

  • primary - the primary weight to remap.

Returns

The remapped primary weight, or the original if mapping_fn is nil.

Examples

iex> Cldr.Collation.Reorder.apply_mapping(nil, 0x2A00)
0x2A00

iex> mapping = Cldr.Collation.Reorder.build_mapping([:Grek, :Latn])
iex> remapped = Cldr.Collation.Reorder.apply_mapping(mapping, 0x2A00)
iex> is_integer(remapped)
true
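Accepting nil lets callers pass the result of build_mapping/1 straight through, since build_mapping/1 returns nil for an empty list. A minimal sketch of that contract, using a hypothetical module name and not taken from the library's source:

```elixir
defmodule ApplyMappingSketch do
  # Hypothetical stand-in for Cldr.Collation.Reorder.apply_mapping/2.
  @spec apply_mapping((non_neg_integer() -> non_neg_integer()) | nil, non_neg_integer()) ::
          non_neg_integer()
  # No reordering requested: the primary weight is returned unchanged.
  def apply_mapping(nil, primary), do: primary

  # Otherwise delegate to the one-argument mapping function.
  def apply_mapping(mapping_fn, primary) when is_function(mapping_fn, 1),
    do: mapping_fn.(primary)
end
```

Treating nil as the identity keeps the "no reordering" case free of extra branches at every call site.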

build_mapping(reorder_codes)

@spec build_mapping([atom()]) :: (non_neg_integer() -> non_neg_integer()) | nil

Build a reorder mapping function from the given script codes.

Creates a function that remaps primary weights to reorder scripts. Core codes (space, punct, symbol, currency, digit) that are not explicitly listed are prepended automatically, so they continue to sort before the listed scripts.

Arguments

  • reorder_codes - a list of script code atoms (e.g., [:Grek, :Latn]). Supports ISO 15924 codes (:Latn, :Grek, :Cyrl) and special codes (:space, :punct, :symbol, :currency, :digit, :others).

Returns

  • A function of type (non_neg_integer() -> non_neg_integer()) that remaps primary weights.
  • nil if the list is empty or no valid mappings were found.

Examples

iex> Cldr.Collation.Reorder.build_mapping([])
nil

iex> mapping = Cldr.Collation.Reorder.build_mapping([:Grek, :Latn])
iex> is_function(mapping, 1)
true
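The automatic prepending of core codes can be pictured as a list transformation applied before any weights are remapped. The sketch below is an assumption about that step, not the library's code: the module and function names are hypothetical, and it assumes unlisted core groups are prepended in the order given in the build_mapping/1 description (space, punct, symbol, currency, digit).

```elixir
defmodule CoreCodesSketch do
  # Special groups that, per the build_mapping/1 docs, are prepended when
  # not listed. The order used here is an assumption.
  @core_codes [:space, :punct, :symbol, :currency, :digit]

  # Prepend every core code the caller did not mention explicitly,
  # keeping the caller's own list (and its ordering) intact after them.
  @spec with_core_codes([atom()]) :: [atom()]
  def with_core_codes(codes) do
    Enum.reject(@core_codes, &(&1 in codes)) ++ codes
  end
end
```

A caller who lists a core code explicitly keeps control of its position: in this sketch, [:Grek, :digit] becomes [:space, :punct, :symbol, :currency, :Grek, :digit], so digits sort after Greek.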

load_primary_to_fractional_lead()

@spec load_primary_to_fractional_lead() :: %{
  required(integer() | {:sub, integer()}) => non_neg_integer()
}

Load a mapping from allkeys integer primary weights to their fractional lead bytes and sub-bytes.

Parses FractionalUCA.txt data lines to extract both the fractional CE (which gives the lead byte and sub-byte) and the allkeys integer primary weight (from the comment portion of each line).

The returned map has two types of entries:

  • primary_weight => fractional_lead_byte - the script-identifying lead byte.
  • {:sub, primary_weight} => fractional_sub_byte - the within-script sub-byte for preserving relative ordering during remapping.

Returns

A map %{integer() | {:sub, integer()} => non_neg_integer()}.
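Storing both key kinds in one map lets a single structure answer two questions: which script a primary belongs to, and where it sits within that script. A hypothetical sketch of building such a map from rows already parsed into {allkeys_primary, lead_byte, sub_byte} tuples; parsing FractionalUCA.txt itself is omitted, and the values in the test are illustrative:

```elixir
defmodule FractionalLeadSketch do
  # Hypothetical: rows are {allkeys_primary, fractional_lead_byte, fractional_sub_byte}.
  # Map.put_new keeps the first row seen when a primary appears on several lines.
  @spec build([{non_neg_integer(), non_neg_integer(), non_neg_integer()}]) ::
          %{optional(integer() | {:sub, integer()}) => non_neg_integer()}
  def build(rows) do
    Enum.reduce(rows, %{}, fn {primary, lead, sub}, acc ->
      acc
      # Plain integer key: the script-identifying lead byte.
      |> Map.put_new(primary, lead)
      # Tagged key: the within-script sub-byte for the same primary.
      |> Map.put_new({:sub, primary}, sub)
    end)
  end
end
```

Because {:sub, primary} can never equal a plain integer, the two kinds of entries cannot collide in the shared map.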

load_script_ranges()

@spec load_script_ranges() :: %{
  required(String.t()) => {non_neg_integer(), non_neg_integer()}
}

Load the script-to-lead-byte-range mapping from FractionalUCA.txt.

Parses [top_byte ...] entries from the data file. Falls back to hardcoded defaults if the file is not found.

Returns

A map %{String.t() => {start_byte, end_byte}} where keys are lowercase script/group names and values are fractional lead byte range tuples.

Examples

iex> ranges = Cldr.Collation.Reorder.load_script_ranges()
iex> is_map(ranges)
true
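Given a map shaped like the one returned by load_script_ranges/0, finding the script for a lead byte is a range search. A sketch with a hypothetical module name and illustrative byte values, assuming each {start_byte, end_byte} tuple is inclusive:

```elixir
defmodule ScriptRangeSketch do
  # Hypothetical helper: returns the lowercase script/group name whose
  # inclusive lead-byte range contains `lead`, or nil if none does.
  @spec script_for_lead_byte(%{String.t() => {byte(), byte()}}, byte()) :: String.t() | nil
  def script_for_lead_byte(ranges, lead) do
    Enum.find_value(ranges, fn {name, {first, last}} ->
      if lead >= first and lead <= last, do: name
    end)
  end
end
```

A linear scan is enough for a few dozen groups; if lookups are hot, the ranges could be precomputed into a map keyed by lead byte, since lead bytes fit in 0..255.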