Cldr.Collation.ImplicitWeights (Cldr Collation v1.1.0)


Computes implicit collation elements for codepoints not in the DUCET/CLDR allkeys table.

The Unicode Collation Algorithm (UCA) defines how to derive collation elements for such codepoints:

  • CJK Unified Ideographs (Han characters)
  • Hangul syllables (decomposed algorithmically)
  • Unassigned codepoints

See UTS #10 Section 10.1 for the implicit weight computation.
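As a rough illustration of the UTS #10 formula (a sketch, not this module's source), the two implicit primaries come from a base weight plus the codepoint value: AAAA = base + (cp >>> 15) and BBBB = (cp &&& 0x7FFF) ||| 0x8000. The module name `ImplicitWeightSketch`, the category atoms, and the `{primary, secondary, tertiary}` tuples are assumptions for illustration; the library itself returns `Cldr.Collation.Element` structs. The bases 0xFB40, 0xFB80 and 0xFBC0 are the ones UTS #10 assigns to core Han, other Han, and remaining codepoints (UTS #10 also defines separate bases for a few other scripts, which this sketch ignores).

```elixir
defmodule ImplicitWeightSketch do
  # Hypothetical module illustrating the UTS #10 implicit weight formula.
  import Bitwise

  # Base weights from UTS #10 for the three categories handled here.
  @core_han_base 0xFB40
  @other_han_base 0xFB80
  @unassigned_base 0xFBC0

  # The category atoms are an assumption made for this sketch.
  def base(:core_han), do: @core_han_base
  def base(:other_han), do: @other_han_base
  def base(:other), do: @unassigned_base

  # Compute the {AAAA, BBBB} primary pair for a codepoint.
  def primaries(cp, category) do
    aaaa = base(category) + (cp >>> 15)
    bbbb = (cp &&& 0x7FFF) ||| 0x8000
    {aaaa, bbbb}
  end

  # Build the two collation elements [.AAAA.0020.0002][.BBBB.0000.0000]
  # as plain {primary, secondary, tertiary} tuples.
  def collation_elements(cp, category) do
    {aaaa, bbbb} = primaries(cp, category)
    [{aaaa, 0x0020, 0x0002}, {bbbb, 0x0000, 0x0000}]
  end
end
```

For U+4E00 this yields primaries 0xFB40 and 0xCE00, and for U+20000 (Extension B) it yields 0xFB84 and 0x8000.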

Summary

Functions

  • compute(cp) - Compute implicit collation elements for a codepoint not in the allkeys table.
  • decompose_hangul_to_jamo(cp) - Decompose a Hangul syllable into its constituent jamo codepoints.
  • hangul_syllable?(cp) - Check if a codepoint is a Hangul syllable.
  • unified_ideograph?(cp) - Check if a codepoint is a CJK Unified Ideograph.

Functions

compute(cp)

@spec compute(non_neg_integer()) ::
  {:hangul_decompose, [non_neg_integer()]} | [Cldr.Collation.Element.t()]

Compute implicit collation elements for a codepoint not in the allkeys table.

Handles three cases:

  • Hangul syllables: algorithmically decomposed to jamo
  • CJK Unified Ideographs: implicit weight pair from code point value
  • All other codepoints: implicit weight pair computed with the base reserved for unassigned codepoints

Arguments

  • cp - an integer codepoint.

Returns

  • {:hangul_decompose, jamo} - for Hangul syllables, where jamo is the list of constituent jamo codepoints to be looked up in the allkeys table.
  • [%Cldr.Collation.Element{}, %Cldr.Collation.Element{}] - two implicit CEs for CJK or unassigned codepoints.

Examples

iex> [ce1, ce2] = Cldr.Collation.ImplicitWeights.compute(0x4E00)
iex> Cldr.Collation.Element.primary(ce1) >= 0xFB40
true
iex> Cldr.Collation.Element.secondary(ce2)
0
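The example's assertions can be checked by hand with the UTS #10 formula, assuming U+4E00 takes the core-Han base 0xFB40 (it is a Unified_Ideograph in the CJK Unified Ideographs block):

```latex
\begin{aligned}
\mathrm{AAAA} &= \mathtt{FB40} + (\mathtt{4E00} \gg 15) = \mathtt{FB40} + 0 = \mathtt{FB40} \\
\mathrm{BBBB} &= (\mathtt{4E00} \mathbin{\&} \mathtt{7FFF}) \mathbin{|} \mathtt{8000} = \mathtt{CE00} \\
\mathrm{CE}_1 &= [.\mathtt{FB40}.\mathtt{0020}.\mathtt{0002}], \qquad
\mathrm{CE}_2 = [.\mathtt{CE00}.\mathtt{0000}.\mathtt{0000}]
\end{aligned}
```

So primary(ce1) is exactly 0xFB40 and secondary(ce2) is 0. The `>=` comparison in the example would also hold for extension ideographs and unassigned codepoints, whose bases (0xFB80 and 0xFBC0) are higher.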

decompose_hangul_to_jamo(cp)

@spec decompose_hangul_to_jamo(non_neg_integer()) :: [non_neg_integer()]

Decompose a Hangul syllable into its constituent jamo codepoints.

Uses the algorithmic decomposition defined in the Unicode Standard (Chapter 3, Section 3.12).
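A minimal sketch of the Section 3.12 arithmetic, using the standard constants SBase, LBase, VBase, TBase and the counts 21 and 28. The module name `HangulSketch` is hypothetical and this is not the library's implementation; non-Hangul input simply fails the guard.

```elixir
defmodule HangulSketch do
  # Hypothetical module: arithmetic Hangul decomposition (Unicode Standard 3.12).
  @s_base 0xAC00
  @l_base 0x1100
  @v_base 0x1161
  @t_base 0x11A7
  @v_count 21
  @t_count 28
  @n_count @v_count * @t_count   # 588 syllables per leading consonant
  @s_count 19 * @n_count         # 11_172 precomposed syllables

  # Raises FunctionClauseError for codepoints outside U+AC00..U+D7A3.
  def decompose(cp) when cp >= @s_base and cp < @s_base + @s_count do
    s_index = cp - @s_base
    lead = @l_base + div(s_index, @n_count)
    vowel = @v_base + div(rem(s_index, @n_count), @t_count)
    t_index = rem(s_index, @t_count)

    # A trailing index of 0 means the syllable has no final consonant.
    if t_index == 0, do: [lead, vowel], else: [lead, vowel, @t_base + t_index]
  end
end
```

For example, U+D55C (한) decomposes to [0x1112, 0x1161, 0x11AB].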

Arguments

  • cp - an integer codepoint for a Hangul syllable (U+AC00..U+D7A3).

Returns

A list of 2 or 3 jamo codepoints: [lead, vowel] or [lead, vowel, trail].

Examples

iex> Cldr.Collation.ImplicitWeights.decompose_hangul_to_jamo(0xAC00)
[0x1100, 0x1161]

iex> Cldr.Collation.ImplicitWeights.decompose_hangul_to_jamo(0xAC01)
[0x1100, 0x1161, 0x11A8]

hangul_syllable?(cp)

@spec hangul_syllable?(non_neg_integer()) :: boolean()

Check if a codepoint is a Hangul syllable.

Hangul syllables occupy the range U+AC00..U+D7A3.
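The upper bound follows from the jamo counts: 19 leading consonants × 21 vowels × 28 trailing options (27 consonants plus "none") give 11,172 syllables, so the last one is U+AC00 + 11171 = U+D7A3. A small sketch of the check, with a hypothetical module name:

```elixir
defmodule HangulRangeSketch do
  # Hypothetical module illustrating where U+AC00..U+D7A3 comes from.
  @s_base 0xAC00
  # 19 leading consonants x 21 vowels x 28 trailing options (27 consonants + none)
  @s_count 19 * 21 * 28
  @s_last @s_base + @s_count - 1

  def syllable_count, do: @s_count
  def last_syllable, do: @s_last

  def hangul_syllable?(cp) when is_integer(cp), do: cp >= @s_base and cp <= @s_last
  def hangul_syllable?(_cp), do: false
end
```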

Arguments

  • cp - an integer codepoint.

Returns

  • true if the codepoint is a Hangul syllable.
  • false otherwise.

Examples

iex> Cldr.Collation.ImplicitWeights.hangul_syllable?(0xAC00)
true

iex> Cldr.Collation.ImplicitWeights.hangul_syllable?(0x0041)
false

unified_ideograph?(cp)

@spec unified_ideograph?(non_neg_integer()) :: boolean()

Check if a codepoint is a CJK Unified Ideograph.

Covers the core CJK Unified Ideographs block, all extension blocks (A through H), and compatibility ideographs, including specific individual compatibility ideograph codepoints.
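A sketch of such a range check, assuming the ranges of the Unified_Ideograph property in Unicode 15.0 PropList.txt (Extension I, added in Unicode 15.1, is left out to match the A-through-H scope above). The module and function names are hypothetical, and the library's own range table may differ. The sketch also shows the UTS #10 base split: 0xFB40 for ideographs in the CJK Unified Ideographs and CJK Compatibility Ideographs blocks, 0xFB80 for the extension blocks.

```elixir
defmodule IdeographSketch do
  # Hypothetical module. Ranges follow the Unified_Ideograph property in
  # Unicode 15.0 PropList.txt; verify against the Unicode version you target.

  # CJK Unified Ideographs block plus the twelve Unified_Ideograph codepoints
  # in the CJK Compatibility Ideographs block (UTS #10 base 0xFB40).
  @core [
    0x4E00..0x9FFF,
    0xFA0E..0xFA0F,
    0xFA11..0xFA11,
    0xFA13..0xFA14,
    0xFA1F..0xFA1F,
    0xFA21..0xFA21,
    0xFA23..0xFA24,
    0xFA27..0xFA29
  ]

  # Extensions A through H (UTS #10 base 0xFB80).
  @extensions [
    0x3400..0x4DBF,    # Extension A
    0x20000..0x2A6DF,  # Extension B
    0x2A700..0x2B739,  # Extension C
    0x2B740..0x2B81D,  # Extension D
    0x2B820..0x2CEA1,  # Extension E
    0x2CEB0..0x2EBE0,  # Extension F
    0x30000..0x3134A,  # Extension G
    0x31350..0x323AF   # Extension H
  ]

  def unified_ideograph?(cp), do: in_any?(cp, @core) or in_any?(cp, @extensions)

  # Base weight that UTS #10 uses for the first implicit primary.
  def implicit_base(cp) do
    cond do
      in_any?(cp, @core) -> 0xFB40
      in_any?(cp, @extensions) -> 0xFB80
      true -> 0xFBC0
    end
  end

  defp in_any?(cp, ranges), do: Enum.any?(ranges, fn range -> cp in range end)
end
```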

Arguments

  • cp - an integer codepoint.

Returns

  • true if the codepoint is a CJK Unified Ideograph.
  • false otherwise.

Examples

iex> Cldr.Collation.ImplicitWeights.unified_ideograph?(0x4E00)
true

iex> Cldr.Collation.ImplicitWeights.unified_ideograph?(0x0041)
false