Computes implicit collation elements for codepoints not in the DUCET/CLDR allkeys table.
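The Unicode Collation Algorithm (UCA) defines how to derive collation elements for codepoints that have no explicit table entry:
- CJK Unified Ideographs (Han characters), which receive implicit weights
- Hangul syllables, which are decomposed algorithmically into jamo
- Unassigned codepoints, which receive implicit weights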
See UTS #10 Section 10.1 for the implicit weight computation.
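The implicit weight computation in UTS #10 can be sketched as follows. This is a minimal illustration, not this module's implementation: the module name ImplicitWeightSketch, the function pair/2, the kind atoms, and the plain {primary, secondary, tertiary} tuples are all assumptions standing in for Cldr.Collation.Element structs. The sketch also leaves out the separate bases that newer UCA versions define for scripts such as Tangut.

```elixir
defmodule ImplicitWeightSketch do
  import Bitwise

  # Base values for the first primary weight (AAAA) as given in UTS #10.
  # The atom keys are hypothetical names chosen for this sketch:
  #   :core_han   - ideographs in the CJK Unified / Compatibility blocks
  #   :other_han  - ideographs in the extension blocks
  #   :unassigned - every other codepoint without a table entry
  @bases %{core_han: 0xFB40, other_han: 0xFB80, unassigned: 0xFBC0}

  # Returns two tuples {primary, secondary, tertiary}, standing in for
  # the two implicit collation elements.
  def pair(cp, kind) when is_integer(cp) and cp >= 0 do
    base = Map.fetch!(@bases, kind)

    # AAAA carries the high bits of the codepoint on top of the base.
    aaaa = base + (cp >>> 15)

    # BBBB carries the low 15 bits, with the top bit set so it is never zero.
    bbbb = (cp &&& 0x7FFF) ||| 0x8000

    [{aaaa, 0x0020, 0x0002}, {bbbb, 0x0000, 0x0000}]
  end
end
```

Under these assumptions, ImplicitWeightSketch.pair(0x4E00, :core_han) gives primaries 0xFB40 and 0xCE00, and ImplicitWeightSketch.pair(0x20000, :other_han) gives primaries 0xFB84 and 0x8000.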
Functions
@spec compute(non_neg_integer()) :: {:hangul_decompose, [non_neg_integer()]} | [Cldr.Collation.Element.t()]
Compute implicit collation elements for a codepoint not in the allkeys table.
Handles three cases:
- Hangul syllables: algorithmically decomposed to jamo
- CJK Unified Ideographs: implicit weight pair from code point value
- All others: unassigned implicit weight pair
Arguments
- cp: an integer codepoint.
Returns
- {:hangul_decompose, jamo}: for Hangul syllables, returns the constituent jamo for table lookup.
- [%Cldr.Collation.Element{}, %Cldr.Collation.Element{}]: two implicit CEs for CJK or unassigned codepoints.
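Because the function returns two different shapes, a caller has to branch on the result. The sketch below shows one way to do that. The module ComputeResultSketch, the function expand/2, and the lookup argument (a 1-arity function returning the list of collation elements for a jamo codepoint) are hypothetical names introduced for illustration only.

```elixir
defmodule ComputeResultSketch do
  # Hangul case: the jamo still need a table lookup, delegated here to a
  # caller-supplied (hypothetical) `lookup` function that returns a list
  # of collation elements per jamo codepoint.
  def expand({:hangul_decompose, jamo}, lookup)
      when is_list(jamo) and is_function(lookup, 1) do
    Enum.flat_map(jamo, lookup)
  end

  # CJK or unassigned case: the implicit elements are already final.
  def expand(elements, _lookup) when is_list(elements), do: elements
end
```

The result of compute/1 would be passed as the first argument. Collation elements are treated as opaque values here, so the sketch does not depend on the layout of the Cldr.Collation.Element struct.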
Examples
iex> [ce1, ce2] = Cldr.Collation.ImplicitWeights.compute(0x4E00)
iex> Cldr.Collation.Element.primary(ce1) >= 0xFB40
true
iex> Cldr.Collation.Element.secondary(ce2)
0
@spec decompose_hangul_to_jamo(non_neg_integer()) :: [non_neg_integer()]
Decompose a Hangul syllable into its constituent jamo codepoints.
Uses the algorithmic decomposition defined in the Unicode Standard (Chapter 3, Section 3.12).
Arguments
- cp: an integer codepoint for a Hangul syllable (U+AC00..U+D7A3).
Returns
A list of 2 or 3 jamo codepoints: [lead, vowel] or [lead, vowel, trail].
Examples
iex> Cldr.Collation.ImplicitWeights.decompose_hangul_to_jamo(0xAC00)
[0x1100, 0x1161]
iex> Cldr.Collation.ImplicitWeights.decompose_hangul_to_jamo(0xAC01)
[0x1100, 0x1161, 0x11A8]
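The arithmetic decomposition from the Unicode Standard can be sketched as below, using the constants the standard defines (SBase, LBase, VBase, TBase and the counts). The module name HangulSketch and the function decompose/1 are assumptions for illustration; the library's own implementation may be structured differently.

```elixir
defmodule HangulSketch do
  # Constants from the Unicode Standard's Hangul decomposition algorithm.
  @s_base 0xAC00
  @l_base 0x1100
  @v_base 0x1161
  @t_base 0x11A7
  @l_count 19
  @v_count 21
  @t_count 28
  @n_count @v_count * @t_count
  @s_count @l_count * @n_count

  # Only defined for precomposed syllables (U+AC00..U+D7A3).
  def decompose(cp) when cp >= @s_base and cp < @s_base + @s_count do
    s_index = cp - @s_base
    lead = @l_base + div(s_index, @n_count)
    vowel = @v_base + div(rem(s_index, @n_count), @t_count)
    t_index = rem(s_index, @t_count)

    # A trailing index of 0 means the syllable has no trailing consonant.
    if t_index == 0 do
      [lead, vowel]
    else
      [lead, vowel, @t_base + t_index]
    end
  end
end
```

For example, HangulSketch.decompose(0xD55C) yields [0x1112, 0x1161, 0x11AB], the lead, vowel, and trail jamo of that syllable.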
@spec hangul_syllable?(non_neg_integer()) :: boolean()
Check if a codepoint is a Hangul syllable.
Hangul syllables occupy the range U+AC00..U+D7A3.
Arguments
- cp: an integer codepoint.
Returns
- true if the codepoint is a Hangul syllable.
- false otherwise.
Examples
iex> Cldr.Collation.ImplicitWeights.hangul_syllable?(0xAC00)
true
iex> Cldr.Collation.ImplicitWeights.hangul_syllable?(0x0041)
false
@spec unified_ideograph?(non_neg_integer()) :: boolean()
Check if a codepoint is a CJK Unified Ideograph.
Covers the core CJK block, all extensions (A through H), compatibility ideographs, and specific compatibility ideograph codepoints.
Arguments
- cp: an integer codepoint.
Returns
- true if the codepoint is a CJK Unified Ideograph.
- false otherwise.
Examples
iex> Cldr.Collation.ImplicitWeights.unified_ideograph?(0x4E00)
true
iex> Cldr.Collation.ImplicitWeights.unified_ideograph?(0x0041)
false
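A check of this kind can be approximated with range membership tests. The sketch below is an assumption about one possible approach, not the library's code: the module name UnifiedIdeographSketch is hypothetical, the ranges are whole Unicode blocks (so they include a few unassigned codepoints at the end of some blocks), and only the twelve codepoints of the CJK Compatibility Ideographs block that carry the Unified_Ideograph property are listed individually.

```elixir
defmodule UnifiedIdeographSketch do
  # Block ranges, assumed here to match Unicode 15 block boundaries.
  @ranges [
    0x4E00..0x9FFF,
    # Extension A
    0x3400..0x4DBF,
    # Extension B
    0x20000..0x2A6DF,
    # Extension C
    0x2A700..0x2B73F,
    # Extension D
    0x2B740..0x2B81F,
    # Extension E
    0x2B820..0x2CEAF,
    # Extension F
    0x2CEB0..0x2EBEF,
    # Extension G
    0x30000..0x3134F,
    # Extension H
    0x31350..0x323AF
  ]

  # Codepoints in the CJK Compatibility Ideographs block that are
  # nevertheless unified ideographs.
  @compat [
    0xFA0E, 0xFA0F, 0xFA11, 0xFA13, 0xFA14, 0xFA1F,
    0xFA21, 0xFA23, 0xFA24, 0xFA27, 0xFA28, 0xFA29
  ]

  def unified_ideograph?(cp) when is_integer(cp) and cp >= 0 do
    cp in @compat or Enum.any?(@ranges, fn range -> cp in range end)
  end
end
```

A linear scan over a short list of ranges keeps the sketch readable; an implementation concerned with speed might instead compile the ranges into guard clauses.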