Unicode.Unihan (Unicode v0.3.0)
Functions to introspect the Unicode Unihan character database.
Summary
Functions
Filter the Unihan database returning selected codepoints.
Load the unihan data into :persistent_term.
Filter the Unihan database returning selected codepoints that are not rejected by the provided function.
Takes an integer codepoint, a Unihan codepoint map, or list of maps and returns the grapheme (or list of graphemes) of the codepoint.
Returns the Unihan database metadata for a given codepoint.
Returns the field information for the data in the Unihan database.
Functions
Filter the Unihan database returning selected codepoints.
Arguments
- fun is a 1-arity function that is passed the attribute map for a given codepoint. If the function returns a truthy value then the codepoint is included in the returned data. If the return value is falsy then the codepoint is omitted from the returned data.
Returns
- a map of the filtered codepoints mapped to their attributes.
Example
iex> Unicode.Unihan.filter(&(&1.kTotalStrokes[:"Hans"] > 30))
...> |> Enum.count()
238
iex> Unicode.Unihan.filter(&(&1.kTotalStrokes[:"Hans"] != &1.kTotalStrokes[:"Hant"]))
...> |> Enum.count()
3
iex> Unicode.Unihan.filter(&(&1[:kGradeLevel] <= 6))
...> |> Enum.count()
2632
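The truthy/falsy rule can be tried out without loading the Unihan data. The following is a minimal sketch, assuming a small hand-built map in place of the real database. `FilterSketch` and `sample` are hypothetical names, the entries keyed `0` and `1` are made up for illustration, and this is not presented as the library's implementation.

```elixir
defmodule FilterSketch do
  # Keep only the entries whose attribute map makes `fun` return a truthy value.
  def filter(db, fun) when is_function(fun, 1) do
    db
    |> Enum.filter(fn {_codepoint, attrs} -> fun.(attrs) end)
    |> Map.new()
  end
end

# Hypothetical sample data: codepoint => attribute map.
# Keys 0 and 1 are invented entries, not real Unihan records.
sample = %{
  171339 => %{kTotalStrokes: %{Hans: 11, Hant: 11}},
  14192 => %{kTotalStrokes: %{Hans: 18, Hant: 18}},
  0 => %{kTotalStrokes: %{Hans: 33, Hant: 33}},
  1 => %{kTotalStrokes: %{Hans: 5, Hant: 5}, kGradeLevel: 2}
}

many_strokes = FilterSketch.filter(sample, &(&1.kTotalStrokes[:Hans] > 30))
IO.inspect(Map.keys(many_strokes))

# A missing key yields nil. In Elixir's term ordering numbers sort before
# atoms, so `nil <= 6` is false and entries without kGradeLevel are dropped.
graded = FilterSketch.filter(sample, &(&1[:kGradeLevel] <= 6))
IO.inspect(Map.keys(graded))
```

The second call shows why a predicate such as `&(&1[:kGradeLevel] <= 6)` does not need an explicit nil check: the comparison is simply false for codepoints that lack the field.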
Load the unihan data into :persistent_term.
This function will be called on the first access by Unicode.Unihan.unihan/1 but can be called on application load if required.
The function first checks whether an Erlang term format file of the Unihan database exists. If so, it is loaded. If not (the first time the function is called), the file is generated and then loaded.
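The load-or-generate behaviour described above can be sketched as follows. This is an illustration under stated assumptions rather than the library's code: `LoadSketch`, its cache key, the temporary file path, and the `generate_fun` callback are all hypothetical. Only `:persistent_term`, `:erlang.term_to_binary/1`, and `:erlang.binary_to_term/1` are standard Erlang/OTP calls.

```elixir
defmodule LoadSketch do
  # Hypothetical cache key; the library's real key is not shown in this page.
  @key {__MODULE__, :db}

  # Load the data from `path` if the file exists, otherwise generate it,
  # write it in Erlang term format, and then cache it in :persistent_term.
  def load(path, generate_fun) when is_function(generate_fun, 0) do
    data =
      if File.exists?(path) do
        # Reuse the previously generated Erlang term format file.
        path |> File.read!() |> :erlang.binary_to_term()
      else
        # First call: build the data, then write it for later runs.
        generated = generate_fun.()
        File.write!(path, :erlang.term_to_binary(generated))
        generated
      end

    :persistent_term.put(@key, data)
    :ok
  end

  # Read the cached data, loading it on first access.
  def get(path, generate_fun) do
    case :persistent_term.get(@key, :not_loaded) do
      :not_loaded ->
        load(path, generate_fun)
        :persistent_term.get(@key)

      data ->
        data
    end
  end
end

# Hypothetical file location and stand-in data for the demonstration.
path = Path.join(System.tmp_dir!(), "load_sketch_#{System.unique_integer([:positive])}.etf")
db = LoadSketch.get(path, fn -> %{25342 => %{codepoint: 25342}} end)
IO.inspect(db)

# A second access is served from :persistent_term; the generator is not run again.
cached = LoadSketch.get(path, fn -> raise "should not be called" end)
File.rm(path)
```

Calling the load step at application start, as the text suggests, moves the one-off cost of reading or generating the file out of the first lookup.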
Filter the Unihan database returning selected codepoints that are not rejected by the provided function.
Arguments
- fun is a 1-arity function that is passed the attribute map for a given codepoint. If the function returns a falsy value then the codepoint is included in the returned data. If the return value is truthy then the codepoint is omitted from the returned data.
Returns
- a map of the codepoints that are not rejected mapped to their attributes.
Example
iex> Unicode.Unihan.reject(&(&1.kTotalStrokes[:"Hans"] > 30))
...> |> Enum.count()
97822
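Filtering and rejecting with the same function should split the data into two disjoint parts. With the predicate above, the documented counts are 238 for filter and 97822 for reject, which would sum to 98060 codepoints if the two results partition the database. A minimal sketch of that relationship, assuming a small hand-built map (the entry keyed `0` is made up) and plain `Enum` functions instead of the library:

```elixir
# Hypothetical sample data standing in for the Unihan database.
# Key 0 is an invented entry, not a real Unihan record.
sample = %{
  171339 => %{kTotalStrokes: %{Hans: 11, Hant: 11}},
  14192 => %{kTotalStrokes: %{Hans: 18, Hant: 18}},
  0 => %{kTotalStrokes: %{Hans: 33, Hant: 33}}
}

over_30? = &(&1.kTotalStrokes[:Hans] > 30)

# Entries for which the predicate is truthy.
kept =
  sample
  |> Enum.filter(fn {_cp, attrs} -> over_30?.(attrs) end)
  |> Map.new()

# Entries for which the predicate is falsy.
rejected =
  sample
  |> Enum.reject(fn {_cp, attrs} -> over_30?.(attrs) end)
  |> Map.new()

# Every codepoint lands in exactly one of the two results.
IO.inspect(map_size(kept) + map_size(rejected) == map_size(sample))
```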
Takes an integer codepoint, a Unihan codepoint map, or list of maps and returns the grapheme (or list of graphemes) of the codepoint.
Examples
iex> Unicode.Unihan.to_string(25342)
"拾"
iex> Unicode.Unihan.unihan("拾")
...> |> Unicode.Unihan.to_string()
"拾"
Returns the Unihan database metadata for a given codepoint.
The codepoint can be expressed as an integer or a grapheme.
Examples
iex> Unicode.Unihan.unihan(171339)
%{
  codepoint: 171339,
  kCantonese: %{coda: "", final: "u", jyutping: "ju4", nucleus: "u", onset: "j", tone: "4"},
  kDefinition: ["(J) nonstandard variant of 魚 U+9B5A, fish"],
  kHanYu: %{page: 4674, position: 9, virtual: false, volume: 7},
  kIRGHanyuDaZidian: %{page: 4674, position: 9, virtual: false, volume: 7},
  kIRGKangXi: %{page: 1465, position: 1, virtual: true},
  kIRG_GSource: %{mapping: ["74674.09"], source: "GHZ"},
  kIRG_TSource: %{mapping: "3043", source: "T4"},
  kIRG_VSource: %{mapping: "29D4B", source: "VN"},
  kJapaneseKun: ["UO", "SAKANA", "SUNADORU"],
  kJapaneseOn: "GYO",
  kKangXi: %{page: 1465, position: 1, virtual: true},
  kNelson: 692,
  kPhonetic: %{class: 1605},
  kRSAdobe_Japan1_6: [
    %{cid: 13717, code: "C", kangxi: 195, strokes_radical: 10, strokes_residue: 0},
    %{cid: 13718, code: "V", kangxi: 195, strokes_radical: 10, strokes_residue: 0}
  ],
  kRSKangXi: %{radical: 195, strokes: 0},
  kRSUnicode: %{radical: 195, simplified_radical: false, strokes: 0},
  kTotalStrokes: %{Hans: 11, Hant: 11}
}
iex> Unicode.Unihan.unihan("㝰")
%{
  codepoint: 14192,
  kCangjie: ["J", "H", "U", "S"],
  kCantonese: %{coda: "n", final: "in", jyutping: "min4", nucleus: "i", onset: "m", tone: "4"},
  kDefinition: ["unable to meet, empty room"],
  kHanYu: %{page: 957, position: 3, virtual: false, volume: 2},
  kHanyuPinyin: %{location: [%{page: 20957, position: 3, virtual: false}], readings: ["mián"]},
  kIRGHanyuDaZidian: %{page: 957, position: 3, virtual: false, volume: 2},
  kIRGKangXi: %{page: 293, position: 1, virtual: false},
  kIRG_GSource: %{mapping: ["3E3C"], source: "G5"},
  kIRG_KSource: %{mapping: "236A", source: "K3"},
  kIRG_TSource: %{mapping: "5A7D", source: "T4"},
  kKangXi: %{page: 293, position: 1, virtual: false},
  kMandarin: "mián",
  kRSUnicode: %{radical: 40, simplified_radical: false, strokes: 15},
  kSBGY: %{page: 135, position: 35},
  kTotalStrokes: %{Hans: 18, Hant: 18}
}
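Because a codepoint can be given either as an integer or as a grapheme, it can help to see how the two forms relate. The sketch below uses only Elixir's built-in UTF-8 binary syntax; `CodepointSketch` and its functions are hypothetical helpers for illustration and are not part of Unicode.Unihan.

```elixir
defmodule CodepointSketch do
  # Integers are already codepoints.
  def codepoint(cp) when is_integer(cp), do: cp

  # A single-codepoint grapheme is decoded from its UTF-8 binary.
  def codepoint(<<cp::utf8>>), do: cp

  # The reverse direction: encode an integer codepoint as a UTF-8 string.
  def grapheme(cp) when is_integer(cp), do: <<cp::utf8>>
end

IO.inspect(CodepointSketch.codepoint("㝰"))
IO.inspect(CodepointSketch.codepoint(171339))
IO.inspect(CodepointSketch.grapheme(25342))

# The hexadecimal form of a codepoint is how it appears in U+XXXX notation.
IO.inspect(Integer.to_string(171339, 16))
```

The hexadecimal value of 171339 is 29D4B, which is the same string that appears as the kIRG_VSource mapping in the first example above.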
Returns the field information for the data in the Unihan database.