Unicode.Unihan (Unicode v0.3.0)

Functions to introspect the Unicode Unihan character database.

Summary

Functions

filter/1: Filter the Unihan database, returning the selected codepoints.

Load the Unihan data into :persistent_term.

reject/1: Filter the Unihan database, returning the codepoints that are not rejected by the provided function.

to_string/1: Takes an integer codepoint, a Unihan codepoint map, or a list of maps, and returns the grapheme (or list of graphemes) for the codepoint(s).

unihan/1: Returns the Unihan database metadata for a given codepoint.

Returns the field information for the data in the Unihan database.

Functions

Filter the Unihan database, returning the selected codepoints.

Arguments

  • fun is a 1-arity function that is passed the attribute map for a given codepoint. If the function returns a truthy value, the codepoint is included in the returned map. If it returns a falsy value (nil or false), the codepoint is omitted.

Returns

  • a map from each selected codepoint to its attribute map.

Examples

iex> Unicode.Unihan.filter(&(&1.kTotalStrokes[:"Hans"] > 30))
...> |> Enum.count()
238

iex> Unicode.Unihan.filter(&(&1.kTotalStrokes[:"Hans"] != &1.kTotalStrokes[:"Hant"]))
...> |> Enum.count()
3

iex> Unicode.Unihan.filter(&(&1[:kGradeLevel] <= 6))
...> |> Enum.count()
2632
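
The selection rule above can be sketched in plain Elixir. This is a minimal sketch, not the library's implementation: the module name UnihanFilterSketch and the three-entry @sample map are hypothetical stand-ins for the real database, which maps each integer codepoint to its attribute map. The predicate receives only the attribute map, and any truthy return value keeps the entry.

```elixir
defmodule UnihanFilterSketch do
  # Hypothetical stand-in for the Unihan database:
  # integer codepoint => attribute map (only kTotalStrokes is modelled).
  @sample %{
    0x4E00 => %{codepoint: 0x4E00, kTotalStrokes: %{Hans: 1, Hant: 1}},
    0x4E8C => %{codepoint: 0x4E8C, kTotalStrokes: %{Hans: 2, Hant: 2}},
    0x4E09 => %{codepoint: 0x4E09, kTotalStrokes: %{Hans: 3, Hant: 3}}
  }

  # Keeps each entry whose attribute map makes `fun` return a truthy value,
  # then rebuilds a codepoint => attributes map from the surviving entries.
  def filter(fun, data \\ @sample) when is_function(fun, 1) do
    data
    |> Enum.filter(fn {_codepoint, attributes} -> fun.(attributes) end)
    |> Map.new()
  end
end
```

Because Enum.filter/2 treats every value other than nil and false as truthy, the predicate may return any term, not only a boolean.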

Load the Unihan data into :persistent_term.

This function is called automatically on the first access by Unicode.Unihan.unihan/1, but it can also be called explicitly when the application starts if required.

It first checks whether an Erlang term format file of the Unihan database exists. If it does, that file is loaded. If not (for example, the first time the function is called), the file is generated and then loaded.
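
A load-or-generate cache of this kind can be sketched with :persistent_term and the Erlang external term format. The sketch assumes a hypothetical module UnihanLoaderSketch, a caller-supplied file path, and a caller-supplied generate function standing in for whatever parsing of the Unihan source data the library performs; it illustrates the pattern only and does not reproduce the library's file names or keys.

```elixir
defmodule UnihanLoaderSketch do
  # Hypothetical :persistent_term key; the real library's key may differ.
  @key {__MODULE__, :unihan}

  # Returns the cached data, loading it into :persistent_term on first use.
  def data(path, generate) do
    case :persistent_term.get(@key, nil) do
      nil -> load(path, generate)
      data -> data
    end
  end

  # Reads the term file if it exists; otherwise generates the data and
  # writes the file. Either way, the result is stored in :persistent_term.
  def load(path, generate) when is_function(generate, 0) do
    data =
      if File.exists?(path) do
        path |> File.read!() |> :erlang.binary_to_term()
      else
        generated = generate.()
        File.write!(path, :erlang.term_to_binary(generated))
        generated
      end

    :persistent_term.put(@key, data)
    data
  end
end
```

:persistent_term suits data that is written once and read often: reads do not copy the term into the calling process, but replacing or erasing a key triggers a global scan of all processes, so the data should be stored once and then left alone.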

Filter the Unihan database, returning the codepoints that are not rejected by the provided function.

Arguments

  • fun is a 1-arity function that is passed the attribute map for a given codepoint. If the function returns a falsy value (nil or false), the codepoint is included in the returned map. If it returns a truthy value, the codepoint is omitted.

Returns

  • a map from each codepoint that is not rejected to its attribute map.

Example

iex> Unicode.Unihan.reject(&(&1.kTotalStrokes[:"Hans"] > 30))
...> |> Enum.count()
97822
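
reject/1 is the complement of filter/1: for the same predicate, every codepoint lands in exactly one of the two results. A minimal plain-Elixir sketch of that behaviour follows; the module name UnihanRejectSketch and the three-entry @sample map are hypothetical stand-ins for the real database, not the library's code.

```elixir
defmodule UnihanRejectSketch do
  # Hypothetical stand-in for the Unihan database:
  # integer codepoint => attribute map (only kTotalStrokes is modelled).
  @sample %{
    0x4E00 => %{codepoint: 0x4E00, kTotalStrokes: %{Hans: 1, Hant: 1}},
    0x4E8C => %{codepoint: 0x4E8C, kTotalStrokes: %{Hans: 2, Hant: 2}},
    0x4E09 => %{codepoint: 0x4E09, kTotalStrokes: %{Hans: 3, Hant: 3}}
  }

  # Drops each entry whose attribute map makes `fun` return a truthy value
  # and keeps the rest, rebuilt as a codepoint => attributes map.
  def reject(fun, data \\ @sample) when is_function(fun, 1) do
    data
    |> Enum.reject(fn {_codepoint, attributes} -> fun.(attributes) end)
    |> Map.new()
  end
end
```

Since filter and reject partition the database for a given predicate, the counts in the filter and reject examples for kTotalStrokes[:"Hans"] > 30 (238 and 97822) together account for every loaded entry, assuming both were run against the same data.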

Takes an integer codepoint, a Unihan codepoint map, or a list of maps, and returns the grapheme (or list of graphemes) for the codepoint(s).

Examples

iex> Unicode.Unihan.to_string(25342)
"拾"

iex> Unicode.Unihan.unihan("拾")
...> |> Unicode.Unihan.to_string()
"拾"
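
The accepted input shapes map naturally onto multi-clause functions. The sketch below is hypothetical (module UnihanToStringSketch), assumes attribute maps carry a :codepoint key as in the unihan/1 output, and uses the <<codepoint::utf8>> binary constructor to encode an integer codepoint as a UTF-8 string. Unlike the documented function, it also accepts integers inside a list.

```elixir
defmodule UnihanToStringSketch do
  # A local to_string/1 would clash with the auto-imported Kernel.to_string/1.
  import Kernel, except: [to_string: 1]

  # Integer codepoint: encode it as a one-codepoint UTF-8 string.
  def to_string(codepoint) when is_integer(codepoint), do: <<codepoint::utf8>>

  # Attribute map (as returned by unihan/1): convert its :codepoint field.
  def to_string(%{codepoint: codepoint}), do: to_string(codepoint)

  # List of maps (or integers): convert each element in turn.
  def to_string(list) when is_list(list), do: Enum.map(list, &to_string/1)
end
```

For example, UnihanToStringSketch.to_string(25342) returns "拾", since 25342 is U+62FE.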

@spec unihan(binary() | integer()) :: any()

Returns the Unihan database metadata for a given codepoint.

The codepoint can be expressed as an integer or a grapheme.

Examples

iex> Unicode.Unihan.unihan(171339)
%{
  codepoint: 171339,
  kCantonese: %{coda: "", final: "u", jyutping: "ju4", nucleus: "u", onset: "j", tone: "4"},
  kDefinition: ["(J) nonstandard variant of 魚 U+9B5A, fish"],
  kHanYu: %{page: 4674, position: 9, virtual: false, volume: 7},
  kIRGHanyuDaZidian: %{page: 4674, position: 9, virtual: false, volume: 7},
  kIRGKangXi: %{page: 1465, position: 1, virtual: true},
  kIRG_GSource: %{mapping: ["74674.09"], source: "GHZ"},
  kIRG_TSource: %{mapping: "3043", source: "T4"},
  kIRG_VSource: %{mapping: "29D4B", source: "VN"},
  kJapaneseKun: ["UO", "SAKANA", "SUNADORU"],
  kJapaneseOn: "GYO",
  kKangXi: %{page: 1465, position: 1, virtual: true},
  kNelson: 692,
  kPhonetic: %{class: 1605},
  kRSAdobe_Japan1_6: [
    %{cid: 13717, code: "C", kangxi: 195, strokes_radical: 10, strokes_residue: 0},
    %{cid: 13718, code: "V", kangxi: 195, strokes_radical: 10, strokes_residue: 0}
  ],
  kRSKangXi: %{radical: 195, strokes: 0},
  kRSUnicode: %{radical: 195, simplified_radical: false, strokes: 0},
  kTotalStrokes: %{Hans: 11, Hant: 11}
}

iex> Unicode.Unihan.unihan("㝰")
%{
  codepoint: 14192,
  kCangjie: ["J", "H", "U", "S"],
  kCantonese: %{coda: "n", final: "in", jyutping: "min4", nucleus: "i", onset: "m", tone: "4"},
  kDefinition: ["unable to meet, empty room"],
  kHanYu: %{page: 957, position: 3, virtual: false, volume: 2},
  kHanyuPinyin: %{location: [%{page: 20957, position: 3, virtual: false}], readings: ["mián"]},
  kIRGHanyuDaZidian: %{page: 957, position: 3, virtual: false, volume: 2},
  kIRGKangXi: %{page: 293, position: 1, virtual: false},
  kIRG_GSource: %{mapping: ["3E3C"], source: "G5"},
  kIRG_KSource: %{mapping: "236A", source: "K3"},
  kIRG_TSource: %{mapping: "5A7D", source: "T4"},
  kKangXi: %{page: 293, position: 1, virtual: false},
  kMandarin: "mián",
  kRSUnicode: %{radical: 40, simplified_radical: false, strokes: 15},
  kSBGY: %{page: 135, position: 35},
  kTotalStrokes: %{Hans: 18, Hant: 18}
}
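
Accepting either an integer or a grapheme can be handled by normalising the grapheme to its codepoint first. In this sketch the module UnihanLookupSketch and its one-entry @data map are hypothetical. The binary pattern <<codepoint::utf8>> matches only a string made of exactly one codepoint, so longer strings raise FunctionClauseError. Returning nil for an unknown codepoint is an assumption, not documented behaviour of unihan/1.

```elixir
defmodule UnihanLookupSketch do
  # Hypothetical in-memory database keyed by integer codepoint.
  @data %{
    0x4E00 => %{codepoint: 0x4E00, kTotalStrokes: %{Hans: 1, Hant: 1}}
  }

  # A string holding exactly one codepoint is converted to that integer.
  def unihan(<<codepoint::utf8>>), do: unihan(codepoint)

  # Integer codepoint: look it up, returning nil when it is absent.
  def unihan(codepoint) when is_integer(codepoint), do: Map.get(@data, codepoint)
end
```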

Returns the field information for the data in the Unihan database.