Unicode.Unihan (Unicode v0.1.0)

Functions to introspect the Unicode Unihan character database.

Summary

Functions

Filter the Unihan database returning selected codepoints.

Filter the Unihan database returning selected codepoints that are not rejected by the provided function.

Takes an integer codepoint, a Unihan codepoint map, or a list of such maps and returns the grapheme (or list of graphemes) for the codepoint(s).

Returns the Unihan database as a mapping of a codepoint to its metadata.

Returns the Unihan database metadata for a given codepoint.

Returns the field information for the data in the Unihan database.

Functions

Filter the Unihan database returning selected codepoints.

Arguments

  • fun is a 1-arity function that is passed the attribute map for a given codepoint. If the function returns a truthy value, the codepoint is included in the returned data. If the return value is falsy, the codepoint is omitted from the result.

Returns

  • a map of the filtered codepoints mapped to their attributes.

Example

iex> Unicode.Unihan.filter(&(&1.kTotalStrokes[:"Hans"] > 30))
...> |> Enum.count()
238

iex> Unicode.Unihan.filter(&(&1.kTotalStrokes[:"Hans"] != &1.kTotalStrokes[:"Hant"]))
...> |> Enum.count()
3

iex> Unicode.Unihan.filter(&(&1[:kGradeLevel] <= 6))
...> |> Enum.count()
2632
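
One thing to watch when writing predicates like these: Elixir's term ordering places numbers before atoms, so `nil > 30` evaluates to `true` while `nil <= 6` evaluates to `false`. If a field can be absent for some codepoints, a `>` comparison may therefore admit entries that carry no data. The following is a minimal sketch of a guarded predicate. `PredicateSketch`, `strokes_over?/2`, and the three-entry `sample` map are hypothetical and only stand in for the real database.

```elixir
defmodule PredicateSketch do
  # Hypothetical helper: true only when the Hans stroke count is present,
  # is an integer, and exceeds `min`. Missing data yields false.
  def strokes_over?(attrs, min) do
    case get_in(attrs, [:kTotalStrokes, :Hans]) do
      n when is_integer(n) -> n > min
      _ -> false
    end
  end
end

# Stand-in data shaped like the attribute maps shown for unihan/1.
sample = %{
  171339 => %{codepoint: 171339, kTotalStrokes: %{Hans: 11, Hant: 11}},
  14192 => %{codepoint: 14192, kTotalStrokes: %{Hans: 18, Hant: 18}},
  # Hypothetical entry with no stroke data at all.
  0 => %{codepoint: 0}
}

# Keep only entries whose Hans stroke count is above 15.
matches =
  sample
  |> Enum.filter(fn {_cp, attrs} -> PredicateSketch.strokes_over?(attrs, 15) end)
  |> Map.new()

# `matches` now holds only the entry for codepoint 14192.
```

Assuming `filter/1` accepts any 1-arity function as described above, the same predicate could be passed as `Unicode.Unihan.filter(&PredicateSketch.strokes_over?(&1, 30))`.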

Filter the Unihan database returning selected codepoints that are not rejected by the provided function.

Arguments

  • fun is a 1-arity function that is passed the attribute map for a given codepoint. If the function returns a falsy value, the codepoint is included in the returned data. If the return value is truthy, the codepoint is omitted from the result.

Returns

  • a map of the codepoints that are not rejected mapped to their attributes.

Example

iex> Unicode.Unihan.reject(&(&1.kTotalStrokes[:"Hans"] > 30))
...> |> Enum.count()
97822
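
Because `filter/1` keeps the entries for which `fun` is truthy and `reject/1` keeps those for which it is falsy, the two results should partition the database for any given `fun`. The sketch below shows that relationship on a plain map. `PartitionSketch` and `split/2` are hypothetical names, and this is not the library's implementation.

```elixir
defmodule PartitionSketch do
  # `db` is any map of codepoint => attribute map. Returns
  # {kept, dropped}: entries for which `fun` is truthy, and the rest.
  def split(db, fun) when is_map(db) and is_function(fun, 1) do
    {kept, dropped} =
      Enum.split_with(db, fn {_codepoint, attrs} -> fun.(attrs) end)

    {Map.new(kept), Map.new(dropped)}
  end
end

# Stand-in data: two entries with only the field the predicate needs.
db = %{
  1 => %{kTotalStrokes: %{Hans: 11}},
  2 => %{kTotalStrokes: %{Hans: 40}}
}

{kept, dropped} = PartitionSketch.split(db, &(&1.kTotalStrokes[:Hans] > 30))

# Every entry lands in exactly one of the two maps.
true = map_size(kept) + map_size(dropped) == map_size(db)
```

If the same holds for the real database, the counts from the `filter/1` and `reject/1` examples (238 and 97822) would add up to its total size.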

Takes an integer codepoint, a Unihan codepoint map, or a list of such maps and returns the grapheme (or list of graphemes) for the codepoint(s).

Examples

iex> Unicode.Unihan.to_string(25342)
"拾"

iex> Unicode.Unihan.unihan("拾")
...> |> Unicode.Unihan.to_string()
"拾"
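
For intuition, the three accepted input shapes can be handled with one function clause each. This is a minimal sketch under the assumption that each attribute map carries its codepoint under the `:codepoint` key, as in the maps returned by `unihan/1`. `GraphemeSketch` and `to_grapheme/1` are hypothetical names, not the library's code.

```elixir
defmodule GraphemeSketch do
  # Integer codepoint -> UTF-8 string containing that single codepoint.
  def to_grapheme(codepoint) when is_integer(codepoint), do: <<codepoint::utf8>>

  # Attribute map -> grapheme, read from its :codepoint key.
  def to_grapheme(%{codepoint: codepoint}), do: to_grapheme(codepoint)

  # List of integers or attribute maps -> list of graphemes.
  def to_grapheme(list) when is_list(list), do: Enum.map(list, &to_grapheme/1)
end

GraphemeSketch.to_grapheme(25342)
# => "拾"

GraphemeSketch.to_grapheme([%{codepoint: 25342}, 25342])
# => ["拾", "拾"]
```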

Returns the Unihan database as a mapping of a codepoint to its metadata.

@spec unihan(binary() | integer()) :: any()

Returns the Unihan database metadata for a given codepoint.

The codepoint can be expressed as an integer or a grapheme.

Examples

iex> Unicode.Unihan.unihan(171339)
%{
  codepoint: 171339,
  kCantonese: %{coda: "", final: "u", jyutping: "ju4", nucleus: "u", onset: "j", tone: "4"},
  kDefinition: ["(J) nonstandard variant of 魚 U+9B5A, fish"],
  kHanYu: %{page: 4674, position: 9, virtual: false, volume: 7},
  kIRGHanyuDaZidian: %{page: 4674, position: 9, virtual: false, volume: 7},
  kIRGKangXi: %{page: 1465, position: 1, virtual: true},
  kIRG_GSource: %{mapping: ["74674.09"], source: "GHZ"},
  kIRG_TSource: %{mapping: "3043", source: "T4"},
  kIRG_VSource: %{mapping: "29D4B", source: "VN"},
  kJapaneseKun: ["UO", "SAKANA", "SUNADORU"],
  kJapaneseOn: "GYO",
  kKangXi: %{page: 1465, position: 1, virtual: true},
  kNelson: 692,
  kPhonetic: %{class: 1605},
  kRSAdobe_Japan1_6: [
    %{cid: 13717, code: "C", kangxi: 195, strokes_radical: 10, strokes_residue: 0},
    %{cid: 13718, code: "V", kangxi: 195, strokes_radical: 10, strokes_residue: 0}
  ],
  kRSKangXi: %{radical: 195, strokes: 0},
  kRSUnicode: %{radical: 195, simplified_radical: false, strokes: 0},
  kTotalStrokes: %{Hans: 11, Hant: 11}
}

iex> Unicode.Unihan.unihan("㝰")
%{
  codepoint: 14192,
  kCangjie: ["J", "H", "U", "S"],
  kCantonese: %{coda: "n", final: "in", jyutping: "min4", nucleus: "i", onset: "m", tone: "4"},
  kDefinition: ["unable to meet, empty room"],
  kHanYu: %{page: 957, position: 3, virtual: false, volume: 2},
  kHanyuPinyin: %{location: [%{page: 20957, position: 3, virtual: false}], readings: ["mián"]},
  kIRGHanyuDaZidian: %{page: 957, position: 3, virtual: false, volume: 2},
  kIRGKangXi: %{page: 293, position: 1, virtual: false},
  kIRG_GSource: %{mapping: ["3E3C"], source: "G5"},
  kIRG_KSource: %{mapping: "236A", source: "K3"},
  kIRG_TSource: %{mapping: "5A7D", source: "T4"},
  kKangXi: %{page: 293, position: 1, virtual: false},
  kMandarin: "mián",
  kRSUnicode: %{radical: 40, simplified_radical: false, strokes: 15},
  kSBGY: %{page: 135, position: 35},
  kTotalStrokes: %{Hans: 18, Hant: 18}
}
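
Since `unihan/1` accepts either an integer or a grapheme, a caller that receives mixed input may want to normalize it to an integer codepoint first. The following is a minimal sketch using binary pattern matching. `LookupKeySketch` and `to_codepoint/1` are hypothetical names and do not describe how the library resolves its argument.

```elixir
defmodule LookupKeySketch do
  # Integers are already codepoints.
  def to_codepoint(codepoint) when is_integer(codepoint), do: codepoint

  # A binary holding exactly one UTF-8 codepoint. Longer or empty
  # strings match no clause and raise FunctionClauseError.
  def to_codepoint(<<codepoint::utf8>>), do: codepoint
end

LookupKeySketch.to_codepoint("拾")
# => 25342

LookupKeySketch.to_codepoint(171339)
# => 171339
```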

Returns the field information for the data in the Unihan database.