Unicode.Unihan (Unicode Unihan v0.4.0)

Functions to introspect the Unicode Unihan character database.

Functions

Filter the Unihan database returning selected codepoints.

Arguments

  • fun is a 1-arity function that is passed the attribute map for a given codepoint. If the function returns a truthy value, the codepoint is included in the returned data. If the return value is falsy, the codepoint is omitted from the returned map.

Returns

  • a map of the filtered codepoints mapped to their attributes.

Example

iex> Unicode.Unihan.filter(&(&1.kTotalStrokes[:"Hans"] > 30))
...> |> Enum.count()
238

iex> Unicode.Unihan.filter(&(&1.kTotalStrokes[:"Hans"] != &1.kTotalStrokes[:"Hant"]))
...> |> Enum.count()
3

iex> Unicode.Unihan.filter(&(&1[:kGradeLevel] <= 6))
...> |> Enum.count()
2632
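
The truthy/falsy rule described in the arguments can be sketched on a small in-memory map. This is only an illustration: the `sample` data and its values are made up and are not taken from the Unihan database, and the anonymous `filter` function is a stand-in, not the library's implementation.

```elixir
# Hypothetical sample data: codepoint => attribute map (values are illustrative only).
sample = %{
  1 => %{kTotalStrokes: %{Hans: 31, Hant: 31}},
  2 => %{kTotalStrokes: %{Hans: 8, Hant: 9}},
  3 => %{kGradeLevel: 2}
}

# Keep an entry when the predicate returns a truthy value
# (anything other than nil or false).
filter = fn data, fun ->
  for {codepoint, attrs} <- data, fun.(attrs), into: %{}, do: {codepoint, attrs}
end

# map[:key] returns nil for a missing key, whereas map.key raises KeyError,
# so bracket access is the safer choice for attributes that may be absent.
kept = filter.(sample, &(&1[:kGradeLevel] != nil and &1[:kGradeLevel] <= 6))
IO.inspect(Map.keys(kept))
```

Only the entry that carries a `kGradeLevel` of 6 or less survives; the other two are dropped because the predicate returns `false` for them.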

Load the Unihan data into :persistent_term.

This function is called on the first access by Unicode.Unihan.unihan/1 but can also be called on application load if required.

The function first checks whether an Erlang term format file of the Unihan database exists. If it does, the file is loaded. If it does not (as on the first call), the file is generated and then loaded.
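
The load-or-generate flow could look roughly like the following. This is a sketch under stated assumptions, not the library's code: the module name `UnihanCacheSketch`, the cache path, the `:unihan_sketch` key, and `generate/0` are all hypothetical. Only `:persistent_term`, `File`, and `:erlang.term_to_binary/1` with `:erlang.binary_to_term/1` are standard.

```elixir
defmodule UnihanCacheSketch do
  # Hypothetical cache location and :persistent_term key, chosen for this sketch only.
  @path Path.join(System.tmp_dir!(), "unihan_sketch.etf")
  @key :unihan_sketch

  # Returns the cached data, loading it on first access.
  def get do
    case :persistent_term.get(@key, nil) do
      nil -> load()
      data -> data
    end
  end

  # Loads the term file if it exists, otherwise generates and writes it first.
  def load do
    data =
      if File.exists?(@path) do
        @path |> File.read!() |> :erlang.binary_to_term()
      else
        generated = generate()
        File.write!(@path, :erlang.term_to_binary(generated))
        generated
      end

    :persistent_term.put(@key, data)
    data
  end

  # Stand-in for parsing the Unihan source files (placeholder data).
  defp generate, do: %{25342 => %{codepoint: 25342}}
end

IO.inspect(UnihanCacheSketch.get())
```

Reads from `:persistent_term` are cheap, while updates are expensive, which suits data that is written once and read many times.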

Filter the Unihan database returning selected codepoints that are not rejected by the provided function.

Arguments

  • fun is a 1-arity function that is passed the attribute map for a given codepoint. If the function returns a falsy value, the codepoint is included in the returned data. If the return value is truthy, the codepoint is omitted from the returned map.

Returns

  • a map of the codepoints that are not rejected mapped to their attributes.

Example

iex> Unicode.Unihan.reject(&(&1.kTotalStrokes[:"Hans"] > 30))
...> |> Enum.count()
98444
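
For the same function, rejecting is the complement of filtering: every entry lands in exactly one of the two results. A minimal sketch on made-up data (the `sample` map and the `many_strokes?` name are hypothetical), using `Enum.split_with/2` to show both halves at once:

```elixir
# Hypothetical sample data: codepoint => attribute map (values are illustrative only).
sample = %{
  1 => %{kTotalStrokes: %{Hans: 31}},
  2 => %{kTotalStrokes: %{Hans: 8}},
  3 => %{kTotalStrokes: %{Hans: 12}}
}

many_strokes? = &(&1.kTotalStrokes[:Hans] > 30)

# Enum.split_with/2 returns {entries where the function is truthy, the rest}.
{kept, rejected} =
  Enum.split_with(sample, fn {_codepoint, attrs} -> many_strokes?.(attrs) end)

# What a filter with this function keeps.
IO.inspect(Map.new(kept))
# What a reject with this function keeps.
IO.inspect(Map.new(rejected))
```

Assuming the examples on this page were run against the same data, the two counts shown for this predicate (238 from filter and 98444 from reject) would add up to the total number of entries, 98682.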

Takes an integer codepoint, a Unihan codepoint map, or a list of maps and returns the grapheme (or list of graphemes) of the codepoint.
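
For the integer case, the conversion from codepoint to grapheme can be done with plain Elixir. The following is a minimal sketch that is independent of the library and not necessarily how it is implemented:

```elixir
# 25342 is 0x62FE, the codepoint of "拾".
codepoint = 25342

# Encode a single codepoint as a UTF-8 binary.
from_binary_syntax = <<codepoint::utf8>>

# List.to_string/1 accepts a list of codepoints and gives the same result.
from_list = List.to_string([codepoint])

IO.inspect(from_binary_syntax)
IO.inspect(from_list)
```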

Examples

iex> Unicode.Unihan.to_string(25342)
"拾"

iex> Unicode.Unihan.unihan("拾")
...> |> Unicode.Unihan.to_string()
"拾"

@spec unihan(binary() | integer()) :: any()

Returns the Unihan database metadata for a given codepoint.

The codepoint can be expressed as an integer or a grapheme.
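
Accepting both forms usually means normalizing the argument to an integer before the lookup. A sketch of one way to do that with guards and binary pattern matching; the module `CodepointSketch` and its `to_codepoint/1` are hypothetical helpers, not part of the library:

```elixir
defmodule CodepointSketch do
  # An integer is taken to be a codepoint already.
  def to_codepoint(codepoint) when is_integer(codepoint), do: codepoint

  # A binary holding exactly one UTF-8 encoded codepoint is decoded.
  # A string with more than one codepoint matches neither clause and
  # raises FunctionClauseError.
  def to_codepoint(<<codepoint::utf8>>), do: codepoint
end

IO.inspect(CodepointSketch.to_codepoint(25342))
IO.inspect(CodepointSketch.to_codepoint("拾"))
IO.inspect(CodepointSketch.to_codepoint("㝰"))
```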

Examples

iex> Unicode.Unihan.unihan(171339)
%{
  codepoint: 171339,
  kTotalStrokes: %{Hant: 11, Hans: 11},
  kCantonese: %{
    final: "u",
    jyutping: "ju4",
    coda: "",
    nucleus: "u",
    onset: "j",
    tone: "4"
  },
  kDefinition: ["(J) nonstandard variant of 魚 U+9B5A, fish"],
  kHanYu: %{position: 9, virtual: false, page: 4674, volume: 7},
  kIRG_GSource: %{source: "GHZ", mapping: ["74674.09"]},
  kIRG_TSource: %{source: "T4", mapping: "3043"},
  kIRG_VSource: %{source: "VN", mapping: "29D4B"},
  kIRGHanyuDaZidian: %{position: 9, virtual: false, page: 4674, volume: 7},
  kIRGKangXi: %{position: 1, virtual: true, page: 1465},
  kJapaneseKun: ["UO", "SAKANA", "SUNADORU"],
  kJapaneseOn: "GYO",
  kKangXi: %{position: 1, virtual: true, page: 1465},
  kMorohashi: %{index: 45958, prime: ""},
  kNelson: 692,
  kPhonetic: %{class: 1605},
  kRSAdobe_Japan1_6: [
    %{
      code: "C",
      cid: 13717,
      kangxi: 195,
      strokes_radical: 10,
      strokes_residue: 0
    },
    %{
      code: "V",
      cid: 13718,
      kangxi: 195,
      strokes_radical: 10,
      strokes_residue: 0
    }
  ],
  kRSUnicode: %{radical: 195, strokes: 0, simplified_radical: false},
  kJapanese: ["ギョ", "うお"],
  kMojiJoho: "MJ055080"
}

iex> Unicode.Unihan.unihan("㝰")
%{
  codepoint: 14192,
  kTotalStrokes: %{Hant: 18, Hans: 18},
  kCangjie: ["J", "H", "U", "S"],
  kCantonese: %{
    final: "in",
    jyutping: "min4",
    coda: "n",
    nucleus: "i",
    onset: "m",
    tone: "4"
  },
  kDefinition: ["unable to meet, empty room"],
  kHanYu: %{position: 3, virtual: false, page: 957, volume: 2},
  kHanyuPinyin: %{
    location: [%{position: 3, virtual: false, page: 20957}],
    readings: ["mián"]
  },
  kIRG_GSource: %{source: "G5", mapping: ["3E3C"]},
  kIRG_KSource: %{source: "K3", mapping: "236A"},
  kIRG_TSource: %{source: "T4", mapping: "5A7D"},
  kIRGHanyuDaZidian: %{position: 3, virtual: false, page: 957, volume: 2},
  kIRGKangXi: %{position: 1, virtual: false, page: 293},
  kKangXi: %{position: 1, virtual: false, page: 293},
  kMandarin: "mián",
  kMorohashi: %{index: 7359, prime: ""},
  kRSUnicode: %{radical: 40, strokes: 15, simplified_radical: false},
  kSBGY: %{position: 35, page: 135},
  kJapanese: ["ベン", "メン"],
  kMojiJoho: "MJ000772"
}
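
The returned value is a plain nested map, and the two examples show that the set of keys differs between codepoints. A short sketch of reading nested attributes defensively, using an excerpt of the "㝰" result (only a few keys are copied here):

```elixir
# Excerpt of the map shown for "㝰"; most keys are omitted.
entry = %{
  codepoint: 14192,
  kTotalStrokes: %{Hant: 18, Hans: 18},
  kCantonese: %{jyutping: "min4", tone: "4"},
  kMandarin: "mián"
}

# get_in/2 walks nested maps and returns nil when a key is missing at any level.
jyutping = get_in(entry, [:kCantonese, :jyutping])
japanese_on = get_in(entry, [:kJapaneseOn])

# Bracket access also returns nil for a missing key instead of raising.
hans_strokes = entry[:kTotalStrokes][:Hans]

IO.inspect({jyutping, japanese_on, hans_strokes})
```

Here `japanese_on` is `nil` because the excerpt has no `kJapaneseOn` key, while dot access (`entry.kJapaneseOn`) would raise a `KeyError`.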

Returns the property information for the data in the Unihan database.