Unicode.Unihan.Utils (Unicode Unihan v0.4.0)

Functions to parse the Unicode Unihan database files.

Summary

Functions

Parse the jyutping_index.csv file.

Parse one Unicode Unihan file and return a mapping from codepoint to a map of metadata for that codepoint.

Parse all Unicode Unihan files and return a mapping from codepoint to a map of metadata for that codepoint.

Parse the cjk_radicals.txt file.

Returns a map of the field definitions for a Unihan codepoint.

Functions

Parse the jyutping_index.csv file.


parse_file(file, map \\ %{})


Parse one Unicode Unihan file and return a mapping from codepoint to a map of metadata for that codepoint.
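The Unihan data files are tab-separated text. Each data line holds a code point in U+XXXX notation, a field name such as kCantonese, and a value, and lines starting with '#' are comments. The sketch below shows one way to fold such lines into a map from codepoint to a map of field values. It is not the library's parse_file/2: the module UnihanSketch and its functions are hypothetical, and the accumulator argument is an assumption modeled on the `map \\ %{}` default in the signature above.

```elixir
defmodule UnihanSketch do
  # Hypothetical sketch, not the library's parse_file/2.
  # Assumes the standard Unihan layout: "U+XXXX<TAB>kField<TAB>value",
  # with comment lines starting with "#".

  # Reads a Unihan text file and folds its entries into `acc`.
  def parse_path(path, acc \\ %{}) do
    path
    |> File.read!()
    |> parse_text(acc)
  end

  # Folds every data line of `text` into `acc`, a map from integer
  # codepoint to a map of field name => value.
  def parse_text(text, acc \\ %{}) do
    text
    |> String.split("\n")
    |> Enum.map(&String.trim_trailing/1)
    |> Enum.reject(&(&1 == "" or String.starts_with?(&1, "#")))
    |> Enum.reduce(acc, &add_line/2)
  end

  defp add_line(line, acc) do
    # The pattern match also checks that the line starts with "U+".
    ["U+" <> hex, field, value] = String.split(line, "\t", parts: 3)
    codepoint = String.to_integer(hex, 16)

    # Merge this field into any fields already collected for the codepoint.
    Map.update(acc, codepoint, %{field => value}, &Map.put(&1, field, value))
  end
end
```

Passing the map returned by one call as the accumulator of the next collects fields from several Unihan files under the same codepoint key, which is presumably what a "parse all files" function needs.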

Parse all Unicode Unihan files and return a mapping from codepoint to a map of metadata for that codepoint.

Parse the cjk_radicals.txt file.

Each line describes one CJK radical and contains three fields separated by a semicolon (';'). The first field is the CJK radical number. The second field is the CJK radical character, which may be absent. The third field is the corresponding CJK unified ideograph.

A given radical may have three variants, which share the same radical number but are described on separate lines. The variants are marked by one or two trailing apostrophes on the radical number:

  • one trailing apostrophe ('): simplified radicals
  • two trailing apostrophes (''): Japanese radicals (added in Unicode 15.1, 2023)
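A minimal Elixir sketch of parsing lines in this format follows. It is not the library's implementation: the module RadicalsSketch, its function names and the :standard/:simplified/:japanese labels are hypothetical. It also assumes, as in the upstream Unicode CJKRadicals.txt, that the character fields hold hexadecimal code points and that lines starting with '#' are comments.

```elixir
defmodule RadicalsSketch do
  # Hypothetical sketch of a cjk_radicals.txt parser; not the library's code.
  # Assumes character fields are hexadecimal code points (as in the upstream
  # Unicode CJKRadicals.txt) and that lines starting with "#" are comments.

  # Parses the full file contents into a list of maps, one per radical line.
  def parse(text) do
    text
    |> String.split("\n")
    |> Enum.map(&String.trim/1)
    |> Enum.reject(&(&1 == "" or String.starts_with?(&1, "#")))
    |> Enum.map(&parse_line/1)
  end

  # Splits one line into its three semicolon-separated fields.
  def parse_line(line) do
    [number, radical, ideograph] =
      line
      |> String.split(";")
      |> Enum.map(&String.trim/1)

    {radical_number, variant} = parse_number(number)

    %{
      radical_number: radical_number,
      variant: variant,
      radical: to_char(radical),
      ideograph: to_char(ideograph)
    }
  end

  # Counts trailing apostrophes: none, one (simplified) or two (Japanese).
  defp parse_number(field) do
    digits = String.trim_trailing(field, "'")

    variant =
      case String.length(field) - String.length(digits) do
        0 -> :standard
        1 -> :simplified
        2 -> :japanese
      end

    {String.to_integer(digits), variant}
  end

  # The radical character may be absent, so an empty field maps to nil.
  defp to_char(""), do: nil
  defp to_char(hex), do: <<String.to_integer(hex, 16)::utf8>>
end
```

Keeping the radical number and the variant marker as separate keys lets the variants of one radical be grouped with Enum.group_by/2 on :radical_number.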

Returns a map of the field definitions for a Unihan codepoint.


unihan_properties_file()
