Unicode.Unihan.Utils (Unicode Unihan v0.4.0)
Functions to parse the Unicode Unihan database files.
Functions
Parse the jyutping_index.csv file.
Parse one Unicode Unihan file and return a mapping from codepoint to a map of metadata for that codepoint.
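To illustrate the "codepoint to map of metadata" shape, here is a minimal sketch. It assumes the standard Unihan data layout of tab-separated `U+XXXX`, field name, and value, with `#` comment lines. The module `UnihanLineSketch` and its function are hypothetical and are not part of this library; the actual parser may decode field values further than this sketch does.

```elixir
defmodule UnihanLineSketch do
  # Hypothetical helper, not part of Unicode.Unihan.Utils.
  # Folds lines such as "U+4E00<TAB>kTotalStrokes<TAB>1" into a map
  # keyed by integer codepoint; values are kept as raw strings.
  def parse_lines(lines) do
    Enum.reduce(lines, %{}, fn line, acc ->
      case String.split(String.trim(line), "\t", parts: 3) do
        ["U+" <> hex, field, value] ->
          codepoint = String.to_integer(hex, 16)
          # Merge this field into the codepoint's existing metadata map.
          Map.update(acc, codepoint, %{field => value}, &Map.put(&1, field, value))

        _comment_or_blank ->
          # Comment lines and blank lines do not match the pattern above.
          acc
      end
    end)
  end
end
```

Calling `UnihanLineSketch.parse_lines/1` with several lines for the same codepoint yields one entry for that codepoint whose value is a map with one key per field.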
Parse all Unicode Unihan files and return a mapping from codepoint to a map of metadata for that codepoint.
Parse the cjk_radicals.txt file.
There is one line per CJK radical number. Each line contains three fields, separated by a semicolon (';'). The first field is the CJK radical number. The second field is the CJK radical character, which may be absent. The third field is the CJK unified ideograph.
A given radical may have three variants, which share the same radical number but are described on separate lines. These variants are marked with one or two trailing apostrophes (') after the radical number:
- one trailing apostrophe ('): simplified radicals
- two trailing apostrophes (''): Japanese radicals (added in Unicode 15.1, 2023)
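The line format described above can be sketched as follows. This is an illustration only: the module `RadicalLineSketch`, the variant atoms, and the result keys are hypothetical and do not reflect the value this library's function actually returns. It also assumes that the second and third fields are hexadecimal code points, as in a line like `1; 2F00; 4E00`.

```elixir
defmodule RadicalLineSketch do
  # Hypothetical helper, not part of Unicode.Unihan.Utils.
  # Parses one data line with three semicolon-separated fields.
  def parse_line(line) do
    [number, radical, ideograph] =
      line
      |> String.split(";")
      |> Enum.map(&String.trim/1)

    # Trailing apostrophes on the radical number mark the variant.
    base = String.trim_trailing(number, "'")
    apostrophes = String.length(number) - String.length(base)

    variant =
      case apostrophes do
        0 -> :base
        1 -> :simplified
        2 -> :japanese
      end

    %{
      radical_number: String.to_integer(base),
      variant: variant,
      radical_character: parse_hex(radical),
      unified_ideograph: parse_hex(ideograph)
    }
  end

  # The radical character field may be absent (empty).
  defp parse_hex(""), do: nil
  defp parse_hex(hex), do: String.to_integer(hex, 16)
end
```

For example, `RadicalLineSketch.parse_line("1; 2F00; 4E00")` yields a map with `radical_number: 1` and `variant: :base`, while a number written with one or two trailing apostrophes is reported as `:simplified` or `:japanese` under the same radical number.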
Returns a map of the field definitions for a Unihan codepoint.