ExkPasswd.Transform.Pinyin (ExkPasswd v0.2.0)

Converts Chinese characters to Pinyin romanization for keyboard compatibility.

This transform enables passwords with Chinese words to be typed on any keyboard layout (QWERTY, AZERTY, etc.) while maintaining memorability for Chinese speakers.

Use Case

  • Dictionary: Chinese words (memorability in native language)
  • Transform: Pinyin conversion (ASCII output for compatibility)
  • Result: Memorable for Chinese speakers, compatible with all systems

Examples

# Load Chinese dictionary
ExkPasswd.Dictionary.load_custom(:chinese, ["中国", "世界", "你好", "朋友"])

config = ExkPasswd.Config.new!(
  dictionary: :chinese,
  word_length: 2..4,
  word_length_bounds: 1..10,
  separator: "-",
  meta: %{
    transforms: [%ExkPasswd.Transform.Pinyin{}]
  }
)

ExkPasswd.generate(config)
#=> e.g. "45-zhongguo-shijie-nihao-89" (output varies per call)
# Memorable: a Chinese speaker remembers "中国 世界 你好"
# Compatible: can be typed on any keyboard and entered on any system

Coverage

This module includes 500+ of the most frequent Chinese characters based on Jun Da's Modern Chinese Character Frequency List, covering approximately 95% of characters encountered in everyday Chinese text.

Romanization Style

This implementation follows Hanyu Pinyin with keyboard-compatible conventions:

ü Handling (Critical for Keyboard Input)

The vowel ü is handled according to IME input conventions:

  • After l or n: written as v (lv for 绿, nv for 女)
  • After j, q, x, y: written as u (ju, qu, xu, yu)

This matches how Chinese speakers actually type on QWERTY keyboards.
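
The two rules above can be sketched as a small standalone helper. This is only an illustration under stated assumptions: `UmlautSketch` and `ascii_syllable/1` are hypothetical names, not part of ExkPasswd, and the input is assumed to be a single toneless syllable.

```elixir
defmodule UmlautSketch do
  # Hypothetical helper, not part of ExkPasswd: rewrites one toneless
  # pinyin syllable that may contain "ü" into a plain-ASCII spelling.
  @spec ascii_syllable(String.t()) :: String.t()
  def ascii_syllable(syllable) do
    # Normalize so "u" + combining diaeresis becomes the single "ü".
    normalized = String.normalize(syllable, :nfc)

    if String.starts_with?(normalized, ["lü", "nü"]) do
      # After l or n, "ü" contrasts with "u" (lü vs. lu), so use "v".
      String.replace(normalized, "ü", "v")
    else
      # After j, q, x, y only the ü sound occurs, so plain "u" is unambiguous.
      String.replace(normalized, "ü", "u")
    end
  end
end

UmlautSketch.ascii_syllable("lü")  # "lv"
UmlautSketch.ascii_syllable("nüe") # "nve"
UmlautSketch.ascii_syllable("qu")  # "qu" (already ASCII, unchanged)
```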

Tone Omission

Tones are omitted for simplicity and keyboard compatibility:

  • 妈麻马骂 all become "ma"
  • Across the language, this collapses roughly 10,000 characters into about 410 distinct syllables
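
For tone-marked pinyin text, tone removal can be sketched with Unicode normalization. This is a minimal illustration, assuming the input is pinyin with diacritics; `ToneSketch` is a hypothetical name, and the module itself maps characters straight to toneless strings rather than stripping marks.

```elixir
defmodule ToneSketch do
  # Hypothetical helper: removes the four pinyin tone marks while
  # keeping the diaeresis of "ü" intact.
  @spec strip_tones(String.t()) :: String.t()
  def strip_tones(syllable) do
    syllable
    # Decompose, e.g. "ǎ" becomes "a" followed by a combining caron.
    |> String.normalize(:nfd)
    # Drop combining grave, acute, macron, and caron only.
    |> String.replace(~r/[\x{0300}\x{0301}\x{0304}\x{030C}]/u, "")
    # Recompose what is left, e.g. "u" + diaeresis back into "ü".
    |> String.normalize(:nfc)
  end
end

Enum.map(["mā", "má", "mǎ", "mà"], &ToneSketch.strip_tones/1)
# ["ma", "ma", "ma", "ma"]
```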

Security Note: Toneless pinyin carries less entropy than the underlying characters, because characters that differ only in tone (and other homophones) produce identical output. Compensate by using more words in your configuration.
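
The entropy loss can be made concrete with a toy calculation. The sketch below is an assumption-laden illustration: `EntropySketch` and the eight-entry `toy_map` are hypothetical, each entry is treated as an equally likely one-character "word", and the figures say nothing about the module's real dictionary.

```elixir
defmodule EntropySketch do
  # Hypothetical helper: Shannon entropy (in bits) of a list of outcomes,
  # where each list element is one equally likely draw.
  @spec shannon_bits([term()]) :: float()
  def shannon_bits(outcomes) do
    total = length(outcomes)

    outcomes
    |> Enum.frequencies()
    |> Map.values()
    |> Enum.reduce(0.0, fn count, acc ->
      p = count / total
      acc - p * :math.log2(p)
    end)
  end
end

# Toy data for illustration only; not the module's real mapping.
toy_map = %{
  "妈" => "ma", "麻" => "ma", "马" => "ma", "骂" => "ma",
  "你" => "ni", "好" => "hao", "中" => "zhong", "国" => "guo"
}

# Eight distinct characters: about 3 bits per draw.
bits_before = EntropySketch.shannon_bits(Map.keys(toy_map))

# Four of them collapse to "ma": about 2 bits per draw.
bits_after = EntropySketch.shannon_bits(Map.values(toy_map))
```

In this toy case one extra word per three would be needed to recover the lost bits; the real ratio depends on how many homophones the chosen dictionary contains.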

Polyphone Handling (多音字)

Many Chinese characters have multiple pronunciations depending on context:

  • 和: hé (and), hè (join in), huó (mix), huò (mix powder)
  • 了: le (particle), liǎo (finish)
  • 长: cháng (long), zhǎng (grow)

This module always uses the most common pronunciation of each character, regardless of context. For password generation this is acceptable, because the goal is keyboard compatibility, not linguistic precision.
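
A context-free lookup of this kind might look like the sketch below. `PolyphoneSketch` and its readings table are hypothetical; the ordering of readings is illustrative and is not taken from the module's actual data.

```elixir
defmodule PolyphoneSketch do
  # Hypothetical table: toneless readings, assumed most common first.
  @readings %{
    "和" => ["he", "huo"],
    "了" => ["le", "liao"],
    "长" => ["chang", "zhang"]
  }

  # Always pick the first reading; characters without an entry are
  # returned unchanged.
  @spec default_reading(String.t()) :: String.t()
  def default_reading(char) do
    case Map.get(@readings, char) do
      [first | _rest] -> first
      nil -> char
    end
  end
end

PolyphoneSketch.default_reading("长") # "chang", even in 长大 (zhǎngdà)
```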

Hanzi Detection

Use contains_hanzi?/1 to check if text contains Chinese characters:

ExkPasswd.Transform.Pinyin.contains_hanzi?("你好")     #=> true
ExkPasswd.Transform.Pinyin.contains_hanzi?("hello")   #=> false
ExkPasswd.Transform.Pinyin.contains_hanzi?("中英mix")  #=> true

Limitations

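  • No context-aware polyphone disambiguation: each character always maps to its most common pronunciation
  • Characters not in the mapping are passed through unchanged, so the output may still contain non-ASCII characters
  • No apostrophe insertion at syllable boundaries (xi'an → xian), so 西安 and 先 both produce "xian"
  • Simplified Chinese characters only (Traditional text may work where a character is shared by both scripts)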
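
The pass-through and apostrophe limitations can be seen in a minimal per-character conversion. This is a sketch, not the library's implementation: `PassThroughSketch` and its three-entry table are hypothetical.

```elixir
defmodule PassThroughSketch do
  # Toy mapping for illustration only; the real table is much larger.
  @toy_map %{"西" => "xi", "安" => "an", "先" => "xian"}

  # Convert grapheme by grapheme, leaving unmapped characters untouched
  # and inserting no apostrophes between syllables.
  @spec romanize(String.t()) :: String.t()
  def romanize(word) do
    word
    |> String.graphemes()
    |> Enum.map_join(fn grapheme -> Map.get(@toy_map, grapheme, grapheme) end)
  end
end

PassThroughSketch.romanize("西安") # "xian"
PassThroughSketch.romanize("先")   # "xian" (same output, different word)
PassThroughSketch.romanize("西龘") # "xi龘" (unmapped character kept as-is)
```

If strictly ASCII output is required, one option is to filter the dictionary beforehand so that every character of every word has an entry in the mapping.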

Summary

Functions

contains_hanzi?(text)

Check if a string contains Chinese characters (Hanzi).

hanzi?(char)

Check if a single character is a Chinese character (Hanzi).

pinyin_map()

Returns the Pinyin romanization mapping.

Types

t()

@type t() :: %ExkPasswd.Transform.Pinyin{}

Functions

contains_hanzi?(text)

@spec contains_hanzi?(String.t()) :: boolean()

Check if a string contains Chinese characters (Hanzi).

Returns true when at least one character in the string falls within one of the CJK ideograph Unicode ranges listed below.

Unicode Ranges Covered

  • CJK Unified Ideographs: U+4E00 to U+9FFF (most common)
  • CJK Extension A: U+3400 to U+4DBF

Examples

iex> ExkPasswd.Transform.Pinyin.contains_hanzi?("你好")
true

iex> ExkPasswd.Transform.Pinyin.contains_hanzi?("hello")
false

iex> ExkPasswd.Transform.Pinyin.contains_hanzi?("中英mix")
true

iex> ExkPasswd.Transform.Pinyin.contains_hanzi?("")
false
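
A range check over the two blocks listed above could be sketched as follows. `HanziRangeSketch` is a hypothetical standalone module written for illustration; how the library performs the check internally is not shown in this documentation.

```elixir
defmodule HanziRangeSketch do
  # CJK Unified Ideographs and CJK Extension A, as listed above.
  @ranges [0x4E00..0x9FFF, 0x3400..0x4DBF]

  # True only for a binary holding exactly one codepoint in the ranges.
  @spec hanzi?(String.t()) :: boolean()
  def hanzi?(<<codepoint::utf8>>), do: in_ranges?(codepoint)
  def hanzi?(_other), do: false

  # True when any codepoint of the string is in the ranges.
  @spec contains_hanzi?(String.t()) :: boolean()
  def contains_hanzi?(text) when is_binary(text) do
    text
    |> String.to_charlist()
    |> Enum.any?(&in_ranges?/1)
  end

  defp in_ranges?(codepoint) do
    Enum.any?(@ranges, fn range -> codepoint in range end)
  end
end

HanziRangeSketch.contains_hanzi?("中英mix") # true
HanziRangeSketch.hanzi?("a")               # false
```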

hanzi?(char)

@spec hanzi?(String.t()) :: boolean()

Check if a single character is a Chinese character (Hanzi).

Examples

iex> ExkPasswd.Transform.Pinyin.hanzi?("中")
true

iex> ExkPasswd.Transform.Pinyin.hanzi?("a")
false

iex> ExkPasswd.Transform.Pinyin.hanzi?("日")
true

iex> ExkPasswd.Transform.Pinyin.hanzi?("")
false

pinyin_map()

@spec pinyin_map() :: %{required(String.t()) => String.t()}

Returns the Pinyin romanization mapping.

Exposes the internal mapping from individual Chinese characters to their toneless, ASCII-only Pinyin spellings (for example, "女" maps to "nv").

Returns

A map of Chinese characters to Pinyin strings.

Examples

iex> map = ExkPasswd.Transform.Pinyin.pinyin_map()
...> map["中"]
"zhong"

iex> map = ExkPasswd.Transform.Pinyin.pinyin_map()
...> map["你"]
"ni"

iex> map = ExkPasswd.Transform.Pinyin.pinyin_map()
...> map["女"]
"nv"