Text.Extract.Script (Text v0.6.0)


UTR #39 §5.1 single-script restriction check, used by Text.Extract to flag mixed-script hosts as potential homograph attacks.

Operates on the Unicode form of an IDN host string. Returns true if all characters belong to a single script, ignoring :common and :inherited and treating Han, Hiragana, and Katakana as one augmented Japanese script per UTR #39 §5.2. Returns false otherwise. For example, аpple.com, where a Cyrillic а is mixed with Latin letters, returns false and would be rejected by :strict_idn.

The implementation delegates to Unicode.script_dominance/1 from the :unicode package, which counts the occurrences of each script in the text.
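
The rule described above can be sketched without the :unicode package. The following is a minimal illustration, not this module's implementation: the module name SingleScriptSketch, the helpers script_of/1 and augment/1, and the small, approximate table of code point ranges are all hypothetical. Real code should rely on complete Unicode script data.

```elixir
defmodule SingleScriptSketch do
  # Hypothetical sketch: classify each code point, drop the ignored
  # scripts, merge the Japanese scripts, and check that at most one
  # script remains.

  @ignored [:common, :inherited]
  @japanese [:han, :hiragana, :katakana]

  def single_script?(text) when is_binary(text) do
    text
    |> String.to_charlist()
    |> Enum.map(&script_of/1)
    |> Enum.reject(&(&1 in @ignored))
    |> Enum.map(&augment/1)
    |> Enum.uniq()
    |> length()
    |> Kernel.<=(1)
  end

  # Han, Hiragana, and Katakana count as one augmented script.
  defp augment(script) when script in @japanese, do: :japanese
  defp augment(script), do: script

  # Assumption: a tiny, approximate range table for illustration only.
  defp script_of(cp) when cp in [0x00D7, 0x00F7], do: :common
  defp script_of(cp) when cp in ?a..?z or cp in ?A..?Z, do: :latin
  defp script_of(cp) when cp in 0x00C0..0x024F, do: :latin
  defp script_of(cp) when cp in 0x0300..0x036F, do: :inherited
  defp script_of(cp) when cp in 0x0400..0x04FF, do: :cyrillic
  defp script_of(cp) when cp in 0x3041..0x3096, do: :hiragana
  defp script_of(cp) when cp in 0x30A1..0x30FA, do: :katakana
  defp script_of(cp) when cp in 0x4E00..0x9FFF, do: :han
  # Everything else (digits, dots, hyphens, ...) is treated as :common here.
  defp script_of(_cp), do: :common
end

SingleScriptSketch.single_script?("paypal.com")        # true
SingleScriptSketch.single_script?("\u0430pple.com")    # false (Cyrillic а + Latin)
SingleScriptSketch.single_script?("漢字ひらがな")        # true
```

Because :common and :inherited are dropped before counting, a string with no script-bearing characters, such as "123" or "", leaves an empty set of scripts and is treated as single-script in this sketch.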

Functions

single_script?(text)

@spec single_script?(String.t()) :: boolean()

Returns true if text uses only a single script (per UTR #39 §5.1, with the Japanese augmented set in §5.2).

Arguments

  • text is a UTF-8 string.

Returns

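  • true if text contains characters from at most one script once :common and :inherited are ignored, otherwise false. Strings with no script-bearing characters, such as "123" or "", return true.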

Examples

iex> Text.Extract.Script.single_script?("paypal.com")
true

iex> Text.Extract.Script.single_script?("müller.de")
true

# Cyrillic "а"
iex> Text.Extract.Script.single_script?(<<"а"::utf8>> <> "pple.com")
false

iex> Text.Extract.Script.single_script?("漢字ひらがな")
true

iex> Text.Extract.Script.single_script?("123")
true

iex> Text.Extract.Script.single_script?("")
true