Unicode (unicode v1.0.0)

Provides functionality to efficiently check properties of Unicode codepoints, graphemes and strings.

The current implementation is based on Unicode version 8.0.0.

Summary

Functions

alphabetic?(codepoint_or_string)
Checks whether a single Unicode codepoint (or every character in the given binary string) has the Derived Core Property Alphabetic

alphanumeric?(codepoint, block \\ [])
True for alphanumeric characters, but much faster than an :alnum: regexp checking the same thing

lowercase?(codepoint_or_string)
Checks whether a single Unicode codepoint (or every character in the given binary string) has the Derived Core Property Lowercase

math?(codepoint_or_string)
Checks whether a single Unicode codepoint (or every character in the given binary string) has the Derived Core Property Math

numeric?(codepoint)
True for the digits [0-9], but much faster than a regexp checking the same thing

uppercase?(codepoint_or_string)
Checks whether a single Unicode codepoint (or every character in the given binary string) has the Derived Core Property Uppercase

Functions

alphabetic?(codepoint_or_string)
alphabetic?(String.codepoint | String.t) :: boolean

Checks whether a single Unicode codepoint (or every character in the given binary string) has the Derived Core Property Alphabetic.

These are the characters normally used to write letters or syllables in words and sentences. The function takes a Unicode codepoint or a string as input.

For the string version, the result is true only if every codepoint in the string has the property.

Examples

iex> Unicode.alphabetic?(?a)
true
iex> Unicode.alphabetic?("A")
true
iex> Unicode.alphabetic?("Elixir")
true
iex> Unicode.alphabetic?("الإكسير")
true
iex> Unicode.alphabetic?("foo, bar") # comma and whitespace
false
iex> Unicode.alphabetic?("42")
false
iex> Unicode.alphabetic?("龍王")
true
iex> Unicode.alphabetic?("∑") # Summation, ∑
false
iex> Unicode.alphabetic?("Σ") # Greek capital letter sigma, Σ
true
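The "every codepoint must pass" rule for the string version can be sketched as follows. AlphabeticSketch is a hypothetical module, not the library's implementation: it approximates Alphabetic with the general categories L and Nl only, whereas the derived property also includes Other_Alphabetic codepoints (many combining vowel signs, for example). It also relies on the runtime regex engine's Unicode tables, which need not match Unicode 8.0.0.

```elixir
defmodule AlphabeticSketch do
  # Hypothetical approximation of Alphabetic: Lu + Ll + Lt + Lm + Lo (\p{L}) plus Nl.
  # The real derived property also adds Other_Alphabetic, which this sketch misses.

  # Single codepoint, e.g. ?a (an integer).
  def alphabetic?(codepoint) when is_integer(codepoint) do
    Regex.match?(~r/\A[\p{L}\p{Nl}]\z/u, <<codepoint::utf8>>)
  end

  # String version: decode to codepoints and require every one to pass.
  # Enum.all?/2 stops at the first failure; "" is vacuously true in this sketch.
  def alphabetic?(string) when is_binary(string) do
    string
    |> String.to_charlist()
    |> Enum.all?(&alphabetic?/1)
  end
end
```

With this sketch, AlphabeticSketch.alphabetic?("foo, bar") is false because the comma fails before the space is ever examined.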
alphabetic?(string, block)
alphanumeric?(codepoint, block \\ [])

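True for alphanumeric characters, but much faster than an :alnum: regexp checking the same thing.

For a single codepoint x, returns true if Unicode.alphabetic?(x) or Unicode.numeric?(x). For a string, this check is applied to each codepoint and all of them must pass, so "3段" is alphanumeric even though neither Unicode.alphabetic?("3段") nor Unicode.numeric?("3段") is true.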

Derived from http://www.unicode.org/reports/tr18/#alnum

Examples

iex> Unicode.alphanumeric? "1234"
true
iex> Unicode.alphanumeric? "KeyserSöze1995"
true
iex> Unicode.alphanumeric? "3段"
true
iex> Unicode.alphanumeric? "dragon@example.com"
false
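The per-codepoint OR described above can be sketched like this. AlnumSketch and its private predicates are hypothetical stand-ins for Unicode.alphabetic?/1 and Unicode.numeric?/1: letters are approximated by the general categories L and Nl, and digits by ASCII 0-9 only, as in the numeric? description.

```elixir
defmodule AlnumSketch do
  # Hypothetical stand-ins for the library's per-codepoint checks.
  defp alphabetic_cp?(cp), do: Regex.match?(~r/\A[\p{L}\p{Nl}]\z/u, <<cp::utf8>>)
  defp numeric_cp?(cp), do: cp in ?0..?9

  # The OR is evaluated per codepoint, never on the whole string at once.
  def alphanumeric?(cp) when is_integer(cp), do: alphabetic_cp?(cp) or numeric_cp?(cp)

  # Every codepoint must be a letter or a digit.
  def alphanumeric?(string) when is_binary(string) do
    string
    |> String.to_charlist()
    |> Enum.all?(&alphanumeric?/1)
  end
end
```

This is why "3段" passes: "3" satisfies the digit check and "段" satisfies the letter check, each on its own.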
lowercase?(codepoint_or_string)
lowercase?(String.codepoint | String.t) :: boolean

Checks whether a single Unicode codepoint (or every character in the given binary string) has the Derived Core Property Lowercase.

Note that many writing systems do not distinguish between cases; their characters are not included in this group.

The function takes a Unicode codepoint or a string as input.

For the string version, the result is true only if every codepoint in the string has the property.

Examples

iex> Unicode.lowercase?(?a)
true
iex> Unicode.lowercase?("A")
false
iex> Unicode.lowercase?("Elixir")
false
iex> Unicode.lowercase?("léon")
true
iex> Unicode.lowercase?("foo, bar")
false
iex> Unicode.lowercase?("42")
false
iex> Unicode.lowercase?("Σ")
false
iex> Unicode.lowercase?("σ")
true
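A sketch of the string version that walks the UTF-8 binary directly, one codepoint at a time, instead of building a list first. LowercaseSketch is hypothetical: Lowercase is derived from Ll plus Other_Lowercase, and this sketch checks only Ll, so Other_Lowercase codepoints such as the modifier letter ʰ (U+02B0) are missed.

```elixir
defmodule LowercaseSketch do
  # Hypothetical approximation: general category Ll only (no Other_Lowercase).
  def lowercase?(cp) when is_integer(cp), do: Regex.match?(~r/\A\p{Ll}\z/u, <<cp::utf8>>)

  # Reached the end of the string without a failure ("" is true in this sketch).
  def lowercase?(<<>>), do: true

  # Decode one UTF-8 codepoint, test it, then recurse on the rest.
  # `and` short-circuits, so the walk stops at the first non-lowercase codepoint.
  def lowercase?(<<cp::utf8, rest::binary>>), do: lowercase?(cp) and lowercase?(rest)
end
```

Invalid UTF-8 matches no clause here and raises a FunctionClauseError; whether the library behaves the same way is not stated above.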
lowercase?(string, block)
math?(codepoint_or_string)
math?(String.codepoint | String.t) :: boolean

Checks whether a single Unicode codepoint (or every character in the given binary string) has the Derived Core Property Math.

These are the characters whose primary use is in mathematical notation rather than in alphabets. Note that the numeric digits are not part of this group; use Unicode.numeric?/1 for them instead.

The function takes a Unicode codepoint or a string as input.

For the string version, the result is true only if every codepoint in the string has the property.

Examples

iex> Unicode.math?(?=)
true
iex> Unicode.math?("=")
true
iex> Unicode.math?("1+1=2") # Note that digits themselves are not part of `Math`.
false
iex> Unicode.math?("परिस")
false
iex> Unicode.math?("∑") # Summation, ∑
true
iex> Unicode.math?("Σ") # Greek capital letter sigma, Σ
false
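The examples above follow from the general categories involved: "+", "=" and "∑" are math symbols (Sm), while the digits in "1+1=2" are decimal numbers (Nd). A minimal sketch, assuming a hypothetical MathSketch module that approximates Math by Sm alone; the derived property also includes Other_Math codepoints (such as "^"), which this check misses.

```elixir
defmodule MathSketch do
  # Hypothetical approximation: general category Sm only (no Other_Math).
  # A codepoint integer is encoded to a one-codepoint string and checked the same way.
  def math?(cp) when is_integer(cp), do: math?(<<cp::utf8>>)

  # String.codepoints/1 yields one-codepoint strings; each must be Sm.
  def math?(string) when is_binary(string) do
    string
    |> String.codepoints()
    |> Enum.all?(&Regex.match?(~r/\A\p{Sm}\z/u, &1))
  end
end
```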
math?(string, block)
numeric?(codepoint)

True for the digits [0-9], but much faster than a regexp checking the same thing.

Derived from http://www.unicode.org/reports/tr18/#digit

Examples

iex> Unicode.numeric?("65535")
true
iex> Unicode.numeric?("42")
true
iex> Unicode.numeric?("lapis philosophorum")
false
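One regex-free way to check for the digits [0-9] is binary pattern matching, sketched below. NumericSketch is hypothetical and only illustrates the idea; it is not the library's code. Matching byte by byte is safe for UTF-8 input because every byte of a multi-byte sequence is 0x80 or above, so it can never be mistaken for an ASCII digit.

```elixir
defmodule NumericSketch do
  # A single codepoint: ASCII 0-9 only.
  def numeric?(cp) when is_integer(cp), do: cp in ?0..?9

  # End of input with no failure ("" is true in this sketch).
  def numeric?(<<>>), do: true

  # Consume one ASCII digit byte and continue with the rest.
  def numeric?(<<d, rest::binary>>) when d in ?0..?9, do: numeric?(rest)

  # Any other leading byte (letter, punctuation, non-ASCII) fails.
  def numeric?(string) when is_binary(string), do: false
end
```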
uppercase?(codepoint_or_string)
uppercase?(String.codepoint | String.t) :: boolean

Checks whether a single Unicode codepoint (or every character in the given binary string) has the Derived Core Property Uppercase.

Note that many writing systems do not distinguish between cases; their characters are not included in this group.

The function takes a Unicode codepoint or a string as input.

For the string version, the result is true only if every codepoint in the string has the property.

Examples

iex> Unicode.uppercase?(?a)
false
iex> Unicode.uppercase?("A")
true
iex> Unicode.uppercase?("Elixir")
false
iex> Unicode.uppercase?("CAMEMBERT")
true
iex> Unicode.uppercase?("foo, bar")
false
iex> Unicode.uppercase?("42")
false
iex> Unicode.uppercase?("Σ")
true
iex> Unicode.uppercase?("σ")
false
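For comparison, here is a sketch that uses a single anchored regular expression over the whole string instead of checking codepoints one at a time. UppercaseSketch is hypothetical: Uppercase is derived from Lu plus Other_Uppercase, and this sketch checks only Lu, so Roman numerals (from U+2160) and circled capital letters (from U+24B6) are not recognized.

```elixir
defmodule UppercaseSketch do
  # Hypothetical approximation: general category Lu only (no Other_Uppercase).
  def uppercase?(cp) when is_integer(cp), do: uppercase?(<<cp::utf8>>)

  # One or more Lu codepoints and nothing else; "+" makes "" return false here.
  def uppercase?(string) when is_binary(string) do
    Regex.match?(~r/\A\p{Lu}+\z/u, string)
  end
end
```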
uppercase?(string, block)