Unicode.String (Unicode String v1.1.0)
This module provides functions that implement some of the Unicode standards:
The Unicode Case Folding algorithm, to provide case-independent equality checking irrespective of language or script.
The Unicode Segmentation algorithm, to detect, break or split strings into grapheme clusters, words and sentences.
Summary
Functions
break/2: Returns match data indicating if the requested break is applicable at the point between the two string segments represented by {string_before, string_after}.
break?/2: Returns a boolean indicating if the requested break is applicable at the point between the two string segments represented by {string_before, string_after}.
equals_ignoring_case?/3: Compares two strings in a case insensitive manner.
next/2: Returns the next segment in a string.
split/2: Splits a string according to the specified break type.
splitter/2: Returns an enumerable that splits a string on demand.
Types
@type break_match() :: {break_or_no_break(), {String.t(), {String.t(), String.t()}}} | {break_or_no_break(), {String.t(), String.t()}}
@type break_or_no_break() :: :break | :no_break
@type break_type() :: :grapheme | :word | :line | :sentence
@type error_return() :: {:error, String.t()}
@type options() :: [locale: String.t(), break: break_type(), suppressions: boolean()]
@type split_options() :: [ locale: String.t(), break: break_type(), suppressions: boolean(), trim: boolean() ]
Functions
@spec break(string_interval(), options()) :: break_match() | error_return()
Returns match data indicating if the requested break is applicable at the point between the two string segments represented by {string_before, string_after}.
Arguments
{string_before, string_after} is a tuple of two strings (each any String.t) that marks the point to be tested.
options is a keyword list of options.
Returns
A tuple indicating if a break would be applicable at this point between string_before and string_after:
{:break, {string_before, {matched_string, remaining_string}}} or
{:no_break, {string_before, {matched_string, remaining_string}}} or
{:error, reason}
Options
:locale is any locale returned by Unicode.String.Segment.known_locales/0. The default is "root", which corresponds to the break rules defined by the Unicode Segmentation rules.
:break is the type of break. It is one of :grapheme, :word, :line or :sentence. The default is :word.
:suppressions is a boolean which, if true, will suppress breaks for common abbreviations defined for the locale. The default is true.
Examples
iex> Unicode.String.break {"This is ", "some words"}
{:break, {"This is ", {"s", "ome words"}}}
iex> Unicode.String.break {"This is ", "some words"}, break: :sentence
{:no_break, {"This is ", {"s", "ome words"}}}
iex> Unicode.String.break {"This is one. ", "This is some words."}, break: :sentence
{:break, {"This is one. ", {"T", "his is some words."}}}
@spec break?(string_interval(), options()) :: boolean()
Returns a boolean indicating if the requested break is applicable at the point between the two string segments represented by {string_before, string_after}.
Arguments
{string_before, string_after} is a tuple of two strings (each any String.t) that marks the point to be tested.
options is a keyword list of options.
Returns
true or false, or
raises an exception if there is an error
Options
:locale is any locale returned by Unicode.String.Segment.known_locales/0. The default is "root", which corresponds to the break rules defined by the Unicode Segmentation rules.
:break is the type of break. It is one of :grapheme, :word, :line or :sentence. The default is :word.
:suppressions is a boolean which, if true, will suppress breaks for common abbreviations defined for the locale. The default is true.
Examples
iex> Unicode.String.break? {"This is ", "some words"}
true
iex> Unicode.String.break? {"This is ", "some words"}, break: :sentence
false
iex> Unicode.String.break? {"This is one. ", "This is some words."}, break: :sentence
true
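As a rough sketch of how break?/2 might be used to scan a whole string, the hypothetical helper below (BreakOffsets.offsets/2 is not part of Unicode.String) tests every interior position by cutting the string with String.split_at/2, which yields the same {string_before, string_after} tuple shape used in the examples above. It assumes the unicode_string package is available and Elixir 1.12 or later for the stepped range. It re-scans the string for every position and lets break?/2 raise on invalid options, so it is meant for illustration rather than for long inputs.

```elixir
defmodule BreakOffsets do
  # Hypothetical helper for illustration; not part of Unicode.String.
  # Returns the grapheme offsets at which the requested break type applies.
  def offsets(string, break_type \\ :word) do
    last = String.length(string) - 1

    # 1..last//1 is empty when the string has fewer than two graphemes.
    Enum.filter(1..last//1, fn offset ->
      # String.split_at/2 returns {string_before, string_after}.
      string
      |> String.split_at(offset)
      |> Unicode.String.break?(break: break_type)
    end)
  end
end

# Offset 8 cuts the text into {"This is ", "some words"}, the pair used
# in the first example above, so it is expected to be among the results.
8 in BreakOffsets.offsets("This is some words", :word)
```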
Compares two strings in a case insensitive manner.
Case folding is applied to the two string arguments which are then compared with the == operator.
Arguments
string_a and string_b are two strings to be compared.
type is the case folding type to be applied. The alternatives are :full, :simple and :turkic. The default is :full.
Returns
true or false
Notes
This function applies the Unicode Case Folding algorithm.
The algorithm does not apply any treatment to diacritical marks, hence "compare strings without accents" is not part of this function.
Examples
iex> Unicode.String.equals_ignoring_case? "ABC", "abc"
true
iex> Unicode.String.equals_ignoring_case? "beißen", "beissen"
true
iex> Unicode.String.equals_ignoring_case? "grüßen", "grussen"
false
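Because the function takes two strings and returns a boolean, it can serve as a predicate when looking up user input in a list of known values. The sketch below assumes the unicode_string package is available; the CommandLookup module and its command list are made up for illustration. The final call passes the folding type as a third positional argument, which is how the Arguments section reads; treat that as an assumption to check against the installed version.

```elixir
defmodule CommandLookup do
  # Hypothetical list of known commands, for illustration only.
  @commands ["Start", "Stop", "Status"]

  # Returns the canonical spelling of the command, or nil if none matches.
  def find(input) do
    Enum.find(@commands, fn command ->
      Unicode.String.equals_ignoring_case?(command, input)
    end)
  end
end

CommandLookup.find("STOP")
# => "Stop"

CommandLookup.find("restart")
# => nil

# Selecting a folding type explicitly (assumed to be the third positional argument).
Unicode.String.equals_ignoring_case?("ABC", "abc", :simple)
```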
@spec next(String.t(), split_options()) :: String.t() | nil | error_return()
Returns the next segment in a string.
Arguments
string is any String.t.
options is a keyword list of options.
Returns
A tuple with the segment and the remainder of the string, or "" in case the string reached its end:
{next_string, rest_of_the_string} or
{:error, reason}
Options
:locale is any locale returned by Unicode.String.Segment.known_locales/0. The default is "root", which corresponds to the break rules defined by the Unicode Segmentation rules.
:break is the type of break. It is one of :grapheme, :word, :line or :sentence. The default is :word.
:suppressions is a boolean which, if true, will suppress breaks for common abbreviations defined for the locale. The default is true.
Examples
iex> Unicode.String.next "This is a sentence. And another.", break: :word
{"This", " is a sentence. And another."}
iex> Unicode.String.next "This is a sentence. And another.", break: :sentence
{"This is a sentence. ", "And another."}
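Since each call returns one segment plus the rest of the string, the function can be called repeatedly to walk a string segment by segment. The sketch below is one way that loop might look; SegmentWalker is a hypothetical module, not part of Unicode.String. Because the spec above mentions nil while the prose mentions "", the end of the string is handled defensively: an empty input, a nil result, and an empty or nil remainder are all treated as "done". That handling is an assumption rather than documented behavior.

```elixir
defmodule SegmentWalker do
  # Hypothetical helper for illustration; not part of Unicode.String.
  # Collects all segments of `string` by calling next/2 until nothing is left.
  def segments(string, options \\ []), do: walk(string, options, [])

  # Nothing left to segment.
  defp walk("", _options, acc), do: Enum.reverse(acc)

  defp walk(string, options, acc) do
    case Unicode.String.next(string, options) do
      # Must come first: an error is also a two-element tuple.
      {:error, _reason} = error ->
        error

      # Assumed end-of-string signal.
      nil ->
        Enum.reverse(acc)

      # Last segment: the remainder is empty.
      {segment, rest} when rest in [nil, ""] ->
        Enum.reverse([segment | acc])

      {segment, rest} ->
        walk(rest, options, [segment | acc])
    end
  end
end

SegmentWalker.segments("This is a sentence. And another.", break: :sentence)
```

For most uses split/2 or splitter/2 is the simpler choice; an explicit loop like this is mainly useful when the caller wants to stop or switch options partway through.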
@spec split(String.t(), split_options()) :: [String.t(), ...] | error_return()
Splits a string according to the specified break type.
Arguments
string is any String.t.
options is a keyword list of options.
Returns
A list of strings after applying the specified break rules, or
{:error, reason}
Options
:locale is any locale returned by Unicode.String.Segment.known_locales/0. The default is "root", which corresponds to the break rules defined by the Unicode Segmentation rules.
:break is the type of break. It is one of :grapheme, :word, :line or :sentence. The default is :word.
:suppressions is a boolean which, if true, will suppress breaks for common abbreviations defined for the locale. The default is true.
:trim is a boolean indicating if segments that are comprised of only white space are to be excluded from the returned list. The default is false.
Examples
iex> Unicode.String.split "This is a sentence. And another.", break: :word
["This", " ", "is", " ", "a", " ", "sentence", ".", " ", "And", " ", "another", "."]
iex> Unicode.String.split "This is a sentence. And another.", break: :word, trim: true
["This", "is", "a", "sentence", ".", "And", "another", "."]
iex> Unicode.String.split "This is a sentence. And another.", break: :sentence
["This is a sentence. ", "And another."]
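As the word-break output above shows, punctuation comes back as its own segment even with trim: true. A word count therefore needs one more filtering step. The sketch below is a minimal illustration, assuming the unicode_string package is available; WordCount is a hypothetical module, and "contains at least one letter" is just one possible definition of a word.

```elixir
defmodule WordCount do
  # Hypothetical helper for illustration; not part of Unicode.String.
  # Counts the word segments of `text` that contain at least one letter.
  def count(text, options \\ []) do
    # Force word breaking and drop white-space-only segments.
    options = Keyword.merge(options, break: :word, trim: true)

    case Unicode.String.split(text, options) do
      {:error, _reason} = error ->
        error

      segments ->
        # \p{L} matches any Unicode letter; "." and similar segments are skipped.
        Enum.count(segments, &String.match?(&1, ~r/\p{L}/u))
    end
  end
end

WordCount.count("This is a sentence. And another.")
# => 6
```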
@spec splitter(String.t(), split_options()) :: function() | error_return()
Returns an enumerable that splits a string on demand.
Arguments
string is any String.t.
options is a keyword list of options.
Returns
A function that implements the enumerable protocol, or
{:error, reason}
Options
:locale is any locale returned by Unicode.String.Segment.known_locales/0. The default is "root", which corresponds to the break rules defined by the Unicode Segmentation rules.
:break is the type of break. It is one of :grapheme, :word, :line or :sentence. The default is :word.
:suppressions is a boolean which, if true, will suppress breaks for common abbreviations defined for the locale. The default is true.
:trim is a boolean indicating if segments that are comprised of only white space are to be excluded from the returned list. The default is false.
Examples
iex> enum = Unicode.String.splitter "This is a sentence. And another.", break: :word, trim: true
iex> Enum.take enum, 3
["This", "is", "a"]
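Because the result is an enumerable, it can be combined with Stream functions so that only as much of the string is segmented as the consumer asks for. The sketch below assumes the unicode_string package is available; LazyWords is a hypothetical module. It also checks for the {:error, reason} return listed in the spec before piping the result onward.

```elixir
defmodule LazyWords do
  # Hypothetical helper for illustration; not part of Unicode.String.
  # Returns the first `count` word segments of `text`, lowercased.
  def first(text, count) do
    case Unicode.String.splitter(text, break: :word, trim: true) do
      {:error, _reason} = error ->
        error

      segments ->
        segments
        # Lazily transform each segment as it is produced.
        |> Stream.map(&String.downcase/1)
        # Only the first `count` segments are ever computed.
        |> Enum.take(count)
    end
  end
end

LazyWords.first("This is a sentence. And another.", 3)
# => ["this", "is", "a"]
```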