Intl.Segmenter (Intl v0.2.0)


Text segmentation, modelled on JavaScript's Intl.Segmenter.

Splits text into segments by grapheme cluster, word, or sentence boundaries.

  • :grapheme segmentation uses Elixir's built-in String.graphemes/1 and is always available.

  • :word and :sentence segmentation require the optional unicode_string dependency. When that library is not installed, these granularities return an {:error, reason} tuple.

Note: the JS Intl.Segmenter returns an iterable of rich segment objects with segment, index, input, and isWordLike properties. This module returns a flat list of segment strings for simplicity.
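Because the result is a flat list, positional information such as the JS index property has to be reconstructed by the caller. The following is a minimal sketch of one way to do that, assuming the segments concatenate back to the original input (so :trim is false). SegmentIndexSketch and with_byte_offsets/1 are hypothetical names, not part of this module, and the offsets are UTF-8 byte offsets rather than the UTF-16 code unit indices that JavaScript reports.

```elixir
defmodule SegmentIndexSketch do
  # Hypothetical helper, not part of Intl.Segmenter.
  # Pairs each segment with the byte offset at which it starts,
  # assuming the segments concatenate back to the original input.
  def with_byte_offsets(segments) do
    {pairs, _total_bytes} =
      Enum.map_reduce(segments, 0, fn segment, offset ->
        {{segment, offset}, offset + byte_size(segment)}
      end)

    pairs
  end
end

# String.graphemes/1 stands in for the :grapheme granularity here.
# "\u00E9" is the precomposed "é", which occupies two bytes in UTF-8.
segments = String.graphemes("h\u00E9llo")
SegmentIndexSketch.with_byte_offsets(segments)
# => [{"h", 0}, {"é", 1}, {"l", 3}, {"l", 4}, {"o", 5}]
```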

Summary

Functions

segment(string, options \\ [])

Segments a string into a list of substrings.

segment!(string, options \\ [])

Segments a string, raising on error.

Functions

segment(string, options \\ [])

@spec segment(String.t(), Keyword.t()) :: {:ok, [String.t()]} | {:error, term()}

Segments a string into a list of substrings.

Arguments

  • string is the text to segment.

  • options is a keyword list of options.

Options

  • :granularity is :grapheme, :word, or :sentence. The default is :grapheme.

  • :locale is a locale identifier string. Only used for :word and :sentence granularity. The default is "root".

  • :trim is a boolean. When true, whitespace-only segments are removed. Only applies to :word and :sentence granularity. The default is false.

Returns

  • {:ok, segments} where segments is a list of strings.

  • {:error, reason} if the granularity is not supported or the unicode_string dependency is missing.

Examples

iex> Intl.Segmenter.segment("héllo", granularity: :grapheme)
{:ok, ["h", "é", "l", "l", "o"]}
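Since :word and :sentence can return {:error, reason} when unicode_string is not installed, callers may want a fallback. The sketch below is one possible pattern, not something this library provides: SegmenterFallbackSketch and words_or_graphemes/2 are hypothetical names, and the segmenting function is passed in as an argument (defaulting to Intl.Segmenter.segment/2) so the helper can be exercised without the library present.

```elixir
defmodule SegmenterFallbackSketch do
  # Hypothetical helper, not part of Intl.Segmenter.
  # Tries word segmentation first and falls back to grapheme clusters
  # when the segmenter returns {:error, reason}.
  def words_or_graphemes(text, segment_fun \\ &Intl.Segmenter.segment/2) do
    case segment_fun.(text, granularity: :word, trim: true) do
      {:ok, words} -> {:word, words}
      {:error, _reason} -> {:grapheme, String.graphemes(text)}
    end
  end
end
```

The tagged result tells the caller which granularity was actually applied, which matters if downstream code treats words and graphemes differently.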

segment!(string, options \\ [])

@spec segment!(String.t(), Keyword.t()) :: [String.t()] | no_return()

Segments a string, raising on error.

Same as segment/2, but returns the list of segments directly on success and raises on error.

Arguments

  • string is the text to segment.

  • options is a keyword list of options. See segment/2 for the supported options.

Returns

  • A list of segment strings.

Examples

iex> Intl.Segmenter.segment!("héllo", granularity: :grapheme)
["h", "é", "l", "l", "o"]
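The relationship between the two functions follows the usual Elixir bang convention, which can be sketched as below. This is an illustration of the convention only, not this library's implementation: BangSketch and unwrap!/1 are hypothetical names, and the exception type is an assumption, since the documentation does not state what segment!/2 raises.

```elixir
defmodule BangSketch do
  # Hypothetical wrapper illustrating the bang convention.
  # ArgumentError is an assumption; the exception actually raised by
  # Intl.Segmenter.segment!/2 is not documented here.
  def unwrap!({:ok, segments}), do: segments

  def unwrap!({:error, reason}) do
    raise ArgumentError, "segmentation failed: #{inspect(reason)}"
  end
end

BangSketch.unwrap!({:ok, ["h", "i"]})
# => ["h", "i"]
```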