Text.WordCloud (Text v0.6.1)


Builds a weighted list of terms suitable for rendering as a word cloud.

Text.WordCloud.terms/2 returns a list of %{term, weight, count, kind} maps sorted by :weight (descending). The top term always has weight 1.0; every other weight is normalised relative to it. Visual layout (placing the words on a canvas) is handled separately by Text.WordCloud.Layout.
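As a minimal sketch of this normalisation, assume each backend yields a raw score where higher means more important. YAKE's native scores run the other way (lower is better), so they would need inverting first. The hypothetical module below divides every score by the maximum and sorts the result. The module name and the {term, raw_score, count, kind} input shape are assumptions for illustration, not the library's internals.

```elixir
defmodule WordCloudSketch.Normalise do
  # Hypothetical sketch: turn raw scores into term_entry()-shaped maps whose
  # weights are relative to the highest score, so the top entry gets 1.0.
  # Input shape (assumed): a list of {term, raw_score, count, kind} tuples,
  # where a higher raw_score means a more important term.
  def normalise([]), do: []

  def normalise(scored) do
    max_score =
      scored
      |> Enum.map(fn {_term, score, _count, _kind} -> score end)
      |> Enum.max()

    # Assumes max_score > 0; a zero maximum would raise ArithmeticError.
    scored
    |> Enum.map(fn {term, score, count, kind} ->
      %{term: term, weight: score / max_score, count: count, kind: kind}
    end)
    # Sort by weight, highest first, matching the documented output order.
    |> Enum.sort_by(& &1.weight, :desc)
  end
end
```

For example, raw scores of 4.0 and 2.0 become weights 1.0 and 0.5, regardless of the order in which the backend produced them.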

Supports several scoring algorithms via the :scoring option; :yake (the default) requires no reference corpus and is multilingual by construction. See the Text.WordCloud.Backends.* modules for the catalogue.

Multilingual end-to-end:

Summary

Types

  • term_entry(): A scored term, ready for rendering.

Functions

  • terms(text, options \\ []): Returns a weighted list of terms for text suitable for word-cloud rendering.

  • to_d3_cloud(terms, options \\ []): Converts scored terms into the shape consumed by d3-cloud.

Types

term_entry()

@type term_entry() :: %{
  term: String.t(),
  weight: float(),
  count: pos_integer(),
  kind: :word | :phrase
}

A scored term, ready for rendering.

Functions

terms(text, options \\ [])

@spec terms(
  String.t() | [String.t()],
  keyword()
) :: [term_entry()]

Returns a weighted list of terms for text suitable for word-cloud rendering.

Arguments

  • text is a UTF-8 string or a list of strings. A list is treated as a corpus of independent documents.

Options

  • :scoring is :yake (default), :frequency, :tf_idf, :rake, :text_rank, :key_bert, or any module implementing Text.WordCloud.Backend.

  • :max_terms — cap on returned entries. Default 100.

  • :min_count — drop terms occurring fewer times than this. Default 1.

  • :ngram_range is a {min, max} tuple bounding the token length of candidate terms. The default depends on the backend ({1, 3} for YAKE, {1, 1} for Frequency).

  • :language is an atom, a BCP-47 string, or a Localize.LanguageTag. Default nil (no language-specific behaviour). Pass {:auto, model} to auto-detect the language with a pre-loaded Text.Language.Classifier.Fasttext.Model. The orchestrator does not load the fastText model itself, so callers that want detection should load it once at boot and pass it in.

  • :stopwords is :auto (the default; uses the bundled list for the resolved language), :none, a list, a MapSet, or {:extend, [extra]} to add extra words to the bundled list.

  • :case_fold — boolean, default true.

  • :stem is a boolean, default false. When true, candidate terms are bucketed by their Snowball stem so that morphological variants (demolish, demolished, demolishing) collapse into a single entry. The most frequent surface form represents the bucket; counts and raw scores are summed across members. Requires the optional :text_stemmer dependency. The stemmer language defaults to the resolved :language; override it with :stem_language.

  • :stem_language — atom override for the stemmer language. Useful when the corpus language differs from the bucketing language (e.g. mixed-language text where you want only English variants consolidated). Defaults to :language.

  • :include is :all (default), :words (single words only), or :phrases (phrases only).

  • :reference_corpus — used by :tf_idf and :log_likelihood.
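The :stem bucketing described in the options above can be sketched as a group-then-merge pass. This is a hypothetical illustration, not the library's implementation: stem_fun stands in for a Snowball stemmer, and the %{term, score, count} input shape (raw scores, before normalisation) is assumed.

```elixir
defmodule WordCloudSketch.StemBuckets do
  # Hypothetical sketch of :stem bucketing. `stem_fun` is any function from a
  # surface form to its stem; the real option uses a Snowball stemmer from the
  # optional :text_stemmer dependency.
  def bucket(candidates, stem_fun) do
    candidates
    # Group all surface forms that share a stem.
    |> Enum.group_by(fn %{term: term} -> stem_fun.(term) end)
    |> Enum.map(fn {_stem, members} ->
      # The most frequent surface form represents the bucket
      # (Enum.max_by keeps the first member on ties).
      representative = Enum.max_by(members, & &1.count)

      %{
        term: representative.term,
        # Counts and raw scores are summed across the bucket's members.
        count: members |> Enum.map(& &1.count) |> Enum.sum(),
        score: members |> Enum.map(& &1.score) |> Enum.sum()
      }
    end)
  end
end
```

For a quick experiment, a toy prefix truncation such as fn t -> String.slice(t, 0, 8) end is enough to merge demolish, demolished, and demolishing; anything beyond a demo needs a real stemmer.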

Returns

  • A list of %{term, weight, count, kind} maps sorted by :weight descending. The top entry has weight: 1.0.

Examples

iex> text = "the cat sat on the mat. the cat ran. the cat slept."
iex> [first | _] = Text.WordCloud.terms(text, scoring: :frequency, language: :en, max_terms: 3)
iex> first.term
"cat"

to_d3_cloud(terms, options \\ [])

@spec to_d3_cloud(
  [term_entry()],
  keyword()
) :: [
  %{
    text: String.t(),
    size: float(),
    weight: float(),
    count: pos_integer(),
    kind: :word | :phrase
  }
]

Converts scored terms into the shape consumed by d3-cloud.

d3-cloud expects an array of {text, size} records and runs its Wordle-style layout in the browser. This adapter maps each entry's :weight to a pixel font size using the same :font_size_range vocabulary as Text.WordCloud.Layout, so a server-rendered SVG and a client-rendered d3-cloud will scale identically.

The original :weight, :count, and :kind fields are passed through unchanged. d3-cloud ignores them but exposes the full datum to its text, fontSize, fontWeight, and rotate callbacks, so consumers can read e.g. d.count for tooltips with no extra plumbing.

Arguments

  • terms is a list of term_entry() maps, typically the output of terms/2.

Options

  • :font_size_range is a {min, max} pixel tuple. Weight 1.0 maps to max, weight 0.0 maps to min. Default {12, 96}.

  • :scale is :linear (default) or :sqrt. :sqrt produces area-proportional sizing, which is the convention most d3-cloud examples use. :linear matches Text.WordCloud.Layout's behaviour.
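The weight-to-size mapping behind :font_size_range and :scale can be sketched as follows. The :linear branch reproduces the example below (weight 0.5 with {10, 100} gives 55.0). The :sqrt branch takes the square root of the weight so that word area grows roughly in proportion to weight; its exact formula here is an assumption. WordCloudSketch.FontSize and its functions are hypothetical.

```elixir
defmodule WordCloudSketch.FontSize do
  # Hypothetical sketch: map a normalised weight in 0.0..1.0 to a pixel size.
  # Weight 1.0 maps to max_px and weight 0.0 maps to min_px.
  def size(weight, {min_px, max_px}, scale \\ :linear) do
    factor =
      case scale do
        :linear -> weight
        # Assumed :sqrt formula: since area scales with size squared,
        # this makes area grow roughly in proportion to weight.
        :sqrt -> :math.sqrt(weight)
      end

    min_px + factor * (max_px - min_px)
  end

  # Rename :term to :text, add :size, pass :weight, :count, and :kind through
  # unchanged, and sort by :size descending. Default range is {12, 96}.
  def to_d3(terms, range \\ {12, 96}, scale \\ :linear) do
    terms
    |> Enum.map(fn entry ->
      entry
      |> Map.delete(:term)
      |> Map.merge(%{text: entry.term, size: size(entry.weight, range, scale)})
    end)
    |> Enum.sort_by(& &1.size, :desc)
  end
end
```

Because both branches interpolate between the same endpoints, switching :scale changes only the sizes of mid-weight terms; the top and bottom of the range stay fixed.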

Returns

  • A list of %{text, size, weight, count, kind} maps sorted by :size descending. The :text and :size keys are what d3-cloud consumes; the rest are passed through for callbacks.

Examples

iex> terms = [
...>   %{term: "elixir", weight: 1.0, count: 5, kind: :word},
...>   %{term: "phoenix", weight: 0.5, count: 2, kind: :word}
...> ]
iex> Text.WordCloud.to_d3_cloud(terms, font_size_range: {10, 100})
[
  %{text: "elixir", size: 100.0, weight: 1.0, count: 5, kind: :word},
  %{text: "phoenix", size: 55.0, weight: 0.5, count: 2, kind: :word}
]