Text.WordCloud.Backends.TextRank (Text v0.6.0)


TextRank backend for Text.WordCloud.

Implements the keyword-extraction variant of TextRank (Mihalcea & Tarau, "TextRank: Bringing Order into Texts", 2004). It builds an undirected, weighted graph whose vertices are non-stopword tokens and whose edges connect tokens that co-occur within a sliding window. Weighted PageRank over that graph yields a relevance score per token; phrase candidates are then composed by joining adjacent high-scoring tokens.
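The graph-building step can be illustrated with a minimal sketch. It is not this module's source: the `CooccurrenceSketch` module and its `edges/2` function are hypothetical, and it assumes that a window of N tokens links each token to the N - 1 tokens that follow it, and that stopwords have already been removed from the input list.

```elixir
defmodule CooccurrenceSketch do
  # Hypothetical helper, not part of Text.WordCloud.
  # Returns a map of {token_a, token_b} => co-occurrence count, where each
  # key is sorted so that an undirected edge is stored only once.
  def edges(tokens, window_size \\ 4) when window_size >= 2 do
    do_edges(tokens, window_size, %{})
  end

  defp do_edges([], _window_size, acc), do: acc

  defp do_edges([head | rest], window_size, acc) do
    acc =
      rest
      # Assumption: the window covers the next (window_size - 1) tokens.
      |> Enum.take(window_size - 1)
      # Skip self-loops when a token repeats inside its own window.
      |> Enum.reject(&(&1 == head))
      |> Enum.reduce(acc, fn other, acc ->
        Map.update(acc, edge_key(head, other), 1, &(&1 + 1))
      end)

    do_edges(rest, window_size, acc)
  end

  defp edge_key(a, b) when a <= b, do: {a, b}
  defp edge_key(a, b), do: {b, a}
end
```

With this sketch, `CooccurrenceSketch.edges(~w(graph rank graph score rank), 2)` returns `%{{"graph", "rank"} => 2, {"graph", "score"} => 1, {"rank", "score"} => 1}`: the repeated graph/rank adjacency raises that edge's weight to 2.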

Strengths

  • No reference corpus required.

  • Truly multilingual — like YAKE!, TextRank's only language-specific dependency is the stopword list.

  • Resilient to long documents: the number of edges grows at most linearly with the token count, not quadratically, because each token links only to tokens inside its window.

Caveats

  • Slower than YAKE! for short inputs (PageRank iterations dominate).

  • Phrase composition is heuristic: adjacent top-scoring tokens are glued, which can produce odd cuts on very dense topical text.
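To make the phrase-composition caveat concrete, here is a rough sketch of the gluing heuristic. The `PhraseSketch` module, its `compose/3` function and the `top_n` cutoff are all assumptions made for illustration; the backend's actual selection rule may differ. The sketch keeps the `top_n` highest-scoring tokens and collapses every maximal run of adjacent kept tokens in the original token sequence into one phrase (it needs Elixir 1.10 or later for `Enum.sort_by/3` with `:desc`).

```elixir
defmodule PhraseSketch do
  # Hypothetical helper, not part of Text.WordCloud.
  # tokens: the document's tokens in original order (stopwords included).
  # scores: map of token => relevance score.
  def compose(tokens, scores, top_n) do
    keep =
      scores
      |> Enum.sort_by(fn {_token, score} -> score end, :desc)
      |> Enum.take(top_n)
      |> MapSet.new(fn {token, _score} -> token end)

    tokens
    # Split the sequence into alternating runs of kept and dropped tokens.
    |> Enum.chunk_by(&MapSet.member?(keep, &1))
    # Retain only the runs made of kept tokens.
    |> Enum.filter(fn [first | _] -> MapSet.member?(keep, first) end)
    |> Enum.map(&Enum.join(&1, " "))
    |> Enum.uniq()
  end
end
```

For the tokens `~w(weighted page rank over the token graph)` with scores `%{"weighted" => 0.9, "page" => 1.4, "rank" => 1.5, "token" => 0.8, "graph" => 1.2}` and `top_n` set to 3, the sketch returns `["page rank", "graph"]`. The phrase "weighted page rank" is cut because "weighted" falls just below the cutoff, which is the kind of odd cut described above.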

Options

  • :window_size — co-occurrence window. Default 4 (Mihalcea & Tarau use 2–10; 4 is a common middle ground).

  • :damping — PageRank damping factor. Default 0.85.

  • :tolerance — convergence threshold (max delta across vertices). Default 1.0e-5.

  • :max_iterations — safety cap. Default 100.
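How the three PageRank options interact can be sketched as follows. This is an illustrative implementation of the weighted TextRank update, not the backend's own code: the `PageRankSketch` module, its `rank/2` function, the edge-map input shape and the initial score of 1.0 are assumptions. Only the option names and their defaults come from the list above.

```elixir
defmodule PageRankSketch do
  # Defaults mirror the documented option defaults.
  @defaults [damping: 0.85, tolerance: 1.0e-5, max_iterations: 100]

  # Hypothetical helper, not part of Text.WordCloud.
  # edges: map of {a, b} => weight describing an undirected graph.
  def rank(edges, opts \\ []) do
    opts = Keyword.merge(@defaults, opts)
    adjacency = adjacency(edges)

    # Total edge weight attached to each vertex.
    strength =
      Map.new(adjacency, fn {v, nbrs} -> {v, nbrs |> Map.values() |> Enum.sum()} end)

    # Assumption: every vertex starts with a score of 1.0.
    scores = Map.new(adjacency, fn {v, _nbrs} -> {v, 1.0} end)
    iterate(scores, adjacency, strength, opts, 0)
  end

  defp adjacency(edges) do
    Enum.reduce(edges, %{}, fn {{a, b}, w}, acc ->
      acc
      |> Map.update(a, %{b => w}, &Map.put(&1, b, w))
      |> Map.update(b, %{a => w}, &Map.put(&1, a, w))
    end)
  end

  defp iterate(scores, adjacency, strength, opts, iteration) do
    if iteration >= opts[:max_iterations] do
      # Safety cap reached: return the latest scores without convergence.
      scores
    else
      d = opts[:damping]

      next =
        Map.new(adjacency, fn {v, nbrs} ->
          # Each neighbour passes on its score in proportion to edge weight.
          incoming =
            nbrs
            |> Enum.map(fn {u, w} -> w / strength[u] * scores[u] end)
            |> Enum.sum()

          {v, 1 - d + d * incoming}
        end)

      # Largest score change of any vertex in this iteration.
      delta =
        Enum.reduce(next, 0.0, fn {v, s}, acc -> max(acc, abs(s - scores[v])) end)

      if delta < opts[:tolerance] do
        next
      else
        iterate(next, adjacency, strength, opts, iteration + 1)
      end
    end
  end
end
```

Calling `PageRankSketch.rank(%{{"a", "b"} => 1, {"b", "c"} => 1})` on this three-vertex path scores the middle vertex "b" above "a" and "c", which receive identical scores. A looser `:tolerance` or a lower `:max_iterations` trades accuracy for fewer iterations.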

Standard Text.WordCloud orchestrator options (:language, :stopwords, :case_fold, :ngram_range, :locale) are honoured.