Text.WordCloud.Backends.RAKE (Text v0.6.1)


RAKE (Rapid Automatic Keyword Extraction) backend for Text.WordCloud.

RAKE is a classic, language-agnostic keyword extractor (Rose et al., 2010). It splits the text on stopwords and punctuation to produce candidate phrases, then scores each member word by the ratio of its total degree (cumulative phrase length) to its raw frequency. A phrase score is the sum of its member word scores.

Higher scores indicate more important phrases. The result naturally surfaces multi-word phrases: a word's degree grows with the length of the phrases it appears in, so a phrase made up exclusively of distinctive content words scores higher than its words would individually.
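The scoring described above can be sketched in a few lines of Elixir. This is a minimal illustration under stated assumptions, not the backend's actual implementation: the `RakeSketch` module, its function names, and the tiny inline stopword list are all hypothetical, and a real run would use a full stopword list for the resolved language.

```elixir
defmodule RakeSketch do
  # Hypothetical module for illustration only; not part of the Text library.

  # Tiny illustrative stopword list (an assumption, far from complete).
  @stopwords ~w(a an and are by for in is of on or the to with)

  # Candidate phrases: split on punctuation first, then on stopwords.
  def candidates(text) do
    text
    |> String.downcase()
    |> String.split(~r/[^\p{L}\p{N}\s]+/u, trim: true)
    |> Enum.flat_map(&split_on_stopwords/1)
  end

  defp split_on_stopwords(fragment) do
    fragment
    |> String.split()
    # Group consecutive words by whether they are stopwords...
    |> Enum.chunk_by(&(&1 in @stopwords))
    # ...and keep only the runs of content words.
    |> Enum.reject(fn [word | _] -> word in @stopwords end)
  end

  # Word score = total degree / raw frequency, where a word's degree is
  # the summed length of every candidate phrase it occurs in.
  def word_scores(phrases) do
    freq = phrases |> Enum.concat() |> Enum.frequencies()

    degree =
      Enum.reduce(phrases, %{}, fn phrase, acc ->
        len = length(phrase)
        Enum.reduce(phrase, acc, fn word, inner ->
          Map.update(inner, word, len, &(&1 + len))
        end)
      end)

    Map.new(freq, fn {word, f} -> {word, Map.fetch!(degree, word) / f} end)
  end

  # Phrase score = sum of member word scores, highest first.
  def score(text) do
    phrases = candidates(text)
    scores = word_scores(phrases)

    phrases
    |> Enum.uniq()
    |> Enum.map(fn phrase ->
      total = phrase |> Enum.map(&Map.fetch!(scores, &1)) |> Enum.sum()
      {Enum.join(phrase, " "), total}
    end)
    |> Enum.sort_by(&elem(&1, 1), :desc)
  end
end
```

For the input `"Keyword extraction is fast. Rapid keyword extraction of phrases."`, the word `keyword` has frequency 2 and degree 2 + 3 = 5, so its score is 2.5. The top two results from `RakeSketch.score/1` are `{"rapid keyword extraction", 8.0}` and `{"keyword extraction", 5.0}`, while the single-word candidates `fast` and `phrases` each score 1.0.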

Strengths

  • No reference corpus.

  • Phrase-aware by construction.

  • Trivially multilingual: supporting another language only requires swapping the stopword list (uses Text.Stopwords for the resolved language).

Caveats

  • Quality is sensitive to stopword-list completeness — under-filtering produces bloated, low-quality phrases.

  • Tends to over-rank phrases composed of low-frequency words; YAKE! is usually a better default.

Options

Honours the orchestrator's :language, :stopwords, :case_fold, and :locale. RAKE's candidates are determined by stopword boundaries; :ngram_range is honoured as a post-filter on the resulting phrase lengths.
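As a rough illustration of what a post-filter on phrase length means, the sketch below drops scored phrases whose word count falls outside a range. The `NgramFilterSketch` module is hypothetical, and representing :ngram_range as an Elixir range such as `1..2` is an assumption made for the example; the option's actual shape is not specified here.

```elixir
defmodule NgramFilterSketch do
  # Hypothetical helper for illustration only; not part of the Text library.
  # Assumes `range` is an Elixir range of allowed word counts, e.g. 1..2.
  def filter(scored_phrases, range) do
    Enum.filter(scored_phrases, fn {phrase, _score} ->
      word_count = phrase |> String.split() |> length()
      word_count in range
    end)
  end
end

scored = [
  {"rapid keyword extraction", 8.0},
  {"keyword extraction", 5.0},
  {"fast", 1.0}
]

# Keeps only the one- and two-word phrases; scores are left untouched.
NgramFilterSketch.filter(scored, 1..2)
# => [{"keyword extraction", 5.0}, {"fast", 1.0}]
```

Because the filter runs after scoring, a three-word phrase excluded by the range still contributes to the degree of its member words.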