RAKE (Rapid Automatic Keyword Extraction) backend for Text.WordCloud.
RAKE is a classic, language-agnostic keyword extractor (Rose et al., 2010). It splits the text on stopwords and punctuation to produce candidate phrases, then scores each member word by the ratio of its total degree (cumulative phrase length) to its raw frequency. A phrase score is the sum of its member word scores.
Higher scores indicate more important phrases. The ranking naturally surfaces multi-word phrases: every word in a long phrase made up exclusively of distinctive content words gains degree from its neighbours, so such a phrase outscores the same words appearing separately.
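The scoring described above can be sketched in a few lines. This is a minimal illustration in Python, not the implementation behind Text.WordCloud: the function names, the tiny stopword set, and the punctuation regex are all assumptions made for the example.

```python
import re
from collections import defaultdict

# Tiny illustrative stopword list (an assumption; real lists are far larger).
STOPWORDS = {"a", "an", "and", "are", "for", "in", "is", "of", "on", "the", "to"}


def candidate_phrases(text, stopwords=STOPWORDS):
    """Split text into candidate phrases at punctuation and stopwords."""
    phrases = []
    # Punctuation always ends a phrase, so cut the text into fragments first.
    for fragment in re.split(r"[^\w\s-]+", text.lower()):
        current = []
        for word in fragment.split():
            if word in stopwords:
                # A stopword closes the phrase collected so far.
                if current:
                    phrases.append(tuple(current))
                current = []
            else:
                current.append(word)
        if current:
            phrases.append(tuple(current))
    return phrases


def word_scores(phrases):
    """Score each word as degree / frequency."""
    freq = defaultdict(int)
    degree = defaultdict(int)
    for phrase in phrases:
        for word in phrase:
            freq[word] += 1
            # Degree accumulates the length of every phrase the word occurs in.
            degree[word] += len(phrase)
    return {word: degree[word] / freq[word] for word in freq}


def rake(text, stopwords=STOPWORDS):
    """Return (phrase, score) pairs, highest score first."""
    phrases = candidate_phrases(text, stopwords)
    scores = word_scores(phrases)
    ranked = {p: sum(scores[w] for w in p) for p in set(phrases)}
    # Sort by descending score, then alphabetically for a stable order.
    return sorted(ranked.items(), key=lambda item: (-item[1], item[0]))


text = (
    "Keyword extraction is the task of finding important phrases. "
    "Keyword extraction is language agnostic."
)
for phrase, score in rake(text):
    print(" ".join(phrase), score)
```

In this sample the three-word phrase ranks first because each of its words has degree 3 and frequency 1, while the single-word candidate "task" scores only 1.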
Strengths
No reference corpus.
Phrase-aware by construction.
Trivially multilingual via the stopword swap (uses
Text.Stopwords for the resolved language).
Caveats
Quality is sensitive to stopword-list completeness — under-filtering produces bloated, low-quality phrases.
Tends to over-rank phrases composed of low-frequency words; YAKE! is usually a better default.
Options
Honours the orchestrator's :language, :stopwords, :case_fold,
and :locale. RAKE's candidates are determined by stopword
boundaries; :ngram_range is honoured as a post-filter on the
resulting phrase lengths.
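
As a rough illustration of what a post-filter on phrase length means, the sketch below drops already-scored phrases whose word count falls outside a range. It is written in Python purely for illustration; the function name, the (min, max) tuple shape, and the sample scores are assumptions, not the library's API.

```python
def filter_by_length(ranked, ngram_range):
    """Keep only phrases whose word count lies within ngram_range (inclusive).

    `ranked` is a list of (phrase_words, score) pairs; `ngram_range` is a
    hypothetical (min_words, max_words) tuple.
    """
    lo, hi = ngram_range
    return [(words, score) for words, score in ranked if lo <= len(words) <= hi]


# Hypothetical scored phrases, as a RAKE pass might produce them.
ranked = [
    (("finding", "important", "phrases"), 9.0),
    (("keyword", "extraction"), 4.0),
    (("task",), 1.0),
]

# Keep phrases of exactly two words; scores are left as computed.
print(filter_by_length(ranked, (2, 2)))
```

Because the filter runs after scoring, the surviving phrases keep the scores they earned from the full candidate set; only the final list is narrowed.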