YAKE! (Yet Another Keyword Extractor) backend for Text.WordCloud.
Implements the unsupervised, statistical keyword-extraction algorithm described in Campos et al., Information Sciences 509, 2020. YAKE! computes five per-word features (casing, position, frequency, relatedness to context, sentence dispersion) entirely from the input document, combines them into a single score per word, and then composes those word scores into n-gram candidate scores. No reference corpus or trained model is required, which is what makes it the right default for a multilingual word-cloud library.
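As we read the paper, the per-word score is S(w) = (T_Rel * T_Position) / (T_Case + T_FNorm / T_Rel + T_Sentence / T_Rel), and a candidate kw is scored as S(kw) = prod(S(w)) / (TF(kw) * (1 + sum(S(w)))), both lower-is-better. The sketch below shows only that composition step, assuming the five feature values are already computed. The module name YakeScoreSketch and the feature-map keys are hypothetical, not part of this backend's API, and the plain S(kw) form here does not model any special treatment of interior stopwords.

```elixir
defmodule YakeScoreSketch do
  # Hypothetical sketch of YAKE!'s score composition (lower = more important).
  # Feature values are assumed to be precomputed elsewhere.

  # Per-word score S(w) combining relatedness, position, casing,
  # normalised frequency and sentence dispersion.
  def word_score(%{rel: rel, position: pos, casing: cas, freq_norm: fnorm, sentence: sent}) do
    rel * pos / (cas + fnorm / rel + sent / rel)
  end

  # Candidate score S(kw): product of the word scores divided by the
  # candidate's own term frequency times (1 + sum of the word scores).
  def candidate_score(word_scores, candidate_tf) when candidate_tf > 0 do
    product = Enum.reduce(word_scores, 1.0, &(&1 * &2))
    sum = Enum.sum(word_scores)
    product / (candidate_tf * (1 + sum))
  end
end
```

A candidate that occurs more often (larger TF(kw)) gets a lower, that is better, score, which is how frequency enters at the n-gram level on top of the per-word features.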
The algorithm's only language-specific dependency is the stopword
list, supplied via Text.Stopwords.for/1 (or the caller's :stopwords
override). YAKE!'s own design treats stopwords as phrase-boundary
markers and as low-content interior fillers, so a good list directly
improves output quality.
Score direction
YAKE!'s published score is "lower = more important". This module inverts it internally before returning, so the value passed to the orchestrator follows the standard "higher = more important" convention that every other backend uses.
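The exact transform this module uses is not specified here. A minimal sketch, assuming an order-preserving map such as 1 / (1 + s); the module name ScoreInversionSketch is hypothetical. Because YAKE! scores are non-negative and this map is strictly decreasing, the best YAKE! candidate receives the largest weight, and the result stays in (0, 1] instead of growing without bound as s approaches 0, as a plain 1 / s would.

```elixir
defmodule ScoreInversionSketch do
  # Hypothetical: one order-preserving inversion, not necessarily the one
  # this backend uses. Strictly decreasing for s >= 0, output in (0, 1].
  def invert(yake_score) when yake_score >= 0, do: 1.0 / (1.0 + yake_score)

  # Invert every {term, yake_score} pair and sort by the new weight,
  # highest (most important) first.
  def invert_all(scored) do
    scored
    |> Enum.map(fn {term, s} -> {term, invert(s)} end)
    |> Enum.sort_by(fn {_term, weight} -> weight end, :desc)
  end
end
```

For example, invert_all([{"a", 0.5}, {"b", 0.05}]) puts "b" first, since its lower YAKE! score marks it as the more important term.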
Options
* :ngram_range - {min, max} candidate length. Defaults to {1, 3} (the YAKE paper's default).
* :window_size - neighbour-context window for the relatedness feature. Defaults to 1 (immediate neighbours), matching the reference implementation.
Standard Text.WordCloud orchestrator options (:language,
:stopwords, :case_fold, :locale) are honoured.
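To illustrate what :window_size controls, here is a sketch of the relatedness feature using the paper's form T_Rel = 1 + (DL + DR) * TF(w) / MaxTF, where DL (DR) is the number of distinct words seen within the window to the left (right) of w divided by the total number of such co-occurrences. RelatednessSketch.relatedness/3 is hypothetical; it counts every token as a neighbour and ignores stopword and punctuation handling entirely.

```elixir
defmodule RelatednessSketch do
  # Hypothetical helper, not this module's API.
  # A high T_Rel means w co-occurs with many different words, which
  # YAKE! reads as stopword-like (and therefore less important).
  def relatedness(sentences, word, window \\ 1) do
    tf = sentences |> List.flatten() |> Enum.frequencies()
    max_tf = tf |> Map.values() |> Enum.max()

    # Collect every neighbour within `window` positions of each occurrence
    # of `word`, staying inside sentence boundaries.
    {lefts, rights} =
      Enum.reduce(sentences, {[], []}, fn tokens, acc ->
        tokens
        |> Enum.with_index()
        |> Enum.reduce(acc, fn
          {^word, i}, {l, r} ->
            left = Enum.slice(tokens, max(i - window, 0), min(window, i))
            right = Enum.slice(tokens, i + 1, window)
            {left ++ l, right ++ r}

          _other, side_acc ->
            side_acc
        end)
      end)

    1 + (ratio(lefts) + ratio(rights)) * Map.get(tf, word, 0) / max_tf
  end

  # Distinct neighbours over total co-occurrences; 0.0 if there are none.
  defp ratio([]), do: 0.0
  defp ratio(neighbours), do: length(Enum.uniq(neighbours)) / length(neighbours)
end
```

Raising window from 1 to 2 lets each occurrence see two tokens on each side, so the distinct-neighbour ratios, and with them T_Rel, are computed over a wider context.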
Caveats
This is a faithful but simplified port of the algorithm: the five
features are computed exactly as in the paper, but the
candidate-generation rules use the stricter "phrases must start
and end with a non-stopword" form rather than the paper's full
composition rules. In practice this produces output well correlated
with the reference Python implementation (LIAAD/yake); a
differential-fixture test against that implementation remains a
follow-up.
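The stricter candidate rule described above can be sketched as follows: enumerate contiguous n-grams inside each sentence for every length in the :ngram_range bounds, and keep only those whose first and last tokens are not stopwords, so interior stopwords are still allowed. CandidateSketch.candidates/3 is hypothetical and assumes tokens are already case-folded and split into sentences.

```elixir
defmodule CandidateSketch do
  # Hypothetical: contiguous n-grams per sentence, lengths min_n..max_n,
  # first and last token must not be stopwords; duplicates are dropped
  # while keeping first-occurrence order.
  def candidates(sentences, stopwords, range \\ {1, 3}) do
    {min_n, max_n} = range
    stop = MapSet.new(stopwords)

    for tokens <- sentences,
        n <- min_n..max_n,
        # Sliding windows of exactly n tokens (incomplete tails discarded).
        gram <- Enum.chunk_every(tokens, n, 1, :discard),
        not MapSet.member?(stop, hd(gram)),
        not MapSet.member?(stop, List.last(gram)),
        uniq: true,
        do: Enum.join(gram, " ")
  end
end
```

With stopwords ["of", "is"], the sentence ["bank", "of", "america", "is", "big"] yields "bank", "america", "big", "bank of america" and "america is big"; "of america" is rejected because it starts with a stopword.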