TF-IDF backend for Text.WordCloud.
Scores each candidate term as tf(t) * idf(t), where tf is the
raw count of t in the foreground text and idf is the inverse document
frequency over a user-supplied reference corpus. This is the
classical "what is distinctive about this document?" scorer: it
surfaces terms that are common in the foreground but rare across
the background.
Foreground vs background
The first argument to Text.WordCloud.terms/2 is the foreground: a single string or a list of strings (treated as one document).
The reference corpus is supplied via the :reference_corpus option, either as a list of background documents (TF-IDF computes IDF over them) or as a precomputed %{term => idf} map.
Without a reference corpus the backend falls back to IDF = 1.0
for every term, reducing to a frequency cloud — which is rarely
what you want. The orchestrator emits an IO.warn/2 in that case.
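The difference between the two cases can be sketched in a few lines of plain Elixir. This is an illustration only, not the backend's implementation: the module TfIdfSketch, its score/3 function, the whitespace tokeniser, and the unseen_idf argument (the value used for terms missing from a precomputed map) are all hypothetical, and the IDF values in the example are made up.

```elixir
defmodule TfIdfSketch do
  # Hypothetical helper, not part of Text.WordCloud.
  # `idf_map` stands in for a precomputed %{term => idf} map;
  # `nil` models the case where no reference corpus was supplied.
  # `unseen_idf` is an assumed value for terms missing from the map.
  def score(foreground, idf_map \\ nil, unseen_idf \\ 1.0) do
    foreground
    # A list of strings is treated as one document.
    |> List.wrap()
    |> Enum.join(" ")
    |> String.downcase()
    # Naive whitespace tokenisation, for illustration only.
    |> String.split(~r/\s+/, trim: true)
    # tf = raw count of each term in the foreground.
    |> Enum.frequencies()
    |> Map.new(fn {term, tf} ->
      idf = if idf_map, do: Map.get(idf_map, term, unseen_idf), else: 1.0
      {term, tf * idf}
    end)
  end
end

foreground = ["elixir macros expand", "macros are hygienic"]

# Made-up IDF values standing in for a real reference corpus.
idf = %{
  "elixir" => 2.0,
  "macros" => 1.5,
  "are" => 0.1,
  "expand" => 1.0,
  "hygienic" => 2.5
}

with_reference = TfIdfSketch.score(foreground, idf)
without_reference = TfIdfSketch.score(foreground)
```

With the reference map, a function word such as "are" is pushed down by its low IDF; without it, every score is just the raw count multiplied by 1.0, which is the frequency-cloud behaviour described above.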
Smoothing
Uses log-smoothed IDF with the standard log(N / (1 + df_t)) form:
N = number of reference documents.
df_t = number of reference documents containing term t.
Terms unseen in the reference get IDF = log(N / 1) = log(N),
giving them a sensible high score rather than zero.
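As a rough sketch of the formula above, the following computes document frequencies over a small reference corpus and applies log(N / (1 + df_t)). The module IdfSketch and its functions are hypothetical names, the tokeniser is a naive whitespace split, and the natural logarithm is an assumption, since the log base is not stated here.

```elixir
defmodule IdfSketch do
  # Hypothetical helper, not part of Text.WordCloud.
  # Naive lowercase whitespace tokenisation, for illustration only.
  def tokenize(text) do
    text |> String.downcase() |> String.split(~r/\s+/, trim: true)
  end

  # df_t: number of reference documents containing each term.
  # Each document contributes a term at most once (Enum.uniq/1).
  def document_frequencies(reference_docs) do
    reference_docs
    |> Enum.flat_map(fn doc -> doc |> tokenize() |> Enum.uniq() end)
    |> Enum.frequencies()
  end

  # idf(t) = log(N / (1 + df_t)); natural log is an assumption.
  # Terms absent from the reference have df_t = 0.
  def idf(term, df, n) do
    :math.log(n / (1 + Map.get(df, term, 0)))
  end
end

reference = ["the cat sat", "the dog ran", "the bird flew", "a fish swam"]
n = length(reference)
df = IdfSketch.document_frequencies(reference)

unseen = IdfSketch.idf("zebra", df, n) # df = 0, so log(4 / 1) = log(4)
rare = IdfSketch.idf("cat", df, n)     # df = 1, so log(4 / 2) = log(2)
common = IdfSketch.idf("the", df, n)   # df = 3, so log(4 / 4) = 0.0
```

One property of this form worth keeping in mind: a term that appears in every reference document has df_t = N, so its value is log(N / (N + 1)), which is slightly negative. Whether the backend clamps such values is not stated here.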
Defaults
:ngram_range defaults to {1, 1} for this backend: IDF over multi-token phrases is rarely meaningful unless the reference corpus is large enough that phrases recur. Override it explicitly if you have such a corpus.
Standard Text.WordCloud orchestrator options (:language,
:stopwords, :case_fold, :locale) are honoured.