textmetrics/search
Convenience helpers for spell-correction-style search built on
distance and similarity.
All three functions are deterministic: ties are broken by the candidates’ original input order.
Types
A scored candidate produced by rank_jaro_winkler.
Both fields are labelled so callers can read them directly as
r.label and r.score without destructuring a tuple.
pub type Ranked {
Ranked(label: String, score: Float)
}
Constructors
- Ranked(label: String, score: Float)
Values
pub fn closest(
query: String,
candidates: List(String),
max_distance: Int,
) -> option.Option(String)
Returns the single candidate closest to query among those within
max_distance Levenshtein edits of it, counted in graphemes.
Returns None when no candidate is close enough or when candidates
is empty.
This is the convenience form of did_you_mean for
the dominant CLI use case (“Unknown command. Did you mean X?”).
Ties on distance are broken by the candidate’s position in
candidates — the first qualifying candidate wins.
Returns Option(String) rather than Result(String, Nil) because
“no candidate within the threshold” is an expected, semantically
empty result rather than a failure. The matching companion
did_you_mean already returns a (possibly empty)
list for the same reason.
Inherits the whole-string measurement from
did_you_mean: the max_distance budget is
compared against the entire candidate, not against substrings or
tokens. Tokenise the candidate set first when working with
prose-style candidates — see the did_you_mean doc-comment for
the recipe.
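A minimal sketch of the CLI use case described above, assuming the package is importable as textmetrics/search. The suggest helper and the command list are hypothetical and exist only for illustration.

```gleam
import gleam/io
import gleam/option.{None, Some}
import textmetrics/search

// Hypothetical helper: turn an unrecognised command into a hint.
pub fn suggest(input: String) -> String {
  let commands = ["commit", "clone", "checkout"]
  case search.closest(input, commands, 2) {
    // At least one command is within 2 edits; the nearest one is returned.
    Some(name) -> "Unknown command. Did you mean " <> name <> "?"
    // Nothing is close enough, so no suggestion is offered.
    None -> "Unknown command."
  }
}

pub fn main() {
  // "comit" is one insertion away from "commit"; the other two
  // commands are more than 2 edits away.
  io.println(suggest("comit"))
}
```

Matching on Some and None keeps the "no suggestion" branch an ordinary code path, which is the reason given above for returning Option rather than Result.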
pub fn did_you_mean(
query: String,
candidates: List(String),
max_distance: Int,
) -> List(String)
Return candidates within max_distance Levenshtein graphemes of
query, sorted ascending by distance. Empty list when nothing
matches or when candidates is empty.
Ties on distance are broken by the candidate’s position in
candidates.
max_distance is measured against the whole candidate string,
not against substrings or tokens. The function is the right tool
for short single-token candidate sets (command names, enum
values, short labels) where the typo budget is similar in
magnitude to the candidate’s length. For prose-style candidates
(multi-word titles, sentences), the length difference between a
short query and a long candidate dominates the Levenshtein
distance — did_you_mean("vulcano", ["Volcano in Iceland"], 4)
returns [], not ["Volcano in Iceland"], because the distance
is ~12. Tokenise the candidate set first when the use case is
prose-style:
list.flat_map(titles, string.split(_, on: " "))
|> search.did_you_mean(query, _, 2)
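The ordering rules above can be illustrated with a short sketch, assuming the package is importable as textmetrics/search. The suggestions function and the candidate list are made up for the example.

```gleam
import gleam/io
import gleam/string
import textmetrics/search

pub fn suggestions() -> List(String) {
  // Distances from "stat": "start" = 1, "status" = 2, "stats" = 1.
  // "branch" is at least 5 edits away and falls outside the budget of 3.
  search.did_you_mean("stat", ["start", "status", "stats", "branch"], 3)
}

pub fn main() {
  // Ascending distance first, then input order: "start" and "stats"
  // (both distance 1) keep their relative order and precede "status".
  io.println(string.join(suggestions(), ", "))
}
```

Because ties fall back to input order, placing the most common commands first in candidates makes them the preferred suggestion when two candidates are equally close.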
pub fn rank_jaro_winkler(
query: String,
candidates: List(String),
top_n: Int,
) -> List(Ranked)
Rank candidates by Jaro-Winkler similarity (Winkler-1990
defaults) descending, returning up to top_n Ranked
records.
Ties on similarity are broken by the candidate’s position in
candidates. When top_n <= 0, returns an empty list.
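A brief sketch of reading Ranked records through their labelled fields, assuming the package is importable as textmetrics/search. The top_labels helper and the sample names are hypothetical.

```gleam
import gleam/float
import gleam/io
import gleam/list
import textmetrics/search

// Hypothetical helper: keep only the labels of the two best matches.
pub fn top_labels(query: String, candidates: List(String)) -> List(String) {
  search.rank_jaro_winkler(query, candidates, 2)
  |> list.map(fn(r: search.Ranked) { r.label })
}

pub fn main() {
  // An exact match has similarity 1.0 and therefore ranks first;
  // "jones" shares no characters with "martha" and is cut by top_n = 2.
  search.rank_jaro_winkler("martha", ["marhta", "jones", "martha"], 2)
  |> list.each(fn(r: search.Ranked) {
    io.println(r.label <> ": " <> float.to_string(r.score))
  })
}
```

Unlike did_you_mean, there is no distance threshold here: every candidate receives a score, and top_n alone limits the length of the result.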