rebar3_erli18n_jaro (rebar3_erli18n v0.1.0)


Jaro-similarity fuzzy matcher for msgmerge-style merge.

When merge finds a msgid in the old catalog that no longer appears in the freshly extracted .pot, it tries to pair it with a new msgid (one present in the new .pot but absent from the old catalog), so the translator's work can carry over as a #, fuzzy entry instead of being lost as a #~ obsolete entry. The pairing uses string:jaro_similarity/2 (stdlib, OTP 27+), comparing each removed msgid against each added msgid.

A pair is accepted only when its similarity is at or above the threshold (default 0.8, in the spirit of GNU msgmerge's fuzzy heuristic). Among candidates at or above the threshold, the highest score wins; ties break deterministically on the candidate's position in the supplied list (earlier wins), so the result never depends on map iteration order.

The comparison is bounded: O(|removed| × |added|) similarity calls, each comparing one removed msgid with one added msgid. No cross-product over the rest of the catalog is computed.
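The pairing step can be sketched as follows. This is a hypothetical illustration, not the plugin's merge code: the module and function names (jaro_merge_sketch, classify/2) are invented, msgids are modelled as plain binaries, and each removed msgid is matched independently, so two removed msgids could claim the same added msgid (this page does not say how the real merge resolves that). The nested loop makes the |removed| × |added| bound visible.

```erlang
-module(jaro_merge_sketch).
-export([classify/2]).

%% Hypothetical sketch; needs OTP 27+ for string:jaro_similarity/2.
-define(THRESHOLD, 0.8).

%% OldIds: msgids in the old catalog; NewIds: msgids in the fresh .pot.
%% Returns {Fuzzy, Obsolete}: Fuzzy holds {RemovedId, AddedId, Score}
%% pairs that could become "#, fuzzy" entries; Obsolete holds removed
%% msgids with no partner, which would become "#~" entries.
-spec classify([binary()], [binary()]) ->
          {[{binary(), binary(), float()}], [binary()]}.
classify(OldIds, NewIds) ->
    OldSet = sets:from_list(OldIds),
    NewSet = sets:from_list(NewIds),
    %% Removed: in the old catalog but gone from the new .pot.
    Removed = [Id || Id <- OldIds, not sets:is_element(Id, NewSet)],
    %% Added: present in the new .pot but absent from the old catalog.
    Added = [Id || Id <- NewIds, not sets:is_element(Id, OldSet)],
    %% foldr keeps both result lists in the order of Removed.
    lists:foldr(
      fun(R, {Fuzzy, Obsolete}) ->
              case best(R, Added, ?THRESHOLD) of
                  {ok, A, Score} -> {[{R, A, Score} | Fuzzy], Obsolete};
                  nomatch -> {Fuzzy, [R | Obsolete]}
              end
      end,
      {[], []},
      Removed).

%% One similarity call per added msgid, so |Removed| x |Added| in total.
best(Needle, Candidates, Threshold) ->
    Scored = [{string:jaro_similarity(Needle, C), C} || C <- Candidates],
    pick(Scored, Threshold, nomatch).

%% Highest score at or above the threshold wins; on equal scores the
%% earlier candidate is kept because only a strictly greater score
%% replaces it.
pick([], _T, Best) -> Best;
pick([{S, _} | Rest], T, Best) when S < T -> pick(Rest, T, Best);
pick([{S, C} | Rest], T, nomatch) -> pick(Rest, T, {ok, C, S});
pick([{S, C} | Rest], T, {ok, _, B}) when S > B -> pick(Rest, T, {ok, C, S});
pick([_ | Rest], T, Best) -> pick(Rest, T, Best).
```

With old msgids "Save file", "Quit", "Open" and new msgids "Save files", "Open", "Help", the sketch pairs "Save file" with "Save files" and leaves "Quit" obsolete.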

Summary

Functions

best_match(Needle, Candidates): Find the best fuzzy match for Needle among Candidates, default threshold.

best_match(Needle, Candidates, Threshold): Find the best fuzzy match for Needle among Candidates at Threshold.

similarity(A, B): Jaro similarity of two binaries, in 0.0..1.0.

Functions

best_match(Needle, Candidates)

-spec best_match(binary(), [binary()]) -> {ok, binary(), float()} | nomatch.

Find the best fuzzy match for Needle among Candidates, default threshold.

Equivalent to best_match(Needle, Candidates, 0.8).

best_match(Needle, Candidates, Threshold)

-spec best_match(binary(), [binary()], float()) -> {ok, binary(), float()} | nomatch.

Find the best fuzzy match for Needle among Candidates at Threshold.

Returns {ok, Match, Score} for the highest-scoring candidate whose similarity is >= Threshold, breaking ties toward the earlier candidate in the list. Returns nomatch when no candidate reaches the threshold (or the list is empty).
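The contract above (inclusive threshold, highest score wins, earlier candidate wins ties, nomatch otherwise) can be expressed as a single left fold. This is a minimal sketch written against the documented behaviour, not the module's actual source; the module name jaro_best_match_sketch is hypothetical, and string:jaro_similarity/2 requires OTP 27 or later.

```erlang
-module(jaro_best_match_sketch).
-export([best_match/2, best_match/3]).

%% Hypothetical re-creation of the documented contract.
-spec best_match(binary(), [binary()]) -> {ok, binary(), float()} | nomatch.
best_match(Needle, Candidates) ->
    %% Documented default threshold.
    best_match(Needle, Candidates, 0.8).

-spec best_match(binary(), [binary()], float()) ->
          {ok, binary(), float()} | nomatch.
best_match(Needle, Candidates, Threshold) ->
    %% Left-to-right fold, so "earlier" means earlier in Candidates.
    lists:foldl(
      fun(Cand, Best) ->
              Score = string:jaro_similarity(Needle, Cand),
              keep_better(Cand, Score, Threshold, Best)
      end,
      nomatch,
      Candidates).

%% Below the threshold: never a match (the threshold itself is inclusive).
keep_better(_Cand, Score, Threshold, Best) when Score < Threshold ->
    Best;
%% First candidate to reach the threshold.
keep_better(Cand, Score, _Threshold, nomatch) ->
    {ok, Cand, Score};
%% Only a strictly greater score replaces the current best...
keep_better(Cand, Score, _Threshold, {ok, _, BestScore}) when Score > BestScore ->
    {ok, Cand, Score};
%% ...so an equal score keeps the earlier candidate.
keep_better(_Cand, _Score, _Threshold, Best) ->
    Best.
```

Because "abcx" and "abcy" score identically against "abcd", whichever appears first in the list is returned, independent of any map ordering upstream.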

similarity(A, B)

-spec similarity(binary(), binary()) -> float().

Jaro similarity of two binaries, in 0.0..1.0.

A thin wrapper over string:jaro_similarity/2 that accepts binaries directly. Two empty strings are defined as fully similar (1.0), matching the stdlib.
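A wrapper of this shape might look like the sketch below. It is illustrative only (the module name jaro_similarity_sketch is hypothetical); it pins the documented empty/empty case with an explicit clause rather than relying on the stdlib, and otherwise passes the binaries straight to string:jaro_similarity/2, which accepts chardata.

```erlang
-module(jaro_similarity_sketch).
-export([similarity/2]).

%% Hypothetical sketch; needs OTP 27+ for string:jaro_similarity/2.
-spec similarity(binary(), binary()) -> float().
%% Documented behaviour: two empty strings are fully similar.
similarity(<<>>, <<>>) ->
    1.0;
similarity(A, B) when is_binary(A), is_binary(B) ->
    %% Binaries are valid chardata, so no conversion is needed.
    string:jaro_similarity(A, B).
```

For example, identical strings score 1.0, and "michelle" against "michael" scores roughly 0.869, the value given in the OTP string documentation.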