fuzzy_compare v1.0.0 FuzzyCompare.SubstringComparison

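This module compares strings of different lengths. It scores how well the shorter string matches a part of the longer one, which a plain Jaro comparison of the two full strings underrates: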

iex> FuzzyCompare.SubstringComparison.similarity("DEUTSCHLAND", "BUNDESREPUBLIK DEUTSCHLAND")
0.9090909090909092

iex> String.jaro_distance("DEUTSCHLAND", "BUNDESREPUBLIK DEUTSCHLAND")
0.5399600399600399

Summary

Functions

The similarity function takes two strings as arguments and returns their substring similarity as a float between 0 and 1.

Functions

similarity(left, right)
similarity(binary(), binary()) :: float()

The similarity function takes two strings as arguments and returns their substring similarity as a float between 0 and 1.

Substring matching works in three steps. First, the Myers difference algorithm generates the list of substrings that the two inputs have in common. Second, each of these substrings is compared against the shorter of the two input strings with the Jaro similarity function (String.jaro_distance/2, as in the examples below). Finally, the maximum comparison value found is returned.

Take the input strings "DEUTSCHLAND" and "BUNDESREPUBLIK DEUTSCHLAND". This yields the matching substrings ["DE", "U", "TSCHLAND"].

We compare each one of them to the shorter one of the input strings:

iex> String.jaro_distance("DE", "DEUTSCHLAND")
0.7272727272727272

iex> String.jaro_distance("U", "DEUTSCHLAND")
0.6969696969696969

iex> String.jaro_distance("TSCHLAND", "DEUTSCHLAND")
0.9090909090909092
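The last value can be checked by hand, assuming String.jaro_distance/2 computes the standard Jaro similarity. All 8 characters of "TSCHLAND" match in order inside "DEUTSCHLAND", so the number of matches is m = 8 and the number of transpositions is t = 0. The string lengths are |s_1| = 8 and |s_2| = 11:

```latex
\[
\mathrm{sim}_J
  = \frac{1}{3}\left(\frac{m}{|s_1|} + \frac{m}{|s_2|} + \frac{m - t}{m}\right)
  = \frac{1}{3}\left(\frac{8}{8} + \frac{8}{11} + \frac{8 - 0}{8}\right)
  = \frac{10}{11} \approx 0.9091
\]
```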

The highest of these values is returned, here 0.9090909090909092.
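The procedure described above can be sketched in a few lines. This is a minimal illustration, not the library's actual source: the module name SubstringSketch is hypothetical, and the sketch assumes that the common substrings are the :eq segments returned by String.myers_difference/2 and that a pair of strings without any common substring scores 0.0. The library may treat edge cases, such as inputs of equal length, differently.

```elixir
defmodule SubstringSketch do
  # Hypothetical module for illustration only; not part of fuzzy_compare.

  @spec similarity(binary(), binary()) :: float()
  def similarity(left, right) when is_binary(left) and is_binary(right) do
    # Order the inputs so that `shorter` has at most as many graphemes as `longer`.
    [shorter, longer] = Enum.sort_by([left, right], &String.length/1)

    shorter
    # Edit script between the two strings, as a keyword list of :eq, :ins, :del.
    |> String.myers_difference(longer)
    # Keep only the segments that both strings have in common.
    |> Keyword.get_values(:eq)
    # Score every common segment against the shorter input string.
    |> Enum.map(&String.jaro_distance(&1, shorter))
    # Return the best score; 0.0 (an assumption) if nothing is shared.
    |> Enum.reduce(0.0, &max/2)
  end
end
```

Calling SubstringSketch.similarity("DEUTSCHLAND", "BUNDESREPUBLIK DEUTSCHLAND") walks through the same steps as the example above. Sorting by String.length/1 keeps the argument order irrelevant, in line with the description that the comparison always runs against the shorter string.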