Similarity v0.2.0 Similarity.Simhash
Simhash string similarity algorithm. Description of Simhash
iex> Similarity.simhash("Barna", "Kovacs")
0.59375
iex> Similarity.simhash("Austria", "Australia")
0.65625
Summary
Functions
hamming_distance(left, right, acc \\ 0)
Returns the Hamming distance between the left and right hash, given as lists of bits.
hash(string, n)
Returns the hash for the given string.
similarity(left, right, options \\ [])
Calculates the similarity between the left and right string, using Simhash. Returns a float representing the similarity between the left and right strings.
Functions
hamming_distance(left, right, acc \\ 0)
Returns the Hamming distance between the left and right hash, given as lists of bits.
Examples
iex> Similarity.Simhash.hamming_distance([1, 1, 0, 1, 0], [0, 1, 1, 1, 0])
2
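The `acc \\ 0` default in the signature suggests an accumulator-style recursion. The following is a minimal sketch of that idea, not the library's source: `HammingSketch` is a hypothetical module written for illustration, and it assumes both lists have the same length.

```elixir
defmodule HammingSketch do
  # Hypothetical illustration; not the Similarity library's implementation.

  # Function head declaring the default accumulator value.
  def distance(left, right, acc \\ 0)

  # Heads are equal: the repeated variable `bit` only matches identical values.
  def distance([bit | left], [bit | right], acc), do: distance(left, right, acc)

  # Heads differ: count one mismatch.
  def distance([_ | left], [_ | right], acc), do: distance(left, right, acc + 1)

  # Both lists are exhausted: the accumulator holds the distance.
  def distance([], [], acc), do: acc
end

IO.inspect(HammingSketch.distance([1, 1, 0, 1, 0], [0, 1, 1, 1, 0]))
# prints 2
```

In this sketch, lists of unequal length match no clause and raise a `FunctionClauseError` instead of returning a misleading count.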
hash(string, n)
hash(String.t(), pos_integer()) :: [0 | 1]
Returns the hash for the given string.
Examples
Similarity.Simhash.hash("alma korte", 3)
[1, 1, 0, 1, 0, 0, 0, 0, 0, 1, 0, 1, ...]
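To illustrate how a Simhash-style bit list can be built from n-grams, here is a minimal sketch. It is not the library's implementation: the module name `SimhashSketch`, the choice of MD5 as the per-n-gram hash, and the 64-bit width are all assumptions made for this example, so its output will not match the bits shown above.

```elixir
defmodule SimhashSketch do
  # Hypothetical illustration; the hash function and bit width are assumptions.

  # Split a string into overlapping n-grams of graphemes.
  def ngrams(string, n) do
    string
    |> String.graphemes()
    |> Enum.chunk_every(n, 1, :discard)
    |> Enum.map(&Enum.join/1)
  end

  # Hash one n-gram to 64 bits: the first 8 bytes of its MD5 digest.
  def feature_bits(ngram) do
    <<head::binary-size(8), _rest::binary>> = :erlang.md5(ngram)
    for <<bit::1 <- head>>, do: bit
  end

  # An output bit is 1 when more than half of the n-gram hashes
  # have a 1 in that position; ties resolve to 0.
  def hash(string, n) do
    string
    |> ngrams(n)
    |> Enum.map(&feature_bits/1)
    |> Enum.zip_with(fn column ->
      if 2 * Enum.sum(column) > length(column), do: 1, else: 0
    end)
  end
end

bits = SimhashSketch.hash("alma korte", 3)
IO.inspect(length(bits))
# prints 64
```

Each output bit in this sketch depends only on which n-grams occur and how often, not on their order. With 1-grams, two strings made of the same characters therefore produce the same bit list. A string shorter than `n` yields no n-grams here, so the sketch returns an empty list for it.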
similarity(left, right, options \\ [])
similarity(String.t(), String.t(), pos_integer()) :: float()
Calculates the similarity between the left and right string, using Simhash.
Returns a float representing similarity between left and right strings.
Options
:ngram_size - defaults to 3
Examples
iex> Similarity.simhash("khan academy", "khan academia")
0.890625
iex> Similarity.simhash("khan academy", "academy khan", ngram_size: 1)
1.0
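The example results on this page are all multiples of 1/64 (for instance 0.890625 = 57/64), which is consistent with a score computed as the fraction of matching bits between two 64-bit hashes. The sketch below works under that assumption; `SimilaritySketch.from_hashes/2` is a hypothetical helper, not part of the library's API.

```elixir
defmodule SimilaritySketch do
  # Hypothetical illustration; assumes similarity = 1 - distance / bit_count.
  def from_hashes(left_bits, right_bits)
      when length(left_bits) == length(right_bits) and left_bits != [] do
    # Count positions where the two bit lists disagree.
    differing =
      left_bits
      |> Enum.zip(right_bits)
      |> Enum.count(fn {l, r} -> l != r end)

    1 - differing / length(left_bits)
  end
end

IO.inspect(SimilaritySketch.from_hashes([1, 1, 0, 1], [0, 1, 1, 1]))
# prints 0.5
```

Under this assumption, two 64-bit hashes that differ in 7 positions score 1 - 7/64 = 0.890625, and identical hashes score 1.0.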