Similarity.Simhash (Similarity v0.2.4)
Simhash string similarity algorithm. Description of Simhash
iex> Similarity.simhash("Barna", "Kovacs")
0.59375
iex> Similarity.simhash("Austria", "Australia")
0.65625
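For orientation, the general Simhash technique can be sketched as follows. This is a minimal illustration, not the Similarity library's implementation: the module name `SimhashSketch`, the use of character n-grams as features, and the choice of the first 8 bytes of MD5 as a 64-bit feature hash are all assumptions, so the bits it produces will not match the values shown in this documentation.

```elixir
defmodule SimhashSketch do
  # Illustrative only: NOT the Similarity library's code.

  # Split a string into overlapping character n-grams.
  def ngrams(string, n) do
    string
    |> String.graphemes()
    |> Enum.chunk_every(n, 1, :discard)
    |> Enum.map(&Enum.join/1)
  end

  # Turn one n-gram into 64 votes: +1 where its hash bit is 1, -1 where it is 0.
  # Assumption: the first 8 bytes of MD5 serve as the 64-bit feature hash.
  defp votes(ngram) do
    for <<bit::1 <- binary_part(:erlang.md5(ngram), 0, 8)>>, do: 2 * bit - 1
  end

  # Sum the votes per bit position; a positive total yields a 1 bit.
  # Requires Elixir 1.12+ for Enum.zip_with/2.
  def hash(string, n \\ 3) do
    string
    |> ngrams(n)
    |> Enum.map(&votes/1)
    |> Enum.zip_with(&Enum.sum/1)
    |> Enum.map(fn total -> if total > 0, do: 1, else: 0 end)
  end
end

SimhashSketch.ngrams("khan", 3)
# => ["kha", "han"]
```

Because each bit is decided by a majority vote over all n-grams, strings that share most of their n-grams tend to agree on most bits, which is what makes the Hamming distance between two hashes usable as a similarity signal.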
Summary
Functions
Returns the Hamming distance between the left and right hash, given as lists of bits.
Returns the hash for the given string in the given return_type.
Calculates the similarity between the left and right string, using Simhash.
Returns a float representing similarity between left and right strings.
Functions
Returns the Hamming distance between the left and right hash, given as lists of bits.
Examples
iex> Similarity.Simhash.hamming_distance([1, 1, 0, 1, 0], [0, 1, 1, 1, 0])
2
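The computation can be sketched in a few lines. This is an illustrative version under stated assumptions, not the library's source: the module name `HammingSketch` is hypothetical, and the equal-length guard is an assumption about how mismatched inputs should be handled.

```elixir
defmodule HammingSketch do
  # Count the positions at which two bit lists differ.
  # Assumption: both lists must have the same length.
  def distance(left, right) when length(left) == length(right) do
    left
    |> Enum.zip(right)
    |> Enum.count(fn {l, r} -> l != r end)
  end
end

HammingSketch.distance([1, 1, 0, 1, 0], [0, 1, 1, 1, 0])
# => 2
```

The two lists differ at the first and third positions only, which gives the same result as the documented example.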
@spec hash(String.t(), pos_integer(), :list | :integer) :: [0 | 1]
Returns the hash for the given string in the given return_type (:list, the default, or :integer).
Examples
Similarity.Simhash.hash("alma korte", 3)
[1, 1, 0, 1, 0, 0, 0, 0, 0, 1, 0, 1, ...]
Similarity.Simhash.hash("alma korte", 3, :list)
[1, 1, 0, 1, 0, 0, 0, 0, 0, 1, 0, 1, ...]
iex> Similarity.Simhash.hash("alma korte", 3, :integer)
15012197954348909067
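The two return types appear to describe the same 64-bit value: expanding the integer above into binary digits yields a list whose leading bits match the list output shown. The check below uses only the standard `Integer.digits/2` and `Integer.undigits/2`. That the library converts between the two forms this way is an inference from the examples, not something the documentation states.

```elixir
# Expand the documented integer hash into its binary digits,
# most significant bit first.
bits = Integer.digits(15012197954348909067, 2)

length(bits)
# => 64

Enum.take(bits, 12)
# => [1, 1, 0, 1, 0, 0, 0, 0, 0, 1, 0, 1]

# Round trip back to the integer form.
Integer.undigits(bits, 2)
# => 15012197954348909067
```

Note that `Integer.digits/2` drops leading zeros, so a hash whose first bit is 0 would come back shorter than 64 digits and would need padding before being compared bit by bit.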
@spec similarity(String.t(), String.t(), pos_integer()) :: float()
Calculates the similarity between the left and right string, using Simhash.
Returns a float representing similarity between left and right strings.
Options
:ngram_size - defaults to 3
Examples
iex> Similarity.simhash("khan academy", "khan academia")
0.890625
iex> Similarity.simhash("khan academy", "academy khan", ngram_size: 1)
1.0
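Every result shown in this documentation is a multiple of 1/64 (0.59375 = 38/64, 0.65625 = 42/64, 0.890625 = 57/64). That is consistent with the score being one minus the Hamming distance divided by a 64-bit hash length. The sketch below illustrates that relation. The formula is an inference from the outputs rather than a documented guarantee, and the module name `SimilaritySketch` and the hand-built bit lists are hypothetical.

```elixir
defmodule SimilaritySketch do
  # Assumption: similarity = 1 - hamming_distance / number_of_bits.
  def from_hashes(left, right) when length(left) == length(right) and left != [] do
    differing =
      left
      |> Enum.zip(right)
      |> Enum.count(fn {l, r} -> l != r end)

    1 - differing / length(left)
  end
end

# Two made-up 64-bit hashes that differ in exactly 7 positions.
left = List.duplicate(0, 64)
right = List.duplicate(1, 7) ++ List.duplicate(0, 57)

SimilaritySketch.from_hashes(left, right)
# => 0.890625

# With ngram_size: 1 the features are presumably single characters, and these
# two strings contain exactly the same characters with the same counts.
Enum.frequencies(String.graphemes("khan academy")) ==
  Enum.frequencies(String.graphemes("academy khan"))
# => true
```

If the hash depends only on which n-grams occur and how often, identical character counts would produce identical hashes, which would explain the 1.0 in the last example above.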