Similarity v0.2.0 Similarity.Simhash

Simhash string similarity algorithm. Description of Simhash

iex> Similarity.simhash("Barna", "Kovacs")
0.59375

iex> Similarity.simhash("Austria", "Australia")
0.65625

Summary

Functions

hamming_distance(left, right, acc \\ 0)
Returns the Hamming distance between the left and right hashes, given as lists of bits.

hash(string, n)
Returns the hash for the given string.

similarity(left, right, options \\ [])
Calculates the similarity between the left and right strings using Simhash, returned as a float.

Functions

hamming_distance(left, right, acc \\ 0)

Returns the Hamming distance between the left and right hashes, given as lists of bits.

Examples

iex> Similarity.Simhash.hamming_distance([1, 1, 0, 1, 0], [0, 1, 1, 1, 0])
2
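
As an illustration of what this function computes, here is a minimal sketch of a Hamming distance over two equal-length bit lists, using an accumulator in the same position as the `acc \\ 0` argument above. The module name `HammingSketch` is hypothetical, and this is not the library's source, only one way such a function could be written.

```elixir
defmodule HammingSketch do
  # Hypothetical module for illustration; not part of the Similarity library.

  # Counts the positions at which two equal-length bit lists differ,
  # carrying the running count in `acc`.
  def distance(left, right, acc \\ 0)

  # Both lists exhausted: the accumulator is the distance.
  def distance([], [], acc), do: acc

  # Heads are equal: move on without counting.
  def distance([same | left], [same | right], acc), do: distance(left, right, acc)

  # Heads differ: count one mismatch.
  def distance([_ | left], [_ | right], acc), do: distance(left, right, acc + 1)
end

IO.inspect(HammingSketch.distance([1, 1, 0, 1, 0], [0, 1, 1, 1, 0]))
# prints 2
```

In this sketch, lists of unequal length raise a `FunctionClauseError` rather than returning a partial count.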

hash(string, n)
hash(String.t(), pos_integer()) :: [0 | 1]

Returns the Simhash of the given string as a list of bits, where n is the n-gram size.

Examples

Similarity.Simhash.hash("alma korte", 3)
[1, 1, 0, 1, 0, 0, 0, 0, 0, 1, 0, 1, ...]
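
For readers unfamiliar with Simhash, the general technique can be sketched as follows: split the string into n-grams, hash each n-gram to a fixed number of bits, let each bit vote +1 or -1 per position, and keep the sign of each total. The sketch below makes several assumptions that are not taken from this library: it uses `:erlang.phash2/2` as the per-n-gram hash (so the result has 32 bits), it splits on graphemes, and the module name `SimhashSketch` is hypothetical. Its output will therefore not match the values shown above.

```elixir
defmodule SimhashSketch do
  # Hypothetical module for illustration; the hash function and bit width
  # are assumptions, not the Similarity library's choices.

  @bits 32
  # :erlang.phash2/2 accepts a range of at most 2^32.
  @range 4_294_967_296

  # Returns a list of @bits zeros and ones for the given string.
  def hash(string, n) do
    string
    |> ngrams(n)
    |> Enum.map(fn gram -> to_bits(:erlang.phash2(gram, @range)) end)
    |> Enum.reduce(List.duplicate(0, @bits), &add_votes/2)
    |> Enum.map(fn total -> if total > 0, do: 1, else: 0 end)
  end

  # Overlapping n-grams of graphemes, e.g. "alma", 3 -> ["alm", "lma"].
  defp ngrams(string, n) do
    string
    |> String.graphemes()
    |> Enum.chunk_every(n, 1, :discard)
    |> Enum.map(&Enum.join/1)
  end

  # Expands an integer into a list of 32 bits, most significant first.
  defp to_bits(int) do
    binary = <<int::32>>
    for <<bit::1 <- binary>>, do: bit
  end

  # A 1 bit votes +1 and a 0 bit votes -1 at its position.
  defp add_votes(bits, totals) do
    bits
    |> Enum.zip(totals)
    |> Enum.map(fn {bit, total} -> total + (2 * bit - 1) end)
  end
end

IO.inspect(SimhashSketch.hash("alma korte", 3))
```

Because the votes are summed over the set of n-grams, two strings with the same n-grams in a different order produce the same hash in this sketch.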

similarity(left, right, options \\ [])
similarity(String.t(), String.t(), pos_integer()) :: float()

Calculates the similarity between the left and right strings using Simhash, returned as a float.

Options

  • :ngram_size - defaults to 3

Examples

iex> Similarity.simhash("khan academy", "khan academia")
0.890625

iex> Similarity.simhash("khan academy", "academy khan", ngram_size: 1)
1.0
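
The results shown on this page are all multiples of 1/64 (for example, 0.890625 = 57/64). That is consistent with a score of the form 1 - hamming_distance / 64 over 64-bit hashes, although this page does not state the formula, so treat it as an inference. A minimal sketch of that relationship, with the hypothetical names `SimilaritySketch` and `from_hashes/2`:

```elixir
defmodule SimilaritySketch do
  # Hypothetical helper for illustration; not the library's API.

  # Fraction of positions at which two equal-length, non-empty bit lists agree.
  def from_hashes(left_bits, right_bits)
      when left_bits != [] and length(left_bits) == length(right_bits) do
    differing =
      left_bits
      |> Enum.zip(right_bits)
      |> Enum.count(fn {l, r} -> l != r end)

    1 - differing / length(left_bits)
  end
end

# Two 64-bit lists that differ in exactly 7 positions.
left = List.duplicate(1, 64)
right = List.duplicate(0, 7) ++ List.duplicate(1, 57)

IO.inspect(SimilaritySketch.from_hashes(left, right))
# prints 0.890625
```

Under this reading, a result of 1.0 means the two hashes are identical, which is what the `ngram_size: 1` example above reports for two strings made of the same characters in a different order.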