Similarity.Simhash (Similarity v0.2.4)

Simhash string similarity algorithm. Description of Simhash

iex> Similarity.simhash("Barna", "Kovacs")
0.59375

iex> Similarity.simhash("Austria", "Australia")
0.65625

Summary

Functions

hamming_distance(left, right, acc \\ 0)

Returns the Hamming distance between the left and right hash, given as lists of bits.

hash(string, ngram_size, return_type \\ :list)

Returns the hash for the given string in the given return_type.

similarity(left, right, options \\ [])

Calculates the Simhash similarity between the left and right strings and returns it as a float.

Functions

hamming_distance(left, right, acc \\ 0)

Returns the Hamming distance between the left and right hash, given as lists of bits: the number of positions at which the two lists differ.

Examples

iex> Similarity.Simhash.hamming_distance([1, 1, 0, 1, 0], [0, 1, 1, 1, 0])
2
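
As an illustration of what this function computes, here is a minimal sketch of a Hamming distance over two bit lists. It mirrors the accumulator argument in the signature above, but the module name HammingSketch and the clause layout are assumptions made for this example, not the library's actual source.

```elixir
defmodule HammingSketch do
  # Hypothetical module for illustration; not part of the Similarity library.

  # Function head carrying the default accumulator, as in `acc \\ 0` above.
  def distance(left, right, acc \\ 0)

  # Heads are equal (the same variable is matched twice): no change.
  def distance([same | left], [same | right], acc), do: distance(left, right, acc)

  # Heads differ: count one differing position.
  def distance([_ | left], [_ | right], acc), do: distance(left, right, acc + 1)

  # Both lists exhausted at the same time: return the count.
  def distance([], [], acc), do: acc
end

HammingSketch.distance([1, 1, 0, 1, 0], [0, 1, 1, 1, 0])
# => 2 (the lists differ at the first and third positions)
```

In this sketch, lists of unequal length raise a FunctionClauseError, since the Hamming distance is only defined for sequences of the same length.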

hash(string, ngram_size, return_type \\ :list)

@spec hash(String.t(), pos_integer(), :list | :integer) :: [0 | 1] | non_neg_integer()

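Returns the hash for the given string, built from n-grams of size ngram_size. The return_type selects the output format: :list (the default) returns a list of bits, and :integer returns the same hash as an integer.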

Examples

Similarity.Simhash.hash("alma korte", 3)
[1, 1, 0, 1, 0, 0, 0, 0, 0, 1, 0, 1, ...]

Similarity.Simhash.hash("alma korte", 3, :list)
[1, 1, 0, 1, 0, 0, 0, 0, 0, 1, 0, 1, ...]

iex> Similarity.Simhash.hash("alma korte", 3, :integer)
15012197954348909067
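
For orientation, the usual Simhash construction can be sketched as follows: split the string into n-grams, hash each n-gram to a fixed number of bits, and keep, for each bit position, the value that the majority of n-grams voted for. This is an illustration under stated assumptions rather than the library's code. The module name SimhashSketch, the use of :erlang.md5/1 as the per-feature hash, and the 64-bit width (suggested by the integer example above fitting in 64 bits) are all choices made here, so the resulting bits will not match what Similarity.Simhash.hash/3 returns.

```elixir
defmodule SimhashSketch do
  # Hypothetical module for illustration; not part of the Similarity library.

  # Overlapping n-grams of `n` graphemes, e.g. "alma" with n = 3 -> ["alm", "lma"].
  def ngrams(string, n) do
    string
    |> String.graphemes()
    |> Enum.chunk_every(n, 1, :discard)
    |> Enum.map(&Enum.join/1)
  end

  # Assumption: the first 64 bits of an MD5 digest stand in for the
  # library's real per-feature hash function.
  def feature_bits(feature) do
    <<head::binary-size(8), _rest::binary>> = :erlang.md5(feature)
    for <<bit::1 <- head>>, do: bit
  end

  # Majority vote per bit position across all n-gram hashes.
  # Returns [] when the string is shorter than `n` (no n-grams).
  def hash(string, n) do
    string
    |> ngrams(n)
    |> Enum.map(&feature_bits/1)
    |> Enum.zip_with(&majority_bit/1)
  end

  # 1 if strictly more than half of the votes in this column are 1, else 0.
  defp majority_bit(column) do
    if Enum.sum(column) * 2 > length(column), do: 1, else: 0
  end
end

SimhashSketch.hash("alma korte", 3) |> length()
# => 64
```

Because the votes are summed per bit position, the order of the n-grams does not affect the result in this sketch; only which n-grams occur, and how often, matters.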

similarity(left, right, options \\ [])

@spec similarity(String.t(), String.t(), keyword()) :: float()

Calculates the similarity between the left and right strings using Simhash and returns it as a float. Higher values mean more similar strings; 1.0 means the two hashes are identical.

Options

  • :ngram_size - defaults to 3

Examples

iex> Similarity.simhash("khan academy", "khan academia")
0.890625

iex> Similarity.simhash("khan academy", "academy khan", ngram_size: 1)
1.0
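
Every similarity value shown on this page is a multiple of 1/64 (for example, 0.890625 = 57/64), which is consistent with the score being derived from the Hamming distance between two 64-bit hashes. The sketch below shows that relationship. The formula, the module name SimilaritySketch, and the 64-bit default are assumptions inferred from the examples, not something the library documentation states.

```elixir
defmodule SimilaritySketch do
  # Hypothetical module for illustration; not part of the Similarity library.

  # Assumed formula: the fraction of bit positions on which two hashes agree.
  def from_distance(distance, bits \\ 64), do: 1 - distance / bits
end

SimilaritySketch.from_distance(7)
# => 0.890625 (7 of 64 bits differ)

SimilaritySketch.from_distance(0)
# => 1.0 (identical hashes)
```

Under this reading, the ngram_size: 1 example returns 1.0 because "khan academy" and "academy khan" contain exactly the same characters, so a hash built from single-character n-grams would be the same for both.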