Similarity.Simhash (Similarity v0.4.0)

Simhash string similarity algorithm. Description of Simhash

iex> Similarity.simhash("Barna", "Kovacs")
0.59375

iex> Similarity.simhash("Austria", "Australia")
0.65625

Summary

Functions

hamming_distance/3

Returns the Hamming distance between the left and right hashes, given as lists of bits.

hash/2

Returns the Simhash of the given string, computed with the chosen :hash_function and returned in the chosen :return_type.

similarity/3

Calculates the Simhash similarity between the left and right strings and returns it as a float.

Functions

hamming_distance(left, right, acc \\ 0)

Returns the Hamming distance between the left and right hashes, given as lists of bits.

Examples

iex> Similarity.Simhash.hamming_distance([1, 1, 0, 1, 0], [0, 1, 1, 1, 0])
2
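
The Hamming distance is the number of positions at which two equal-length bit lists differ. The `acc \\ 0` argument in the signature suggests an accumulator-based recursion; the following is a minimal sketch under that assumption, using a hypothetical `HammingSketch` module rather than the library's actual source.

```elixir
defmodule HammingSketch do
  # Hypothetical module for illustration; not part of Similarity.
  # Walks both lists in step and counts the positions where the bits differ.
  def distance(left, right, acc \\ 0)

  # Both lists exhausted: the accumulator holds the distance.
  def distance([], [], acc), do: acc

  # Heads are equal (the same variable is bound twice): nothing to count.
  def distance([same | left], [same | right], acc), do: distance(left, right, acc)

  # Heads differ: count one mismatch.
  def distance([_ | left], [_ | right], acc), do: distance(left, right, acc + 1)
end

HammingSketch.distance([1, 1, 0, 1, 0], [0, 1, 1, 1, 0])
# => 2
```

Lists of different lengths match none of the clauses and raise a `FunctionClauseError`, which seems a reasonable choice because the distance is only defined for equal-length inputs.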
hash(string, options \\ [])

@spec hash(
  String.t(),
  keyword()
) :: [0 | 1] | integer()

Returns the Simhash of the given string, computed with the chosen :hash_function and returned in the chosen :return_type.

Options

  • :ngram_size - defaults to 3
  • :hash_function - defaults to :siphash, available options are :siphash, :md5, :sha256
  • :return_type - defaults to :list, available options are :list, :int64_unsigned, :int64_signed, :binary

The return types :int64_unsigned and :int64_signed are only available for the :siphash hash function.
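
To show how these options typically fit together, here is a rough sketch of the general Simhash construction: split the string into n-grams, hash each n-gram to 64 bits, let every hash vote on each bit position, and keep the majority. The `SimhashSketch` module, the use of the first 64 bits of an MD5 digest, and the tie-breaking rule are all assumptions made for illustration, so its output is not expected to match `Similarity.Simhash.hash/2`.

```elixir
defmodule SimhashSketch do
  # Hypothetical module for illustration; not the library's implementation.
  import Bitwise

  @bits 64

  # Overlapping character n-grams, e.g. "alma" with n = 3 gives ["alm", "lma"].
  # Strings shorter than n produce no n-grams.
  def ngrams(string, n) do
    string
    |> String.graphemes()
    |> Enum.chunk_every(n, 1, :discard)
    |> Enum.map(&Enum.join/1)
  end

  # Assumption: use the first 64 bits of the MD5 digest as the n-gram hash.
  defp hash64(ngram) do
    <<value::unsigned-integer-size(64), _rest::binary>> = :crypto.hash(:md5, ngram)
    value
  end

  # Returns a list of 64 bits, most significant bit first.
  def hash(string, n \\ 3) do
    hashes = string |> ngrams(n) |> Enum.map(&hash64/1)

    for position <- 0..(@bits - 1) do
      shift = @bits - 1 - position

      # Each n-gram hash votes +1 if its bit is set at this position, else -1.
      votes =
        Enum.reduce(hashes, 0, fn h, sum ->
          if ((h >>> shift) &&& 1) == 1, do: sum + 1, else: sum - 1
        end)

      # Assumed tie-breaking: a tie (or no votes at all) yields 0.
      if votes > 0, do: 1, else: 0
    end
  end
end

SimhashSketch.hash("alma korte")
```

Because each bit is decided by a majority vote, strings that share most of their n-grams tend to agree on most bits, which is what makes the Hamming distance between two fingerprints a usable similarity signal.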

Examples

iex> Similarity.Simhash.hash("alma korte")
[1, 1, 0, 1, 0, 0, 0, 0, 0, 1, 0, 1, ...]

iex> Similarity.Simhash.hash("alma korte", ngram_size: 3, hash_function: :siphash, return_type: :int64_unsigned)
15012197954348909067

iex> Similarity.Simhash.hash("alma korte", ngram_size: 3, hash_function: :siphash, return_type: :int64_signed)
-3434546119360642549
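
The two integer results above are the same 64 bits read in two ways, and the first twelve of those bits agree with the start of the :list result. The snippet below checks this with plain binary pattern matching; it does not call the library, and the most-significant-bit-first order is inferred from the examples rather than documented.

```elixir
unsigned = 15_012_197_954_348_909_067

# Pack the value into 64 bits, then read the same bits back as a
# two's-complement signed integer.
binary = <<unsigned::unsigned-integer-size(64)>>
<<signed::signed-integer-size(64)>> = binary

signed
# => -3434546119360642549

# Unpack the 64 bits one at a time, most significant bit first.
bits = for <<bit::1 <- binary>>, do: bit

Enum.take(bits, 12)
# => [1, 1, 0, 1, 0, 0, 0, 0, 0, 1, 0, 1]
```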

hash_similarity(left, right, length)

similarity(left, right, options \\ [])

@spec similarity(String.t(), String.t(), keyword()) :: float()

Calculates the Simhash similarity between the left and right strings and returns it as a float.

Options

  • :ngram_size - defaults to 3
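
As an illustration of what :ngram_size controls, the sketch below splits a string into overlapping character n-grams. The `ngrams` helper is hypothetical, and splitting on graphemes with a step of one is an assumption about how the library shingles its input.

```elixir
# Hypothetical helper: overlapping character n-grams of a string.
ngrams = fn string, n ->
  string
  |> String.graphemes()
  |> Enum.chunk_every(n, 1, :discard)
  |> Enum.map(&Enum.join/1)
end

ngrams.("khan", 3)
# => ["kha", "han"]

# With n = 1 only the multiset of characters matters, not their order.
Enum.sort(ngrams.("khan academy", 1)) == Enum.sort(ngrams.("academy khan", 1))
# => true
```

Smaller n-grams make the comparison less sensitive to word order, while larger ones capture more local context.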

Examples

iex> Similarity.simhash("khan academy", "khan academia")
0.890625

iex> Similarity.simhash("khan academy", "academy khan", ngram_size: 1)
1.0
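
All example results on this page are multiples of 1/64, which is consistent with the similarity being the share of matching bits between two 64-bit fingerprints. The sketch below assumes that formula; `SimilaritySketch.from_distance/2` is a hypothetical helper, not a documented function of the library.

```elixir
defmodule SimilaritySketch do
  # Assumed formula: the fraction of bit positions on which two
  # fingerprints of the given length agree.
  def from_distance(distance, length \\ 64), do: 1 - distance / length
end

SimilaritySketch.from_distance(7)
# => 0.890625

SimilaritySketch.from_distance(0)
# => 1.0
```

Under this assumption, 0.890625 corresponds to 7 differing bits out of 64, and 1.0 corresponds to identical fingerprints.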