SpiritFingers v0.3.0 SpiritFingers.SimHash

SimHash module. It delegates to Rust NIFs, which perform the hashing, similarity, and distance calculations.

Summary

Types

distance()

64-bit floating-point representation of the Hamming distance between two SimHash.t values.

similarity()

Similarity between two SimHash.t values, represented as a value between 0.0 and 1.0.

t()

Unsigned 64-bit integer representation of a SimHash.

Functions

hamming_distance(hash0, hash1)

Bitwise Hamming distance between two SimHash.t hashes.

hash_similarity(hash0, hash1)

Calculate the similarity (SimHash.similarity) of two hashes. 0.0 means no similarity, 1.0 means identical.

simhash(bin)

Calculate the SimHash.t of a string split on whitespace.

similarity(text0, text1)

Calculate the similarity (SimHash.similarity) of two strings, each split on whitespace and hashed with SimHash.

Types

distance()

distance() :: float()

64-bit floating-point representation of the Hamming distance between two SimHash.t values.

similarity()

similarity() :: float()

Similarity between two SimHash.t values, expressed as a float between 0.0 and 1.0:

  • 0.0 means no similarity,
  • 1.0 means identical.

t()

Unsigned 64-bit integer representation of a SimHash.

Functions

hamming_distance(hash0, hash1)

hamming_distance(t(), t()) :: {:ok, distance()}

Bitwise Hamming distance between two SimHash.t hashes: the number of bit positions in which they differ, returned as a float.

Examples

iex> SpiritFingers.SimHash.hamming_distance(0, 0)
{:ok, 0.0}

iex> SpiritFingers.SimHash.hamming_distance(0b1111111, 0b0000000)
{:ok, 7.0}

iex> SpiritFingers.SimHash.hamming_distance(0b0100101, 0b1100110)
{:ok, 3.0}
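
The distance in the examples above can be reproduced in plain Elixir by XOR-ing the two hashes and counting the set bits. The following is a minimal sketch under that assumption, not the library's Rust implementation; the module name `HammingSketch` and its private helper are hypothetical.

```elixir
defmodule HammingSketch do
  # Hypothetical module for illustration; not part of SpiritFingers.
  import Bitwise

  # Returns {:ok, distance} where distance is the number of differing bits,
  # as a float to mirror the documented distance() type.
  def hamming_distance(hash0, hash1)
      when is_integer(hash0) and is_integer(hash1) and hash0 >= 0 and hash1 >= 0 do
    diff = bxor(hash0, hash1)
    {:ok, popcount(diff, 0) * 1.0}
  end

  # Count set bits by shifting right until nothing is left.
  defp popcount(0, acc), do: acc
  defp popcount(n, acc), do: popcount(n >>> 1, acc + (n &&& 1))
end
```

For instance, `0b0100101` and `0b1100110` differ in three bit positions, which matches the `{:ok, 3.0}` result shown above.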

hash_similarity(hash0, hash1)

hash_similarity(t(), t()) :: {:ok, similarity()}

Calculate the similarity (SimHash.similarity) of two hashes. 0.0 means no similarity, 1.0 means identical.

Examples

iex> SpiritFingers.SimHash.hash_similarity(0, 0)
{:ok, 1.0}

iex> SpiritFingers.SimHash.hash_similarity(0xFFFFFFFFFFFFFFFF, 0)
{:ok, 0.0}

iex> SpiritFingers.SimHash.hash_similarity(0xFFFFFFFF, 0)
{:ok, 0.5}
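
The three examples above are consistent with similarity being one minus the fraction of differing bits out of 64. That formula is inferred from the examples and is not confirmed by the source; the sketch below assumes it. The module name `SimilaritySketch` is hypothetical.

```elixir
defmodule SimilaritySketch do
  # Hypothetical module for illustration; not part of SpiritFingers.
  import Bitwise

  # Assumed hash width, matching the documented unsigned 64-bit t().
  @bits 64

  # Assumption: similarity = 1.0 - hamming_distance / 64.
  def hash_similarity(hash0, hash1)
      when is_integer(hash0) and is_integer(hash1) and hash0 >= 0 and hash1 >= 0 do
    distance = popcount(bxor(hash0, hash1), 0)
    {:ok, 1.0 - distance / @bits}
  end

  # Count set bits by shifting right until nothing is left.
  defp popcount(0, acc), do: acc
  defp popcount(n, acc), do: popcount(n >>> 1, acc + (n &&& 1))
end
```

Under this assumption, `0xFFFFFFFF` differs from `0` in 32 of 64 bits, which gives the `0.5` shown above.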

simhash(bin)

simhash(binary()) :: {:ok, t()}

Calculate the SimHash.t of a string, using the whitespace-separated tokens of the string as features.

Examples

iex> SpiritFingers.SimHash.simhash("The cat sat on the mat")
{:ok, 2595200813813010837}

iex> SpiritFingers.SimHash.simhash("The cat sat under the mat")
{:ok, 2595269945604666783}

iex> SpiritFingers.SimHash.simhash("Why the lucky stiff")
{:ok, 1155526875459215761}
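
To illustrate the general SimHash technique, the sketch below splits the text on whitespace, hashes each token to 64 bits, and sets each output bit by majority vote across the tokens. It is an assumption-laden illustration only: the token hash (the first 64 bits of MD5), the equal weighting of tokens, and the module name `SimHashSketch` are all choices made here, so it will not reproduce the values returned by the Rust NIF.

```elixir
defmodule SimHashSketch do
  # Hypothetical module for illustration; not part of SpiritFingers.
  import Bitwise

  @bits 64

  def simhash(text) when is_binary(text) do
    # One signed counter per bit position, updated by every token.
    counts =
      text
      |> String.split()
      |> Enum.reduce(List.duplicate(0, @bits), fn token, acc ->
        h = token_hash(token)

        acc
        |> Enum.with_index()
        |> Enum.map(fn {count, i} ->
          # +1 when the token's bit i is set, -1 otherwise.
          if ((h >>> i) &&& 1) == 1, do: count + 1, else: count - 1
        end)
      end)

    # Output bit i is 1 when the counter for position i is positive.
    hash =
      counts
      |> Enum.with_index()
      |> Enum.reduce(0, fn {count, i}, out ->
        if count > 0, do: out ||| (1 <<< i), else: out
      end)

    {:ok, hash}
  end

  # Assumed token hash: the first 64 bits of the MD5 digest.
  defp token_hash(token) do
    <<h::unsigned-integer-size(64), _rest::binary>> = :erlang.md5(token)
    h
  end
end
```

Because each bit is decided by a vote over all tokens, texts that share most of their tokens tend to produce hashes that differ in only a few bits.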

similarity(text0, text1)

similarity(binary(), binary()) :: {:ok, similarity()}

Calculate the similarity (SimHash.similarity) of two strings, each split on whitespace and hashed with SimHash.

Examples

iex> SpiritFingers.SimHash.similarity("Stop hammertime", "Stop hammertime")
{:ok, 1.0}

iex> SpiritFingers.SimHash.similarity("Hocus pocus", "Hocus pocus pilatus pas")
{:ok, 0.9375}

iex> SpiritFingers.SimHash.similarity("Peanut butter", "Strawberry cocktail")
{:ok, 0.59375}
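
One way to read these values is as a count of differing bits. Assuming similarity is one minus the Hamming distance divided by 64 (an inference from the examples on this page, not a documented formula), the bit distance can be recovered from a similarity. The module and function names below are hypothetical.

```elixir
defmodule ImpliedDistanceSketch do
  # Hypothetical module for illustration; not part of SpiritFingers.

  # Assumed hash width, matching the documented unsigned 64-bit t().
  @bits 64

  # Assumption: similarity = 1.0 - distance / 64, so
  # distance = (1.0 - similarity) * 64.
  def implied_bit_distance(similarity)
      when is_float(similarity) and similarity >= 0.0 and similarity <= 1.0 do
    round((1.0 - similarity) * @bits)
  end
end
```

Under this assumption, a similarity of 0.9375 corresponds to 4 differing bits and 0.59375 to 26 differing bits.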