Nilsimsa (nilsimsa v1.0.0)

Nilsimsa is an implementation of a locality-sensitive hashing algorithm in which similar input values produce similar hashes. The more similar the input strings are, the smaller the bitwise difference between the generated hashes.

Nilsimsa hashes are useful for detecting texts of the same origin.
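
The "bitwise difference" between two hashes can be measured as the number of bit positions in which their digests differ. The following is a minimal sketch of that idea and not the library's own code: the module BitDiffSketch and its bit_difference/2 function are hypothetical names introduced for illustration, and the sketch assumes both digests are hex strings of equal length.

```elixir
defmodule BitDiffSketch do
  import Bitwise

  # Hypothetical helper, not part of the Nilsimsa library.
  # Counts the bit positions in which two hex-encoded digests differ.
  def bit_difference(hex_a, hex_b) do
    hex_a
    |> String.to_integer(16)
    # XOR leaves a 1 in every position where the two values disagree.
    |> bxor(String.to_integer(hex_b, 16))
    # Expand to binary digits and count the 1s.
    |> Integer.digits(2)
    |> Enum.sum()
  end
end

BitDiffSketch.bit_difference("ff", "00")
# => 8
```

A smaller result means the two digests, and by extension the two inputs, are more alike.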

Processing a string

To process a string, pass the value to the process/1 function:

Nilsimsa.process("abcdefgh")

You can also process a stream:

"war_and_peace.txt"
|> File.stream!()
|> Enum.reduce(Nilsimsa.process(""), &Nilsimsa.process/2)

Generating a digest

To generate a digest of the Nilsimsa hash, pass the struct returned by process/1 to the to_string/1 function:

to_string(Nilsimsa.process("abcdefgh"))
# => "14c8118000000000030800000004042004189020001308014088003280000078"

Comparing values

To compare two values, use the compare/2 function:

Nilsimsa.compare(Nilsimsa.process("hello world"), Nilsimsa.process("all of your base"))
# => 3

Summary

Functions

compare/2: Compare two hashed values

digest/1: Generate the digest of a hash

process/1: Process the given string as a Nilsimsa hash

process/2: Process the given string as a Nilsimsa hash using the given accumulator struct

Types

Specs

t() :: %Nilsimsa{
  acc: [integer()],
  count: integer(),
  digest: [integer()] | nil,
  threshold: float(),
  window: [integer()]
}

Functions

Specs

compare(t(), t()) :: integer()

Compare two hashed values

Returns an integer between -127 and 128, where -127 indicates the least similar hashes and 128 the most similar.

Examples

iex> Nilsimsa.compare(Nilsimsa.process("abc"), Nilsimsa.process("def"))
126
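
As a rough illustration of how such a score relates to the digests, the original Nilsimsa algorithm is commonly described as scoring two 256-bit digests as 128 minus the number of differing bits. The sketch below assumes that formula; whether compare/2 computes its result in exactly this way is an assumption, and the module ScoreSketch and its score/2 function are hypothetical names, not part of this library. Note that under this assumption two fully inverted digests would score -128, one below the minimum documented above, so the library's exact formula may differ.

```elixir
defmodule ScoreSketch do
  import Bitwise

  # The digests shown in this documentation are 64 hex characters, i.e. 256 bits.
  @digest_bits 256

  # Hypothetical helper, not part of the Nilsimsa library.
  # Assumes score = 128 - (number of differing bits).
  def score(hex_a, hex_b) do
    differing =
      hex_a
      |> String.to_integer(16)
      |> bxor(String.to_integer(hex_b, 16))
      |> Integer.digits(2)
      |> Enum.sum()

    div(@digest_bits, 2) - differing
  end
end

digest = "14c8118000000000030800000004042004189020001308014088003280000078"
ScoreSketch.score(digest, digest)
# => 128
```

Identical digests differ in zero bits and so reach the maximum score of 128; each differing bit lowers the score by one.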

Specs

digest(t()) :: t()

Generate the digest of a hash

Examples

iex> to_string(Nilsimsa.digest(Nilsimsa.process("abcdefgh")))
"14c8118000000000030800000004042004189020001308014088003280000078"

Specs

process(String.t()) :: t()

Process the given string as a Nilsimsa hash

Examples

iex> to_string(Nilsimsa.process("abcdefghijklmnopqrstuvwxyz"))
"94ca95850773045cabb93869ba8657373499beb81a17587fd6f9107fc54cc978"

Specs

process(String.t(), t()) :: t()

Process the given string as a Nilsimsa hash using the given accumulator struct

Examples

iex> to_string(Nilsimsa.process("abcdefghijklmnopqrstuvwxyz", %Nilsimsa{}))
"94ca95850773045cabb93869ba8657373499beb81a17587fd6f9107fc54cc978"