truffle_hog v0.1.0 TruffleHog

Provides a method to search for matches within a list of documents using TF-IDF.
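For reference, a common textbook form of TF-IDF weights a term t in a document d, drawn from a collection of N documents, as shown below. This page does not say which variant TruffleHog uses (raw versus normalized term frequency, smoothed IDF, log base), so treat this formula as an assumption rather than the library's definition.

```latex
\mathrm{tfidf}(t, d) = \mathrm{tf}(t, d) \cdot \ln \frac{N}{\mathrm{df}(t)}
```

Here tf(t, d) is the number of times t occurs in d, and df(t) is the number of documents that contain t. With the two sample documents used on this page (N = 2), the token "a" occurs twice in document 1 and in no other document, giving 2 · ln 2 ≈ 1.386. The token "this" occurs in both documents, giving ln(2/2) = 0, so it contributes nothing to any match.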

There are two main use cases: finding which documents in the list are most similar to each other, and finding which documents are most relevant to a search query.

How to use

Convert each document into a tuple whose first item is an identifier and whose second item is a list of tokens. A tokenizer is not included, because you may want to write your own.

Example:

[{1, ~w(this is a a sample)},
 {2, ~w(this example is another example)}]
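Because no tokenizer ships with the package, here is a minimal sketch of one that produces tuples in the shape above. The module name TokenizerSketch and its functions are hypothetical, and the splitting rule (lowercase, split on anything that is not a letter or digit) is only an assumption; real applications may also want stemming or stop-word removal.

```elixir
defmodule TokenizerSketch do
  # Hypothetical helper: TruffleHog does not include a tokenizer.
  # Lowercases the text, splits on runs of non-letter/non-digit characters,
  # and drops the empty strings left at the edges.
  def tokenize(text) when is_binary(text) do
    text
    |> String.downcase()
    |> String.split(~r/[^\p{L}\p{N}]+/u, trim: true)
  end

  # Turns {id, raw_text} pairs into the {id, tokens} tuples shown above.
  def prepare(raw_documents) do
    Enum.map(raw_documents, fn {id, text} -> {id, tokenize(text)} end)
  end
end
```

For example, TokenizerSketch.prepare([{1, "This is a a sample."}, {2, "This example is another example!"}]) yields the same two tuples as the example list.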

Create an index with TruffleHog.index_documents/1.

index = list_documents |> TruffleHog.index_documents()

Use TruffleHog.find_matches/3 to search the index, passing a list of search tokens and the number of matches to return.

matches = index |> TruffleHog.find_matches(["search", "items"], quantity)

Summary

Functions

find_matches(index, search, quantity)
Finds the best matches within the index.

index_documents(documents)
Indexes a list of documents.

Functions

find_matches(index, search, quantity)

Finds the best matches within the index.

index must be the value returned by TruffleHog.index_documents/1.

search is a list of tokens to search for.

quantity is the number of matches to be returned.

Returns a list of tuples, where the first item of each tuple is the document identifier and the second is a score indicating how similar the document is to the search. The list is sorted from most to least similar.
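To make the return shape concrete, here is a self-contained sketch of TF-IDF ranking. It is not TruffleHog's implementation: the module MatchSketch is hypothetical, it takes the raw {id, tokens} list instead of an index so it can run on its own, and it scores each document as the sum of tf · ln(N / df) over the search tokens, which may differ from the library's actual formula.

```elixir
defmodule MatchSketch do
  # Hypothetical illustration only; TruffleHog's scoring may differ.
  # Returns up to `quantity` {id, score} tuples, highest score first.
  def find_matches(documents, search, quantity) do
    n = length(documents)

    # Token counts per document: %{id => %{token => count}}
    term_freqs = Map.new(documents, fn {id, tokens} -> {id, Enum.frequencies(tokens)} end)

    # Inverse document frequency of each distinct search token;
    # tokens that appear in no document get a weight of 0.0.
    idf =
      Map.new(Enum.uniq(search), fn token ->
        df = Enum.count(term_freqs, fn {_id, freqs} -> Map.has_key?(freqs, token) end)
        {token, if(df == 0, do: 0.0, else: :math.log(n / df))}
      end)

    term_freqs
    |> Enum.map(fn {id, freqs} ->
      score =
        Enum.reduce(search, 0.0, fn token, acc ->
          acc + Map.get(freqs, token, 0) * idf[token]
        end)

      {id, score}
    end)
    |> Enum.sort_by(fn {_id, score} -> score end, :desc)
    |> Enum.take(quantity)
  end
end
```

With the sample documents, searching for ["example"] with a quantity of 1 returns only document 2, scored 2 · ln 2, because "example" occurs twice there and nowhere else.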

index_documents(documents)

Indexes a list of documents.

Returns a map containing the indices needed for later searches.

documents is expected to be a list of pairs: the first item is the document id, and the second is the list of tokens the document contains.

Example argument:

[{1, ~w(this is a a sample)},
 {2, ~w(this example is another example)}]
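The structure of the returned map is not documented here. As a rough illustration of the information a TF-IDF index has to hold, the hypothetical IndexSketch module below records per-document token counts, per-token document frequencies, and the number of documents. This is an assumption about what such an index needs, not TruffleHog's actual map layout.

```elixir
defmodule IndexSketch do
  # Hypothetical illustration; NOT TruffleHog's internal structure.
  def build(documents) do
    # Token counts per document: %{id => %{token => count}}
    term_freqs =
      Map.new(documents, fn {id, tokens} -> {id, Enum.frequencies(tokens)} end)

    # Document frequency: how many documents contain each token.
    doc_freqs =
      Enum.reduce(term_freqs, %{}, fn {_id, freqs}, acc ->
        Enum.reduce(Map.keys(freqs), acc, fn token, inner ->
          Map.update(inner, token, 1, &(&1 + 1))
        end)
      end)

    %{term_freqs: term_freqs, doc_freqs: doc_freqs, doc_count: length(documents)}
  end
end
```

Precomputing these counts once means a search only has to look up values for its own tokens instead of rescanning every document.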