TruffleHog (truffle_hog v0.1.0)
Provides functions to search a list of documents for matches using TF-IDF (term frequency-inverse document frequency) weighting.
There are two main use cases: finding which documents in the list are most similar to each other, and finding which document is most related to a search query.
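To make the weighting concrete, here is a minimal sketch of plain TF-IDF scoring in Elixir. It uses a generic textbook formulation (term frequency normalized by document length, and idf = ln(N / df), where N is the number of documents and df is the number of documents containing the term), assumed for illustration only. TruffleHog's own weighting and normalization may differ, and the module name TfIdfSketch is hypothetical.

```elixir
defmodule TfIdfSketch do
  # Generic TF-IDF illustration; NOT TruffleHog's actual implementation.

  # Term frequency: occurrences of `term` divided by the document length.
  # An empty document has no terms, so its frequency is 0.0.
  def tf(_term, []), do: 0.0

  def tf(term, tokens) do
    Enum.count(tokens, &(&1 == term)) / length(tokens)
  end

  # Inverse document frequency: ln(N / df). A term that appears in no
  # document gets 0.0 instead of dividing by zero.
  def idf(term, documents) do
    df = Enum.count(documents, fn {_id, tokens} -> term in tokens end)

    if df == 0, do: 0.0, else: :math.log(length(documents) / df)
  end

  # Score one {id, tokens} document against a search: the sum of
  # tf * idf over every search term.
  def score(search, {_id, tokens}, documents) do
    search
    |> Enum.map(fn term -> tf(term, tokens) * idf(term, documents) end)
    |> Enum.sum()
  end
end
```

With the example documents shown under How to use, searching for "example" gives document 2 a score of 0.4 × ln 2 ≈ 0.277 and document 1 a score of 0.0. A term that appears in every document, such as "this", has an idf of 0.0 and contributes nothing to any score.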
How to use
Convert each document into a tuple where the first item is an identifier and the second is a list of tokens. A tokenizer is not included, because you may want to write your own.
Example:
[{1, ~w(this is a a sample)},
{2, ~w(this example is another example)}]
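Because no tokenizer is bundled, you need to produce the {id, tokens} pairs yourself. The sketch below shows one hypothetical way to do it (TokenizerSketch and its functions are not part of TruffleHog): it lowercases the text and splits it on any run of characters that are not Unicode letters or digits. Real applications may also want stemming or stop-word removal.

```elixir
defmodule TokenizerSketch do
  # Hypothetical helper; TruffleHog does not ship a tokenizer.

  # Lowercase the text, then split on runs of characters that are
  # neither Unicode letters (\p{L}) nor digits (\p{N}).
  # `trim: true` drops empty strings at the start or end.
  def tokenize(text) do
    text
    |> String.downcase()
    |> String.split(~r/[^\p{L}\p{N}]+/u, trim: true)
  end

  # Turn {id, raw_text} pairs into the {id, tokens} pairs that
  # index_documents/1 expects.
  def to_documents(pairs) do
    Enum.map(pairs, fn {id, text} -> {id, tokenize(text)} end)
  end
end
```

Applied to [{1, "This is a a sample."}, {2, "This example is another example!"}], it returns the same two tuples as the example above.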
Create an index using the function index_documents/1.
index = list_documents |> TruffleHog.index_documents()
Use find_matches/3 to find the matches in the index.
matches = index |> TruffleHog.find_matches(["search", "items"], quantity)
Functions
find_matches(index, search, quantity)
Finds the best matches within the index.
index must be the value returned by TruffleHog.index_documents/1.
search is a list of tokens to search for.
quantity is the number of matches to be returned.
Returns a list of tuples, where the first item of each tuple is the identifier of a document and the second is a score indicating how similar that document is to the search. The list is sorted from most similar to least similar.
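The ordering and truncation described above can be sketched as follows, assuming you already have a list of {id, score} pairs. RankSketch.top_matches/2 is a hypothetical helper that only illustrates the shape of the result; it is not taken from TruffleHog's source.

```elixir
defmodule RankSketch do
  # Hypothetical sketch of the final ranking step: sort {id, score}
  # pairs by score, highest first, and keep the first `quantity`.
  def top_matches(scored, quantity) do
    scored
    |> Enum.sort_by(fn {_id, score} -> score end, :desc)
    |> Enum.take(quantity)
  end
end
```

For example, RankSketch.top_matches([{1, 0.1}, {2, 0.5}, {3, 0.3}], 2) returns [{2, 0.5}, {3, 0.3}].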
index_documents(documents)
Indexes a list of documents.
Returns a map holding the index data used by later searches.
documents is expected to be a list of pairs: the first element is the document's id, and the second is the list of tokens the document contains.
Example argument:
[{1, ~w(this is a a sample)},
{2, ~w(this example is another example)}]
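If documents come from an external source, it can help to check their shape before indexing. The sketch below is hypothetical (DocumentsShape.valid?/1 is not part of TruffleHog) and assumes tokens are strings, as in the ~w examples; the library itself may accept other token types.

```elixir
defmodule DocumentsShape do
  # Hypothetical check for the input shape described above: a list of
  # {id, tokens} pairs where tokens is a list of strings.
  # TruffleHog itself may not perform this validation.
  def valid?(documents) when is_list(documents) do
    Enum.all?(documents, fn
      {_id, tokens} when is_list(tokens) -> Enum.all?(tokens, &is_binary/1)
      _other -> false
    end)
  end

  # Anything that is not a list is rejected.
  def valid?(_other), do: false
end
```

With this check, the example argument above is valid, while [{1, "this is a sample"}] is not, because its second element is a raw string rather than a list of tokens.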