content_indexer v0.2.0 ContentIndexer.Services.Similarity
Summary
This module accepts a list of tuples, each containing a document id and a hash of terms and their TF-IDF weights. It also accepts the query terms as a hash of terms and weights, in the same format as the hash inside each tuple.
[ { 1, %{ "abc" => 0.001, "term1" => 0.123, "term2" => 0.934, "term3" => 0.945 } }, { 1, %{ "abc" => 0.001, "term1" => 0.123, "term2" => 0.934, "term3" => 0.945 } }… ]
The module computes the similarity of each provided document to the query terms. It then returns an ordered set of terms and their corresponding weights.
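The text above does not say which similarity measure the module uses. As a minimal sketch, assuming cosine similarity over the term-weight hashes, the ranking step might look like the following. The module name `SimilaritySketch` and its functions are hypothetical and are not part of content_indexer.

```elixir
defmodule SimilaritySketch do
  # Hypothetical module for illustration; not the content_indexer implementation.
  # Assumes cosine similarity, which the original documentation does not confirm.

  # Dot product: terms missing from `b` contribute 0.0.
  def dot(a, b) do
    Enum.reduce(a, 0.0, fn {term, weight}, acc ->
      acc + weight * Map.get(b, term, 0.0)
    end)
  end

  # Euclidean length of a term-weight map.
  def norm(weights) do
    weights
    |> Map.values()
    |> Enum.reduce(0.0, fn w, acc -> acc + w * w end)
    |> :math.sqrt()
  end

  # Cosine similarity; returns 0.0 when either vector has zero length.
  def cosine(a, b) do
    denom = norm(a) * norm(b)
    if denom == 0.0, do: 0.0, else: dot(a, b) / denom
  end

  # Scores every {id, weights} tuple against the query, best match first.
  def rank(documents, query) do
    documents
    |> Enum.map(fn {id, weights} -> {id, cosine(weights, query)} end)
    |> Enum.sort_by(fn {_id, score} -> score end, :desc)
  end
end

documents = [
  {1, %{"term1" => 1.0}},
  {2, %{"term2" => 1.0}}
]

ranked = SimilaritySketch.rank(documents, %{"term2" => 0.5})
IO.inspect(ranked)
```

Normalising by vector length keeps long documents from outranking short ones on raw weight alone. A plain dot product would be a reasonable alternative if the weights are already normalised.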
Summary
Functions
Compares a nested list of documents representing individual index items against a set of query terms.
Retrieves a list of filenames for the similarity_map; see the compare function.
See the compare function; this one does the same but omits the filenames.
Functions
Compares a nested list of documents representing individual index items against a set of query terms.
Parameters
- document_list: List of tuples, each containing the file name and a list of tokens with their respective weights in the index
- query: List of tuples, each containing a query term as a String and its respective weight
Example
iex> ContentIndexer.Services.Similarity.compare(
       [
         {"test1.md", [{"great", 0.0066469689853797444}, {"how", 0.01994090695613923}]},
         {"test2.md", [{"silent", 0.0066469689853797444}, {"instrument", 0.01994090695613923}]}
       ],
       [
         {"great", -0.6931471805599453}
       ])
["test1.md"]
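The module summary describes weights as a hash, while the example above passes them as lists of `{term, weight}` tuples. If both shapes need to be handled, a conversion helper along these lines could bridge them. This is only a sketch: `WeightShapes` is a hypothetical name, and nothing here is taken from the content_indexer source.

```elixir
defmodule WeightShapes do
  # Hypothetical helper for illustration; not part of content_indexer.

  # Accepts either shape and returns a term => weight map.
  def to_map(weights) when is_map(weights), do: weights
  def to_map(weights) when is_list(weights), do: Map.new(weights)

  # Accepts either shape and returns a list of {term, weight} tuples.
  # Map entries are sorted by term so the result is deterministic.
  def to_pairs(weights) when is_list(weights), do: weights
  def to_pairs(weights) when is_map(weights), do: weights |> Map.to_list() |> Enum.sort()
end

IO.inspect(WeightShapes.to_map([{"great", 0.5}, {"how", 0.25}]))
IO.inspect(WeightShapes.to_pairs(%{"how" => 0.25, "great" => 0.5}))
```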
Retrieves a list of filenames for the similarity_map; see the compare function.
See the compare function; this one does the same but omits the filenames.