ContentIndexer.Services.Similarity (content_indexer v0.2.0)

Summary

This module accepts a list of tuples, each containing a document id and a map of terms to their TF-IDF weights. It also accepts the query terms as a map of terms to weights, in the same format as the map inside each tuple:

[
  {1, %{"abc" => 0.001, "term1" => 0.123, "term2" => 0.934, "term3" => 0.945}},
  {1, %{"abc" => 0.001, "term1" => 0.123, "term2" => 0.934, "term3" => 0.945}},
  …
]

The module computes the similarity of every provided document to the query terms and then returns an ordered set of terms and their corresponding weights.
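The summary does not say which similarity measure the module uses. A common choice for TF-IDF vectors is cosine similarity: the dot product of the two weight vectors divided by the product of their lengths. The following is a minimal sketch that assumes cosine similarity and the map format shown above; SimilaritySketch and cosine/2 are hypothetical names, not part of content_indexer.

```elixir
defmodule SimilaritySketch do
  @moduledoc """
  Hypothetical helper (not part of content_indexer): cosine similarity
  between two sparse term => weight maps, in the format shown in the
  module summary.
  """

  @type weights :: %{optional(String.t()) => number()}

  @spec cosine(weights(), weights()) :: float()
  def cosine(doc, query) do
    # Dot product: query terms missing from the document contribute 0.
    dot =
      Enum.reduce(query, 0.0, fn {term, q_weight}, acc ->
        acc + Map.get(doc, term, 0.0) * q_weight
      end)

    norm_product = norm(doc) * norm(query)

    # An empty map (or all-zero weights) has length 0; avoid dividing by it.
    if norm_product == 0.0, do: 0.0, else: dot / norm_product
  end

  # Euclidean length of the weight vector.
  defp norm(weights) do
    weights
    |> Map.values()
    |> Enum.reduce(0.0, fn w, acc -> acc + w * w end)
    |> :math.sqrt()
  end
end
```

For example, SimilaritySketch.cosine(%{"term1" => 0.123, "term2" => 0.934}, %{"term2" => 1.0}) scores one document against a one-term query. Under this assumption scores lie between -1 and 1, and a document sharing no terms with the query scores 0.0.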

Summary

Functions

compare(document_list, query_terms)

Compares a nested list of documents representing individual index items against a set of query terms.

get_filenames(similarity_map)

Retrieves a list of filenames from the similarity_map; see the compare function.

get_similarity(document_list, query_terms)

Does the same as the compare function, except that the filenames are omitted from the result.

Functions

compare(document_list, query_terms)

Compares a nested list of documents representing individual index items against a set of query terms

Parameters

  • document_list: list of tuples, each containing a file_name and a list of {token, weight} tuples giving each token's weight in the index
  • query_terms: list of tuples, each containing a query term (a String) and its respective weight
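These parameters are lists of tuples, whereas the module summary describes maps. As a hedged illustration of the shapes listed above, the following hypothetical module (SimilarityTypes, not part of content_indexer) writes them as typespecs and adds two simple shape checks.

```elixir
defmodule SimilarityTypes do
  @moduledoc """
  Hypothetical typespecs for the compare/2 parameters as documented:
  lists of tuples rather than maps. Not part of content_indexer.
  """

  # One {token, weight} pair.
  @type weighted_term :: {String.t(), number()}

  # {file_name, weighted terms for that file}
  @type document :: {String.t(), [weighted_term()]}

  @type document_list :: [document()]
  @type query_terms :: [weighted_term()]

  @doc "True when the value has the document_list shape."
  @spec document_list?(term()) :: boolean()
  def document_list?(list) when is_list(list) do
    Enum.all?(list, fn
      {file_name, terms} when is_binary(file_name) -> weighted_terms?(terms)
      _other -> false
    end)
  end

  def document_list?(_other), do: false

  @doc "True when the value has the query_terms shape."
  @spec query_terms?(term()) :: boolean()
  def query_terms?(terms), do: weighted_terms?(terms)

  # Every element must be a {String, number} pair.
  defp weighted_terms?(terms) when is_list(terms) do
    Enum.all?(terms, fn
      {term, weight} when is_binary(term) and is_number(weight) -> true
      _other -> false
    end)
  end

  defp weighted_terms?(_other), do: false
end
```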

Example

iex> ContentIndexer.Services.Similarity.compare(
...>   [
...>     {"test1.md", [{"great", 0.0066469689853797444}, {"how", 0.01994090695613923}]},
...>     {"test2.md", [{"silent", 0.0066469689853797444}, {"instrument", 0.01994090695613923}]}
...>   ],
...>   [
...>     {"great", -0.6931471805599453}
...>   ]
...> )
["test1.md"]

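The example returns only test1.md, the one document that shares a term (great) with the query. The following sketch reproduces that result under stated assumptions: it scores each document with a plain dot product over shared terms, drops documents with no shared term, and sorts the rest by descending score. CompareSketch is a hypothetical module; the actual scoring and filtering in ContentIndexer.Services.Similarity may differ.

```elixir
defmodule CompareSketch do
  @moduledoc """
  Hypothetical sketch of the documented compare/2 behaviour; not the
  content_indexer implementation.
  """

  @doc "File names of documents sharing a term with the query, best score first."
  def compare(document_list, query_terms) do
    # Turn [{term, weight}] into %{term => weight} for fast lookups.
    query = Map.new(query_terms)

    document_list
    |> Enum.map(fn {file_name, tokens} -> {file_name, score(tokens, query)} end)
    # Drop documents that share no term with the query (score is nil).
    |> Enum.reject(fn {_file_name, score} -> is_nil(score) end)
    # Highest score first.
    |> Enum.sort_by(fn {_file_name, score} -> score end, :desc)
    |> Enum.map(fn {file_name, _score} -> file_name end)
  end

  # Dot product over shared terms, or nil when nothing is shared.
  defp score(tokens, query) do
    products =
      for {token, weight} <- tokens, Map.has_key?(query, token) do
        weight * Map.fetch!(query, token)
      end

    if products == [], do: nil, else: Enum.sum(products)
  end
end
```

Because the query weight in the example is negative, the sketch filters on term overlap rather than on a positive score; filtering on score > 0 would have dropped test1.md.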
get_filenames(similarity_map)

Retrieves a list of filenames from the similarity_map; see the compare function.
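The documentation does not define the shape of similarity_map. Assuming it is either a list of {file_name, score} tuples already ordered by score, or a map of file_name => score, a hypothetical get_filenames/1 could look like this (FilenamesSketch is an illustrative name, not part of content_indexer):

```elixir
defmodule FilenamesSketch do
  @moduledoc """
  Hypothetical get_filenames/1; the shape of similarity_map is assumed.
  """

  @spec get_filenames([{String.t(), number()}] | %{optional(String.t()) => number()}) ::
          [String.t()]
  # Map input: order by descending score first, since maps do not keep
  # entries in score order.
  def get_filenames(similarity_map) when is_map(similarity_map) do
    similarity_map
    |> Enum.sort_by(fn {_file_name, score} -> score end, :desc)
    |> get_filenames()
  end

  # List input: assumed to be ordered already; keep the order, drop scores.
  def get_filenames(similarity_map) when is_list(similarity_map) do
    Enum.map(similarity_map, fn {file_name, _score} -> file_name end)
  end
end
```

The list clause deliberately does not re-sort, so any ordering produced upstream is preserved.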

get_similarity(document_list, query_terms)

Does the same as the compare function, except that the filenames are omitted from the result; see the compare function.