content_indexer v0.2.0 ContentIndexer.Indexer

Summary

Indexer is a GenServer that holds the index state: a list of index structs, each containing a file name, its tokens and the weights of those tokens.
Each time an index struct is added to the index, the weights are recalculated for every entry. Because the index is held in memory, searching it is fast.
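
The pattern described above can be sketched as a small GenServer. This is a hypothetical stand-in written for illustration, not the ContentIndexer.Indexer source: the module name `SketchIndexer`, the use of plain maps instead of the `%ContentIndexer.Index{}` struct, and the weight formula `tf * ln(N / (1 + df))` are all assumptions. The formula is inferred from the example outputs on this page rather than confirmed by the library.

```elixir
defmodule SketchIndexer do
  # Hypothetical illustration only; not the ContentIndexer.Indexer implementation.
  use GenServer

  # Client API
  def start_link(_opts \\ []), do: GenServer.start_link(__MODULE__, [], name: __MODULE__)
  def add(file_name, tokens), do: GenServer.call(__MODULE__, {:add, file_name, tokens})
  def retrieve_index, do: GenServer.call(__MODULE__, :retrieve)

  # Server callbacks: the state is simply the list of index entries.
  @impl true
  def init(_), do: {:ok, []}

  @impl true
  def handle_call({:add, file_name, tokens}, _from, index) do
    entry = %{file_name: file_name, tokens: tokens, term_weights: []}
    # Prepend the new entry, then re-weight every entry against the new corpus.
    new_index = reweigh([entry | index])
    {:reply, {:ok, new_index}, new_index}
  end

  def handle_call(:retrieve, _from, index), do: {:reply, {:ok, index}, index}

  defp reweigh(index) do
    corpus = Enum.map(index, & &1.tokens)

    Enum.map(index, fn item ->
      %{item | term_weights: Enum.map(item.tokens, &{&1, weight(&1, item.tokens, corpus)})}
    end)
  end

  # Assumed formula: term frequency times ln(documents / (1 + documents containing the term)).
  defp weight(term, tokens, corpus) do
    tf = Enum.count(tokens, &(&1 == term)) / length(tokens)
    df = Enum.count(corpus, &(term in &1))
    tf * :math.log(length(corpus) / (1 + df))
  end
end
```

Calling `SketchIndexer.start_link()` followed by `SketchIndexer.add("test_file.md", ["bread", "butter", "jam", "mustard"])` returns `{:ok, index}` with the new entry at the head of the list. Because every `add` re-weights the whole list, reads stay cheap while writes grow with the size of the index.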

Summary

Functions

Adds a new file_name and associated list of tokens to the index

Recalculates all the term_weights across the entire index

Retrieves a list of all the tokens in the entire index

Returns a nested list of all the individual index items containing their file_name and associated tokens with weights

Initialises the Index with an empty list

Resets the index with an empty list

Retrieves the entire index

Functions

add(file_name, tokens)

Adds a new file_name and associated list of tokens to the index

## Parameters

- file_name: String that represents the file that has the content to be indexed
- tokens: List of Strings that are the tokenised content

## Example

iex> ContentIndexer.Indexer.add("test_file.md", ["bread", "butter", "jam", "mustard"])
{:ok,
  [%ContentIndexer.Index{file_name: "test_file.md",
    term_weights: [{"bread", -0.17328679513998632},
      {"butter", -0.17328679513998632}, {"jam", -0.17328679513998632},
      {"mustard", -0.17328679513998632}],
    tokens: ["bread", "butter", "jam", "mustard"],
    uuid: "18693629-bfa9-4ffc-8fe8-ebc0c5c72c7b"}]}

calculate()

Recalculates all the term_weights across the entire index

## Example

iex> ContentIndexer.Indexer.calculate()
{:ok,
  [%ContentIndexer.Index{file_name: "test_file_3.md",
    term_weights: [{"orange", 0.0}, {"fruit", 0.0}, {"basket", 0.0},
      {"apples", 0.0}], tokens: ["orange", "fruit", "basket", "apples"],
    uuid: "2c600089-b35d-4667-a146-4635bd282811"},
    %ContentIndexer.Index{file_name: "test_file_2.md",
    term_weights: [{"orange", 0.0}, {"fruit", 0.0}, {"basket", 0.0},
      {"apples", 0.0}], tokens: ["orange", "fruit", "basket", "apples"],
    uuid: "c62c65be-4ac6-46bc-9597-2d70c65fa1a0"},
    %ContentIndexer.Index{file_name: "test_file.md",
    term_weights: [{"bread", 0.1013662770270411}, {"butter", 0.1013662770270411},
      {"jam", 0.1013662770270411}, {"mustard", 0.1013662770270411}],
    tokens: ["bread", "butter", "jam", "mustard"],
    uuid: "18693629-bfa9-4ffc-8fe8-ebc0c5c72c7b"}]}
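
The weights in the examples for add and calculate are consistent with a TF-IDF style formula, `tf * ln(N / (1 + df))`, where `tf` is the term's share of the document's tokens, `N` is the number of documents in the index and `df` is the number of documents containing the term. This is an inference from the printed values, not something stated by the library, so treat the check below as a sketch under that assumption.

```elixir
# Assumption: weight = tf * ln(N / (1 + df)), inferred from the example outputs.
tf = 1 / 4                             # each term occurs once among four tokens

single = tf * :math.log(1 / (1 + 1))   # one document in the index, term found in it
unique = tf * :math.log(3 / (1 + 1))   # three documents, term found in one of them
shared = tf * :math.log(3 / (1 + 2))   # three documents, term found in two of them

IO.inspect({single, unique, shared})
```

Under this assumption the three values come out at roughly -0.1733, 0.1014 and 0.0, which matches the weights shown for a single-document index, for the terms of test_file.md, and for the terms shared by test_file_2.md and test_file_3.md. It also explains why a lone document gets negative weights: with `N = 1` and `df = 1` the logarithm is taken of 1/2.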

corpus_of_tokens()

Retrieves a list of all the tokens in the entire index

## Example

iex> ContentIndexer.Indexer.corpus_of_tokens
{:ok,
  [["orange", "fruit", "basket", "apples"],
    ["bread", "butter", "jam", "mustard"]]}

documents()

Returns a nested list of all the individual index items containing their file_name and associated tokens with weights

## Example

iex> ContentIndexer.Indexer.documents()
{:ok,
[{"test_file_3.md",
  [{"orange", 0.0}, {"fruit", 0.0}, {"basket", 0.0}, {"apples", 0.0}]},
  {"test_file_2.md",
  [{"orange", 0.0}, {"fruit", 0.0}, {"basket", 0.0}, {"apples", 0.0}]},
  {"test_file.md",
  [{"bread", 0.1013662770270411}, {"butter", 0.1013662770270411},
    {"jam", 0.1013662770270411}, {"mustard", 0.1013662770270411}]}]}
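
One way to use this nested list for searching is to score each file by the summed weights of the query tokens it contains. The `SearchSketch` module and its `rank/2` function below are hypothetical helpers written for illustration; they are not part of ContentIndexer, and the library's own search logic is not shown on this page.

```elixir
defmodule SearchSketch do
  # Hypothetical helper; not part of ContentIndexer.
  # documents has the shape [{file_name, [{token, weight}]}],
  # i.e. the list wrapped in {:ok, _} by documents().
  def rank(documents, query_tokens) do
    documents
    |> Enum.map(fn {file_name, term_weights} ->
      score =
        term_weights
        |> Enum.filter(fn {token, _weight} -> token in query_tokens end)
        |> Enum.map(fn {_token, weight} -> weight end)
        |> Enum.sum()

      {file_name, score}
    end)
    # Keep only files with a positive score, highest score first.
    |> Enum.filter(fn {_file_name, score} -> score > 0 end)
    |> Enum.sort_by(fn {_file_name, score} -> score end, &>=/2)
  end
end

# Sample data copied from the documents() example above.
documents = [
  {"test_file_3.md", [{"orange", 0.0}, {"fruit", 0.0}, {"basket", 0.0}, {"apples", 0.0}]},
  {"test_file_2.md", [{"orange", 0.0}, {"fruit", 0.0}, {"basket", 0.0}, {"apples", 0.0}]},
  {"test_file.md",
   [{"bread", 0.1013662770270411}, {"butter", 0.1013662770270411},
    {"jam", 0.1013662770270411}, {"mustard", 0.1013662770270411}]}
]

IO.inspect(SearchSketch.rank(documents, ["jam", "butter"]))
```

With the sample data, a query for "jam" and "butter" returns only test_file.md. A query for "orange" returns nothing in this sketch, because both matching files carry a weight of 0.0 for that term and zero scores are filtered out; dropping that filter would list them instead.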

Initialises the Index with an empty list

Resets the index with an empty list

retrieve_index()

Retrieves the entire index