content_indexer v0.2.0 ContentIndexer.Indexer
Summary
Indexer is a GenServer that holds the index state: a list of index structs, each containing a file name, its tokens and their term weights.
Each time an index struct is added to the index, the weights for the whole index are recalculated. Because the index is held in memory, searching it is fast.
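To illustrate the idea of a GenServer whose state is simply a list of index items, here is a minimal sketch. The module name `IndexServerSketch`, its function names and the use of plain maps are all assumptions made for illustration; they are not the ContentIndexer implementation, and the weight recalculation step is left out.

```elixir
defmodule IndexServerSketch do
  # Hypothetical stand-in for an in-memory index server; not ContentIndexer's code.
  use GenServer

  # Client API (names are illustrative)
  def start_link(opts \\ []), do: GenServer.start_link(__MODULE__, [], opts)
  def add(server, file_name, tokens), do: GenServer.call(server, {:add, file_name, tokens})
  def index(server), do: GenServer.call(server, :index)
  def reset(server), do: GenServer.call(server, :reset)

  # Server callbacks: the state is a plain list, starting empty
  @impl true
  def init(_args), do: {:ok, []}

  @impl true
  def handle_call({:add, file_name, tokens}, _from, state) do
    # Prepend the new item; a real indexer would recalculate weights here
    new_state = [%{file_name: file_name, tokens: tokens} | state]
    {:reply, {:ok, new_state}, new_state}
  end

  def handle_call(:index, _from, state), do: {:reply, {:ok, state}, state}
  def handle_call(:reset, _from, _state), do: {:reply, {:ok, []}, []}
end

{:ok, pid} = IndexServerSketch.start_link()
{:ok, _index} = IndexServerSketch.add(pid, "test_file.md", ["bread", "butter"])
IO.inspect(IndexServerSketch.index(pid))
```

Because every call goes through a single process, reads and writes to the list are serialised, and lookups never touch the disk.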
Functions
Adds a new file_name and associated list of tokens to the index
## Parameters
- file_name: String that represents the file that has the content to be indexed
- tokens: List of Strings that are the tokenised content
## Example
    iex> ContentIndexer.Indexer.add("test_file.md", ["bread", "butter", "jam", "mustard"])
    {:ok,
     [%ContentIndexer.Index{file_name: "test_file.md",
       term_weights: [{"bread", -0.17328679513998632},
        {"butter", -0.17328679513998632}, {"jam", -0.17328679513998632},
        {"mustard", -0.17328679513998632}],
       tokens: ["bread", "butter", "jam", "mustard"],
       uuid: "18693629-bfa9-4ffc-8fe8-ebc0c5c72c7b"}]}
Recalculates the term_weights of every item in the index
## Example
    iex> ContentIndexer.Indexer.calculate()
    {:ok,
     [%ContentIndexer.Index{file_name: "test_file_3.md",
       term_weights: [{"orange", 0.0}, {"fruit", 0.0}, {"basket", 0.0},
        {"apples", 0.0}], tokens: ["orange", "fruit", "basket", "apples"],
       uuid: "2c600089-b35d-4667-a146-4635bd282811"},
      %ContentIndexer.Index{file_name: "test_file_2.md",
       term_weights: [{"orange", 0.0}, {"fruit", 0.0}, {"basket", 0.0},
        {"apples", 0.0}], tokens: ["orange", "fruit", "basket", "apples"],
       uuid: "c62c65be-4ac6-46bc-9597-2d70c65fa1a0"},
      %ContentIndexer.Index{file_name: "test_file.md",
       term_weights: [{"bread", 0.1013662770270411}, {"butter", 0.1013662770270411},
        {"jam", 0.1013662770270411}, {"mustard", 0.1013662770270411}],
       tokens: ["bread", "butter", "jam", "mustard"],
       uuid: "18693629-bfa9-4ffc-8fe8-ebc0c5c72c7b"}]}
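The weights in these examples are consistent with a TF-IDF scheme in which the term frequency is the term's share of the document's tokens and the inverse document frequency is ln(N / (1 + df)), where N is the number of indexed documents and df is the number of documents containing the term. This formula is inferred from the numbers shown here, not taken from the library's source, and the module `TfIdfSketch` below is a hypothetical helper written only to check that reading.

```elixir
defmodule TfIdfSketch do
  # Hypothetical helper; the formula is an assumption inferred from the example output.

  # Term frequency: occurrences of the term divided by the document length
  def tf(term, tokens) do
    Enum.count(tokens, &(&1 == term)) / length(tokens)
  end

  # Inverse document frequency: natural log of N / (1 + df)
  def idf(term, corpus) do
    docs_with_term = Enum.count(corpus, &(term in &1))
    :math.log(length(corpus) / (1 + docs_with_term))
  end

  # Weight of every distinct term in one document, relative to the corpus
  def weights(tokens, corpus) do
    tokens
    |> Enum.uniq()
    |> Enum.map(fn term -> {term, tf(term, tokens) * idf(term, corpus)} end)
  end
end

corpus = [
  ["orange", "fruit", "basket", "apples"],
  ["orange", "fruit", "basket", "apples"],
  ["bread", "butter", "jam", "mustard"]
]

IO.inspect(TfIdfSketch.weights(["bread", "butter", "jam", "mustard"], corpus))
IO.inspect(TfIdfSketch.weights(["orange", "fruit", "basket", "apples"], corpus))
```

Under this reading, "bread" appears in one of three documents, so its weight is 0.25 × ln(3/2) ≈ 0.1014, while "orange" appears in two, giving 0.25 × ln(3/3) = 0.0. With a single indexed document the same formula gives 0.25 × ln(1/2) ≈ -0.1733, which matches the negative weights returned by the first `add` call.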
Retrieves all the tokens in the index as a list of token lists
## Example
    iex> ContentIndexer.Indexer.corpus_of_tokens()
    {:ok,
     [["orange", "fruit", "basket", "apples"],
      ["bread", "butter", "jam", "mustard"]]}
Returns a list of `{file_name, term_weights}` tuples, one per index item, where term_weights is a list of `{token, weight}` tuples
## Example
    iex> ContentIndexer.Indexer.documents()
    {:ok,
     [{"test_file_3.md",
       [{"orange", 0.0}, {"fruit", 0.0}, {"basket", 0.0}, {"apples", 0.0}]},
      {"test_file_2.md",
       [{"orange", 0.0}, {"fruit", 0.0}, {"basket", 0.0}, {"apples", 0.0}]},
      {"test_file.md",
       [{"bread", 0.1013662770270411}, {"butter", 0.1013662770270411},
        {"jam", 0.1013662770270411}, {"mustard", 0.1013662770270411}]}]}
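The shape of this result can be reproduced from a list of index items with a single `Enum.map`. The sketch below uses plain maps in place of `%ContentIndexer.Index{}` structs, which is an assumption made to keep it self-contained; it illustrates the data shapes only and is not the library's code.

```elixir
# Plain maps standing in for index structs (assumption for illustration)
index = [
  %{
    file_name: "test_file_3.md",
    tokens: ["orange", "fruit", "basket", "apples"],
    term_weights: [{"orange", 0.0}, {"fruit", 0.0}, {"basket", 0.0}, {"apples", 0.0}]
  },
  %{
    file_name: "test_file.md",
    tokens: ["bread", "butter", "jam", "mustard"],
    term_weights: [
      {"bread", 0.1013662770270411},
      {"butter", 0.1013662770270411},
      {"jam", 0.1013662770270411},
      {"mustard", 0.1013662770270411}
    ]
  }
]

# One {file_name, term_weights} tuple per index item
documents = Enum.map(index, fn item -> {item.file_name, item.term_weights} end)

# The per-document token lists, in the same order as the index
token_lists = Enum.map(index, & &1.tokens)

IO.inspect(documents)
IO.inspect(token_lists)
```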
Initialises the Index with an empty list
Resets the index with an empty list
Retrieves the entire index