content_indexer v0.2.0 API Reference
Modules
Documentation for ContentIndexer
Struct that stores the details of the data held in the index.
It provides a new/2 function for instantiating the struct, which includes a generated UUID.
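As a rough illustration of a struct with a new/2 constructor that attaches a generated UUID, here is a minimal sketch. The module name IndexSketch, the field names and the two arguments (a file name and a token list) are assumptions made for the example, not the library's actual definition. The UUID is built from random bytes with Erlang's :crypto module so the sketch needs no extra dependency.

```elixir
defmodule IndexSketch do
  # Hypothetical stand-in for the index struct; field names are assumptions.
  defstruct [:uuid, :file_name, tokens: [], term_weights: []]

  # new/2: build a struct from a file name and its tokens, attaching a fresh UUID.
  def new(file_name, tokens) when is_binary(file_name) and is_list(tokens) do
    %__MODULE__{uuid: uuid4(), file_name: file_name, tokens: tokens}
  end

  # Random (version 4) UUID: 16 random bytes with the version and variant bits set.
  defp uuid4 do
    <<a::48, _::4, b::12, _::2, c::62>> = :crypto.strong_rand_bytes(16)

    <<a::48, 4::4, b::12, 2::2, c::62>>
    |> Base.encode16(case: :lower)
    |> hyphenate()
  end

  # Split the 32 hex characters into the usual 8-4-4-4-12 groups.
  defp hyphenate(
         <<a::binary-size(8), b::binary-size(4), c::binary-size(4), d::binary-size(4),
           e::binary-size(12)>>
       ) do
    Enum.join([a, b, c, d, e], "-")
  end
end
```

Calling IndexSketch.new("notes.md", ["bread", "butter"]) returns a struct whose uuid field differs on every call.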
Indexer is a GenServer that holds the index state: a list of index structs, each with a filename, tokens and weights.
Each time an index struct is added to the server, the weightings are recalculated. Because the index is stored in memory, searching it is fast.
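The pattern described here (in-memory state, with every weighting recalculated whenever a document is added) can be sketched with a small GenServer. Everything named below (IndexerSketch, add/3, documents/1, and the weighting function passed to start_link/1) is hypothetical and only illustrates the idea. The real Indexer's API may look quite different.

```elixir
defmodule IndexerSketch do
  use GenServer

  # Client API (hypothetical names, not the library's own).
  # weigh_fun takes (tokens, corpus) and returns the weights for one document.
  def start_link(weigh_fun) when is_function(weigh_fun, 2) do
    GenServer.start_link(__MODULE__, weigh_fun)
  end

  def add(pid, file_name, tokens), do: GenServer.call(pid, {:add, file_name, tokens})
  def documents(pid), do: GenServer.call(pid, :documents)

  # Server callbacks
  @impl true
  def init(weigh_fun), do: {:ok, %{weigh_fun: weigh_fun, docs: []}}

  @impl true
  def handle_call({:add, file_name, tokens}, _from, state) do
    docs = [%{file_name: file_name, tokens: tokens, weights: []} | state.docs]
    corpus = Enum.map(docs, & &1.tokens)

    # Every document is re-weighted, because a new document changes the corpus statistics.
    docs = Enum.map(docs, fn doc -> %{doc | weights: state.weigh_fun.(doc.tokens, corpus)} end)
    {:reply, :ok, %{state | docs: docs}}
  end

  def handle_call(:documents, _from, state), do: {:reply, state.docs, state}
end
```

Passing the weighting function in keeps the sketch independent of any particular formula. Reads are served straight from the process state, which is why lookups stay fast.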
Calculates the content_indexer weights for a document of tokens against a corpus of tokenized documents.
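To make the idea of weighting a document's tokens against a corpus concrete, the sketch below computes a plain TF-IDF weight per term. It assumes the unsmoothed textbook form (term frequency times the natural log of corpus size over document frequency). The library's exact formula is not given in this reference and may differ, and the module name TfIdfSketch is hypothetical.

```elixir
defmodule TfIdfSketch do
  # Returns [{term, weight}] for each distinct token in `tokens`.
  # `corpus` is a list of token lists, normally including the document itself.
  def weights(tokens, corpus) do
    total = length(tokens)
    doc_count = length(corpus)

    tokens
    |> Enum.frequencies()
    |> Enum.map(fn {term, count} ->
      # Term frequency: share of this document's tokens that are `term`.
      tf = count / total
      # Document frequency: number of corpus documents containing `term`.
      containing = Enum.count(corpus, &(term in &1))
      # max/2 guards against division by zero when the term is absent from the corpus.
      idf = :math.log(doc_count / max(containing, 1))
      {term, tf * idf}
    end)
  end
end
```

With this form, a term that appears in every document of the corpus gets a weight of zero, while rarer terms score higher.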
ListCheckerServer is the OTP server that uses GenServer to handle the
interactions between the individual workers and the parent caller.
Each ListCheckerWorker processes one list of tokens
and checks that list for a given token. Once it is done, a message is
returned to the ListCheckerServer.
The server in turn sends a message to the caller once the whole
list of token lists has been checked successfully.
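The message flow described above (workers report to the server, and the server notifies the caller once every list has been checked) could look roughly like the following. This is a simplified sketch under several assumptions: the module name ListCheckSketch, the message shapes {:checked, ...} and {:all_checked, ...}, and the use of bare spawned processes as workers are all invented for illustration. It also handles only one request at a time.

```elixir
defmodule ListCheckSketch do
  use GenServer

  def start_link, do: GenServer.start_link(__MODULE__, nil)

  # Asynchronous: the calling process later receives {:all_checked, results}.
  def check(server, token_lists, token) do
    GenServer.cast(server, {:check, self(), token_lists, token})
  end

  @impl true
  def init(_), do: {:ok, %{caller: nil, pending: 0, results: %{}}}

  @impl true
  def handle_cast({:check, caller, token_lists, token}, _state) do
    server = self()

    # Nothing to check: answer straight away.
    if token_lists == [], do: send(caller, {:all_checked, []})

    token_lists
    |> Enum.with_index()
    |> Enum.each(fn {tokens, i} ->
      # One short-lived worker process per token list.
      spawn_link(fn -> send(server, {:checked, i, token in tokens}) end)
    end)

    {:noreply, %{caller: caller, pending: length(token_lists), results: %{}}}
  end

  @impl true
  def handle_info({:checked, i, found?}, state) do
    results = Map.put(state.results, i, found?)
    pending = state.pending - 1

    # The last worker has reported: send results in the original list order.
    if pending == 0 do
      ordered = results |> Enum.sort() |> Enum.map(&elem(&1, 1))
      send(state.caller, {:all_checked, ordered})
    end

    {:noreply, %{state | results: results, pending: pending}}
  end
end
```

A caller would start the server, call ListCheckSketch.check(server, token_lists, "bread"), and then wait in a receive block for the {:all_checked, results} message.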
GenServer-based approach to the ListCheckerWorker.
ListCheckerWorker is the OTP actor that makes the actual ContentIndexerService.list_contains call to check
whether a given word is contained in a list of tokens.
Content and query pre-process functions that are passed to the SearchUtils.compile and SearchUtils.compile_query functions. Here they just do some extra processing of a markdown file, i.e. removing the header.
Utility functions to crawl a folder of files and extract their content. The actual processing of the content is handled by the file_pre_process_func function from the ContentIndexer.Services.PreProcess module, but this can easily be swapped out by passing your own pre-process function.
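The swappable pre-process step can be illustrated with a small crawler that takes the pre-process function as an argument. The names CrawlSketch, crawl/2 and strip_header/1 are hypothetical, as is the one-argument shape of the pre-process function. The sketch also assumes the markdown header is a block delimited by lines of "---" at the top of the file, which this reference does not confirm.

```elixir
defmodule CrawlSketch do
  # Reads every .md file directly inside `dir` and returns
  # [{file_name, processed_content}], sorted by path.
  def crawl(dir, pre_process_fun \\ &strip_header/1) do
    dir
    |> Path.join("*.md")
    |> Path.wildcard()
    |> Enum.sort()
    |> Enum.map(fn path ->
      {Path.basename(path), pre_process_fun.(File.read!(path))}
    end)
  end

  # Assumed header format: a leading block delimited by lines of "---".
  # Content without such a block is returned unchanged.
  def strip_header(content) do
    case String.split(content, ~r/^---[ \t]*$/m, parts: 3) do
      ["", _header, body] -> String.trim_leading(body)
      _ -> content
    end
  end
end
```

Swapping in different behaviour is then a matter of passing another function, for example CrawlSketch.crawl("docs", &String.downcase/1).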
This module accepts a list of tuples, each containing a document id and a hash of terms and their TF_IDF weights. It also accepts query terms as a hash of terms and weights, in the same format as in the tuple above.
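The description above gives the input shapes but not how documents and query are compared. One common choice for weighted term vectors is cosine similarity, which the sketch below uses purely as an assumption. SimilaritySketch and rank/2 are hypothetical names, and the terms-and-weights hashes are represented as Elixir maps.

```elixir
defmodule SimilaritySketch do
  # docs:  [{doc_id, %{term => weight}}]
  # query: %{term => weight}
  # Returns [{doc_id, score}] sorted from most to least similar.
  def rank(docs, query) do
    docs
    |> Enum.map(fn {id, weights} -> {id, cosine(weights, query)} end)
    |> Enum.sort_by(fn {_id, score} -> score end, :desc)
  end

  # Cosine similarity: dot product divided by the product of the vector lengths.
  defp cosine(a, b) do
    dot = Enum.reduce(a, 0.0, fn {term, w}, acc -> acc + w * Map.get(b, term, 0.0) end)
    norms = norm(a) * norm(b)
    # An all-zero or empty vector has no direction, so score it as 0.0.
    if norms == 0.0, do: 0.0, else: dot / norms
  end

  defp norm(weights) do
    weights |> Map.values() |> Enum.map(&(&1 * &1)) |> Enum.sum() |> :math.sqrt()
  end
end
```

Documents that share no terms with the query score 0.0 and sort to the end of the result.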