content_indexer v0.1.0 API Reference
Modules
Documentation for ContentIndexer
Summary
Calculates the content_indexer weights for a document of tokens against a corpus of tokenized documents.
Struct that stores the details of the data held in the index.
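The exact weighting scheme is not spelled out in this summary. As a rough illustration, the sketch below assumes a plain TF-IDF weighting (term frequency times the log of inverse document frequency). `WeightSketch` and its function names are hypothetical and are not part of content_indexer.

```elixir
defmodule WeightSketch do
  # Hypothetical module for illustration only; not the library's API.

  # Term frequency: occurrences of `term` divided by the document length.
  def tf(term, tokens) do
    Enum.count(tokens, &(&1 == term)) / length(tokens)
  end

  # Inverse document frequency: log(N / number of documents containing `term`).
  # max/2 guards against division by zero for terms absent from the corpus.
  def idf(term, corpus) do
    containing = Enum.count(corpus, &(term in &1))
    :math.log(length(corpus) / max(containing, 1))
  end

  # Builds a map of term => weight for one document against the corpus.
  def weights(tokens, corpus) do
    tokens
    |> Enum.uniq()
    |> Map.new(fn term -> {term, tf(term, tokens) * idf(term, corpus)} end)
  end
end

corpus = [["bread", "butter", "jam"], ["bread", "cheese"], ["tea", "jam"]]
weights = WeightSketch.weights(["bread", "butter", "jam"], corpus)
IO.inspect(weights)
```

Under this assumption, a term that occurs in fewer documents of the corpus ("butter") receives a higher weight than one that occurs in many ("bread").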
Summary
Indexer is a GenServer that holds the index state: a list of index structs, each holding a filename, tokens and weights.
Each time an index struct is added to the server's index, the weightings are recalculated. Because the index is held in memory, searching it is fast.
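A minimal sketch of this pattern follows: a GenServer whose state is a list of structs, with all weights recomputed on every insert. The names `IndexSketch`, `Entry`, `add/3` and `entries/1` are hypothetical, and the weighting used here (the share of documents containing each term) is only a placeholder, not the library's calculation.

```elixir
defmodule IndexSketch do
  use GenServer

  # Hypothetical stand-in for the index struct described above.
  defmodule Entry do
    defstruct file_name: nil, tokens: [], weights: %{}
  end

  # Client API (hypothetical names)
  def start_link(opts \\ []), do: GenServer.start_link(__MODULE__, [], opts)
  def add(server, file_name, tokens), do: GenServer.call(server, {:add, file_name, tokens})
  def entries(server), do: GenServer.call(server, :entries)

  @impl true
  def init(entries), do: {:ok, entries}

  @impl true
  def handle_call({:add, file_name, tokens}, _from, entries) do
    entry = %Entry{file_name: file_name, tokens: tokens}
    # Every insert triggers a re-weighting of all entries.
    {:reply, :ok, reweight([entry | entries])}
  end

  def handle_call(:entries, _from, entries), do: {:reply, entries, entries}

  # Placeholder weighting: share of documents that contain each term.
  defp reweight(entries) do
    total = length(entries)

    Enum.map(entries, fn entry ->
      weights =
        entry.tokens
        |> Enum.uniq()
        |> Map.new(fn term ->
          {term, Enum.count(entries, &(term in &1.tokens)) / total}
        end)

      %{entry | weights: weights}
    end)
  end
end
```

Because the state lives in the server process, lookups never touch disk; the trade-off is that each insert pays for a full re-weighting pass.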
Summary
ListCheckerServer is the OTP server that uses GenServer to handle the
interactions with the individual workers and the parent caller.
Each ListCheckerWorker processes a list of tokens
and checks that list for a given token. Once it is done, a message is
returned to the ListCheckerServer.
The server in turn sends a message to the caller, advising it once the whole
list of token lists has been checked successfully.
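The message flow described above can be sketched roughly as follows. This is a simplified illustration, not the library's implementation: `CheckerSketch`, `check/3` and the message shapes are hypothetical, plain spawned processes stand in for the GenServer-based workers, and the sketch handles a single request at a time with a non-empty list of token lists.

```elixir
defmodule CheckerSketch do
  use GenServer

  # Hypothetical names throughout; not the library's actual API.
  def start_link, do: GenServer.start_link(__MODULE__, nil)

  # Asynchronously checks every list for `token`. The calling process later
  # receives {:all_checked, results} in its mailbox.
  def check(server, token, lists) do
    GenServer.cast(server, {:check, token, lists, self()})
  end

  @impl true
  def init(_), do: {:ok, nil}

  @impl true
  def handle_cast({:check, token, lists, caller}, _state) do
    server = self()

    # One short-lived worker per token list; each reports back to the server.
    for list <- lists do
      spawn_link(fn -> send(server, {:checked, token in list}) end)
    end

    {:noreply, %{caller: caller, pending: length(lists), results: []}}
  end

  @impl true
  def handle_info({:checked, found?}, state) do
    state = %{state | pending: state.pending - 1, results: [found? | state.results]}

    # Notify the original caller only once every worker has reported.
    if state.pending == 0 do
      send(state.caller, {:all_checked, state.results})
    end

    {:noreply, state}
  end
end
```

Workers may finish in any order, so the results list in this sketch is not aligned with the order of the input lists.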
GenServer-based approach to the ListCheckerWorker
Summary
ListCheckerWorker is the OTP actor that makes the actual ContentIndexerService.list_contains call to check
whether a given word is contained in a list of tokens.
Content and query pre-process functions that are passed to the SearchUtils.compile and SearchUtils.compile_query functions. Here we just do some extra work on a markdown file, i.e. removing the header.
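What counts as "the header" is not defined in this summary. As one possible reading, the sketch below assumes it is a front-matter block delimited by two `---` lines at the top of the markdown file. `PreProcessSketch.strip_header/1` is a hypothetical name, not a function of the library.

```elixir
defmodule PreProcessSketch do
  # Assumption: the "header" is a front-matter block between two `---` lines
  # at the very start of the file. Content without such a block is returned
  # unchanged.
  def strip_header(content) do
    case String.split(content, ~r/^---[ \t]*$/m, parts: 3) do
      # Leading "" means the first delimiter sat at the very start of the file.
      ["", _header, body] -> String.trim_leading(body)
      _ -> content
    end
  end
end

IO.puts(PreProcessSketch.strip_header("---\ntitle: Hi\n---\nBody text\n"))
```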
Utility functions to crawl a folder of files and extract content. The actual processing of the content is handled by the file_pre_process_func function, which we take from the ContentIndexer.Services.PreProcess module; however, this can easily be swapped out by passing your own pre-process function.
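The idea of a swappable pre-process function can be sketched as below. `CrawlSketch.crawl/3`, its argument order and its return shape are assumptions made for illustration; only `Path` and `File` calls from the Elixir standard library are used.

```elixir
defmodule CrawlSketch do
  # Hypothetical helper: reads every file matching `glob` directly under `dir`
  # and returns a list of {file_name, processed_content} tuples.
  # `pre_process_func` is any one-argument function that takes the raw file
  # content and returns the processed form (for example a list of tokens).
  def crawl(dir, pre_process_func, glob \\ "*.md") do
    dir
    |> Path.join(glob)
    |> Path.wildcard()
    |> Enum.map(fn path ->
      processed = path |> File.read!() |> pre_process_func.()
      {Path.basename(path), processed}
    end)
  end
end
```

Calling `CrawlSketch.crawl("docs", &String.split/1)` would tokenize on whitespace; passing a different function changes the processing without touching the crawl logic.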
Summary
This module accepts a list of tuples, each containing a document id and a hash of terms and their TF_IDF weights. It also accepts query terms as a hash of terms and weights, in the same format as in the tuples above.
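The summary describes the input shapes but not what is computed from them. As one plausible use, and purely as an assumption, the sketch below ranks documents by cosine similarity between each document's weight map and the query's weight map. `SimilaritySketch` and its functions are hypothetical.

```elixir
defmodule SimilaritySketch do
  # Cosine similarity between two term => weight maps.
  # Returns 0.0 when either map has zero magnitude.
  def cosine(a, b) do
    dot = Enum.reduce(a, 0.0, fn {term, w}, acc -> acc + w * Map.get(b, term, 0.0) end)
    norm = fn m -> m |> Map.values() |> Enum.map(&(&1 * &1)) |> Enum.sum() |> :math.sqrt() end
    denom = norm.(a) * norm.(b)
    if denom == 0.0, do: 0.0, else: dot / denom
  end

  # Ranks {doc_id, weights} tuples by similarity to the query, best first.
  def rank(docs, query) do
    docs
    |> Enum.map(fn {id, weights} -> {id, cosine(weights, query)} end)
    |> Enum.sort_by(fn {_id, score} -> score end, :desc)
  end
end

docs = [
  {"a.md", %{"jam" => 0.5, "tea" => 0.1}},
  {"b.md", %{"bread" => 0.7}}
]

IO.inspect(SimilaritySketch.rank(docs, %{"jam" => 1.0}))
```

Documents that share no terms with the query score 0.0 and sort to the end of the list.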