content_indexer v0.1.0 API Reference

Modules

Documentation for ContentIndexer

Summary

Calculates the content_indexer weights for a document of tokens against a corpus of tokenized documents.
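
As a rough illustration of weighting one tokenized document against a corpus, the sketch below assumes the weights are the TF_IDF weights mentioned later in this reference. It is written in Python for brevity and is not the library's Elixir API; the function name `tf_idf_weights` and the unsmoothed `log(N / df)` formula are assumptions.

```python
import math
from collections import Counter


def tf_idf_weights(tokens, corpus):
    """Return {term: weight} for one tokenized document.

    Hypothetical helper: `tokens` is a list of strings and `corpus` is a
    list of token lists (normally including `tokens` itself).
    """
    if not tokens:
        return {}
    counts = Counter(tokens)
    total = len(tokens)
    n_docs = len(corpus)
    weights = {}
    for term, count in counts.items():
        tf = count / total  # term frequency within this document
        # Number of corpus documents containing the term; clamp to 1 so a
        # document that is not part of the corpus cannot divide by zero.
        df = max(sum(1 for doc in corpus if term in doc), 1)
        idf = math.log(n_docs / df) if n_docs else 0.0
        weights[term] = tf * idf
    return weights


corpus = [["a", "b"], ["a", "c"]]
weights = tf_idf_weights(["a", "b"], corpus)
```

With this formula a term that appears in every document gets a weight of zero, so only distinguishing terms contribute.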

Struct that stores the details of the data held in the index.

Summary

Indexer is a GenServer that holds the index state: a list of index structs, each with a filename, tokens and weights.
Each time an index struct is added to the index, the weightings are recalculated. Because the index is held in memory, searching it is fast.
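
The recalculate-on-add behaviour can be pictured with a small in-memory sketch. This is Python rather than a GenServer, and every name in it (`InMemoryIndex`, `doc_share_weight`, the entry keys) is hypothetical; the toy weighting function only exists to make the recalculation visible.

```python
def doc_share_weight(tokens, corpus):
    """Toy weighting (an assumption, not the library's formula): term count
    divided by the number of corpus documents containing the term."""
    return {
        t: tokens.count(t) / sum(1 for doc in corpus if t in doc)
        for t in set(tokens)
    }


class InMemoryIndex:
    """Holds a list of entries, each with a file name, tokens and weights."""

    def __init__(self, weigh):
        self._weigh = weigh  # callable: (tokens, corpus) -> {term: weight}
        self.entries = []

    def add(self, file_name, tokens):
        self.entries.append(
            {"file_name": file_name, "tokens": tokens, "weights": {}}
        )
        # Adding a document changes the corpus, so every entry is re-weighted.
        corpus = [e["tokens"] for e in self.entries]
        for entry in self.entries:
            entry["weights"] = self._weigh(entry["tokens"], corpus)


index = InMemoryIndex(doc_share_weight)
index.add("a.md", ["x"])
first_pass = dict(index.entries[0]["weights"])
index.add("b.md", ["x", "y"])
second_pass = dict(index.entries[0]["weights"])
```

The weight of "x" in a.md changes between the two passes because the second document also contains it.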

Summary

ListCheckerServer is the OTP server that uses GenServer to handle the
interactions between the individual workers and the parent caller.
Each ListCheckerWorker processes one list of tokens
and checks that list for a given token. Once it is done, a message is
returned to the ListCheckerServer.
The server in turn sends a message to the caller, advising it once the whole
list of token lists has been checked.
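
The fan-out described above (one worker per token list, a coordinator that notifies the caller once every list has been checked) can be sketched as follows. This uses Python threads instead of OTP processes, so it only mirrors the message flow; `check_token_lists` and the `on_done` callback are hypothetical names.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def list_contains(tokens, word):
    """Worker task: does this token list contain the word?"""
    return word in tokens


def check_token_lists(token_lists, word, on_done):
    """Coordinator: run one worker per token list, then notify the caller."""
    results = [None] * len(token_lists)
    with ThreadPoolExecutor() as pool:
        # Map each future back to the position of its token list.
        futures = {
            pool.submit(list_contains, tokens, word): i
            for i, tokens in enumerate(token_lists)
        }
        for future in as_completed(futures):
            results[futures[future]] = future.result()
    # Every worker has reported back; tell the caller exactly once.
    on_done(results)


received = []
check_token_lists([["a", "b"], ["c"], ["b"]], "b", received.append)
```

The caller is notified a single time with one result per token list, in the original order, regardless of which worker finished first.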

GenServer-based approach to the ListCheckerWorker

Summary

ListCheckerWorker is the OTP actor that makes the actual ContentIndexerService.list_contains call to check
whether a given word is contained in a list of tokens.

Content and query pre-process functions that are passed to the SearchUtils.compile and SearchUtils.compile_query functions. Here they just do some extra work on a markdown file, i.e. removing the header.
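
The source does not say what "the header" is. As one possible reading, the sketch below assumes it is a front-matter block delimited by `---` lines at the top of the markdown file. It is written in Python, and `strip_front_matter` is a hypothetical name, not a function from this library.

```python
def strip_front_matter(content):
    """Remove a leading '---' ... '---' block, if the file starts with one.

    Assumption: the header is a front-matter block. Content without such a
    block, or with an unterminated one, is returned unchanged.
    """
    lines = content.splitlines()
    if lines and lines[0].strip() == "---":
        for i in range(1, len(lines)):
            if lines[i].strip() == "---":
                # Keep everything after the closing delimiter and drop the
                # blank lines that usually follow it.
                return "\n".join(lines[i + 1:]).lstrip("\n")
    return content


body = strip_front_matter("---\ntitle: Hi\n---\n\nBody text")
```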

Utility functions to crawl a folder of files and extract their content. The actual processing of the content is handled by the file_pre_process_func function from the ContentIndexer.Services.PreProcess module; this can easily be swapped out by passing your own pre-process function.
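
The pluggable pre-process idea can be illustrated with a short Python sketch. The `crawl` function, its one-argument `pre_process` callback and the returned `(file_name, content)` pairs are all assumptions made for illustration; the library's actual signatures are not shown in this reference.

```python
from pathlib import Path


def crawl(folder, pre_process=lambda content: content):
    """Walk `folder` recursively and return (file_name, processed_content).

    Assumes every file is UTF-8 text. `pre_process` receives the raw file
    content and returns the text to index; the default leaves it untouched.
    """
    results = []
    for path in sorted(Path(folder).rglob("*")):
        if path.is_file():
            text = path.read_text(encoding="utf-8")
            results.append((path.name, pre_process(text)))
    return results
```

Swapping the behaviour is then a matter of passing a different callable, for example `crawl("docs", pre_process=str.lower)`.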

Summary

This module accepts a list of tuples, each containing a document id and a hash of terms and their TF_IDF weights. It also accepts query terms as a hash of terms and weights, in the same format as the hash in each tuple.
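
The description above gives the input shapes but not how they are compared. The Python sketch below shows those shapes and, purely as an assumption, ranks documents by cosine similarity between each document's weights and the query weights. `cosine` and `rank` are hypothetical names.

```python
import math


def cosine(a, b):
    """Cosine similarity between two {term: weight} hashes."""
    dot = sum(w * b.get(term, 0.0) for term, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty or all-zero vector matches nothing
    return dot / (norm_a * norm_b)


def rank(documents, query):
    """documents: list of (doc_id, {term: weight}); query: {term: weight}.

    Returns (doc_id, score) pairs, best match first.
    """
    scored = [(doc_id, cosine(weights, query)) for doc_id, weights in documents]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


documents = [("b.md", {"dog": 1.0}), ("a.md", {"cat": 1.0})]
ranking = rank(documents, {"cat": 0.5})
```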