content_indexer v0.2.0 ContentIndexer.Services.PreProcess
Content and query pre-processing functions that are passed to the SearchUtils.compile and SearchUtils.compile_query functions. Here we also do some extra work with markdown files, i.e. removing the header.
The important thing to note is that pre_process_content takes the content as a string, pre_process_query takes a list of query tokens, and both return a list of tokenized strings.
The steps we are taking:
(1) Remove all the stop words; they are noise and we should never search by them.
(2) Remove non-character data and whitespace.
Using streams means the steps are applied lazily in a single pass over the tokens, rather than building an intermediate list after every step.
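The two steps can be sketched as a lazy Stream pipeline over a list of tokens. This is a minimal illustration, not the module's actual code: the `PreProcessSketch` module, the `clean_tokens/1` name, the small stop-word list and the reading of "non-character data" as "anything that is not a letter" are all assumptions.

```elixir
defmodule PreProcessSketch do
  # Hypothetical stop-word list; the real module's list is not shown in this doc.
  @stop_words MapSet.new(~w(a an and is it of the this to))

  # Applies the two documented steps to a list of tokens. Each Stream call
  # only builds a lazy recipe; Enum.to_list/1 then walks the tokens once,
  # running every step on each token as it goes.
  def clean_tokens(tokens) do
    tokens
    # (1) drop stop words
    |> Stream.reject(&MapSet.member?(@stop_words, &1))
    # (2) strip anything that is not a letter (assumed meaning of "non-char")...
    |> Stream.map(&String.replace(&1, ~r/[^[:alpha:]]/u, ""))
    # ...and drop tokens left empty, which also removes pure whitespace/punctuation
    |> Stream.reject(&(&1 == ""))
    |> Enum.to_list()
  end
end

PreProcessSketch.clean_tokens(["this", "is", "just", "meaningless"])
# => ["just", "meaningless"]
```

Following the documented order means a token such as `"this,"` survives step (1) and is only cleaned to `"this"` in step (2); stripping characters before the stop-word check would catch it.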
Summary
Functions
pre_process_content(content, file_name)
Processes the string-based content of a file
pre_process_query(query)
Processes a list of query tokens, removing all non-character data, stop words and whitespace
Functions
pre_process_content(content, file_name)
Processes the string-based content of a file
## Parameters
- content: String - the file content
- file_name: String - the file name
## Example
iex> ContentIndexer.Services.PreProcess.pre_process_content("this is just some random file content", "test_file_one.txt")
["just", "some", "random", "file", "content"]
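A rough sketch of how a content function with this signature could work: split the string on whitespace, then run the same stop-word and character cleaning. Everything here is illustrative: the `ContentPreProcessSketch` module and stop-word list are hypothetical, and since the doc does not say what the markdown "header" looks like, the sketch assumes a leading front-matter block between two `---` lines and uses `file_name` only to detect `.md` files.

```elixir
defmodule ContentPreProcessSketch do
  # Hypothetical stop-word list; the real module's list is not shown in this doc.
  @stop_words MapSet.new(~w(a an and is it of the this to))

  def pre_process_content(content, file_name) do
    content
    |> maybe_strip_header(file_name)
    # String.split/1 splits on whitespace and discards empty strings
    |> String.split()
    |> Stream.reject(&MapSet.member?(@stop_words, &1))
    |> Stream.map(&String.replace(&1, ~r/[^[:alpha:]]/u, ""))
    |> Stream.reject(&(&1 == ""))
    |> Enum.to_list()
  end

  # Assumed header format: a "---" ... "---" block at the very start of a
  # markdown file. Non-markdown files are passed through unchanged.
  defp maybe_strip_header(content, file_name) do
    if Path.extname(file_name) == ".md" do
      String.replace(content, ~r/\A---\n.*?\n---\n/s, "")
    else
      content
    end
  end
end

ContentPreProcessSketch.pre_process_content("this is just some random file content", "test_file_one.txt")
# => ["just", "some", "random", "file", "content"]
```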
pre_process_query(query)
Processes a list of query tokens, removing all non-character data, stop words and whitespace
## Parameters
- query: List of Strings - the query tokens
## Example
iex> ContentIndexer.Services.PreProcess.pre_process_query(["this", "is", "just", "meaningless"])
["just", "meaningless"]