content_indexer v0.1.0 ContentIndexer.Services.PreProcess

Content and query pre-processing functions that are passed to the SearchUtils.compile and SearchUtils.compile_query functions. Here we just do some extra work for a markdown file, e.g. removing the header.

The important thing to note is that both functions take the content as a string and return a list of tokenized strings.

The steps we are taking:

(1) Remove all the stop words: they are noise and we should never search by them.
(2) Remove non-character data and whitespace.

Using streams means the steps are composed lazily, so most of the work happens in a single pass over the tokens.
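The steps above can be sketched roughly as follows. This is a minimal illustration, not the module's actual implementation: the module name `PreProcessSketch`, the function `tokenize/1`, and the stop-word list are all hypothetical, and the markdown header removal is left out because the source does not describe how the header is detected.

```elixir
defmodule PreProcessSketch do
  # Hypothetical stop-word list, for illustration only.
  @stop_words ~w(a an and is on the to)

  # Takes the content as a string and returns a list of tokens.
  def tokenize(text) when is_binary(text) do
    text
    |> String.downcase()
    # Splitting on whitespace also discards the whitespace itself.
    |> String.split()
    # (1) Drop stop words.
    |> Stream.reject(&(&1 in @stop_words))
    # (2) Strip everything that is not a lowercase ASCII letter.
    |> Stream.map(&String.replace(&1, ~r/[^a-z]/, ""))
    # Tokens that were pure punctuation are now empty; drop them.
    |> Stream.reject(&(&1 == ""))
    # The lazy stream is only evaluated here, in one pass.
    |> Enum.to_list()
  end
end

PreProcessSketch.tokenize("The quick, brown fox is on the run!")
# => ["quick", "brown", "fox", "run"]
```

Because the stop-word check runs before punctuation is stripped, a token such as `the,` would not be recognised as a stop word in this sketch; swapping the two steps is one way to handle that if it matters for your content.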

Functions

pre_process_content(content, file_name)
pre_process_query(query)