content_indexer v0.2.0 ContentIndexer.Services.PreProcess

Content and query pre-processing functions that are passed to the SearchUtils.compile and SearchUtils.compile_query functions. Here we also do some extra work for markdown files, i.e. removing the header.

The important thing to note is that these two functions take in the content (a string for file content, a list of tokens for a query) and return a list of tokenized strings.

The steps we are taking:

(1) Remove all the stop words: they are noise and we should never search by them.

(2) Remove non-character data and whitespace.

Using streams means the steps are composed lazily, so most of the work happens in a single pass over the tokens.
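The steps above can be sketched roughly as follows. This is a minimal illustration, not the module's actual source: the module name `PreProcessSketch`, the functions `tokenize/1` and `clean_tokens/1`, the stop-word list, and the lowercasing step are all assumptions made for the example. It also ignores the markdown header handling and the `file_name` argument.

```elixir
defmodule PreProcessSketch do
  # Hypothetical stop-word list; the list used by the real module is not shown in these docs.
  @stop_words MapSet.new(~w(a an and is of the this to))

  # Splits raw content on whitespace, then cleans the resulting tokens.
  def tokenize(content) when is_binary(content) do
    content
    |> String.split()
    |> clean_tokens()
  end

  # Each Stream call only describes a step; nothing runs until Enum.to_list/1,
  # which then walks the token list once and applies every step per token.
  def clean_tokens(tokens) when is_list(tokens) do
    tokens
    |> Stream.map(&String.downcase/1)
    # (1) drop stop words
    |> Stream.reject(&MapSet.member?(@stop_words, &1))
    # (2) strip non-character data, then drop tokens that end up empty
    |> Stream.map(&String.replace(&1, ~r/[^a-z]/, ""))
    |> Stream.reject(&(&1 == ""))
    |> Enum.to_list()
  end
end
```

With this sketch, `PreProcessSketch.tokenize("Hello, world! 42")` returns `["hello", "world"]`. Because stop words are rejected before punctuation is stripped (the order listed above), a token such as `"is,"` would not be recognised as a stop word; swapping the two steps would be one way to handle that.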

Summary

Functions

pre_process_content(content, file_name)

Processes the String based content of a file

pre_process_query(query)

Processes a set of query tokens, removing non-character data, stop words and whitespace

Functions

pre_process_content(content, file_name)

Processes the String based content of a file

## Parameters

- content: String based file content
- file_name: String - the file name

## Example

iex> ContentIndexer.Services.PreProcess.pre_process_content("this is just some random file content", "test_file_one.txt")
      ["just", "some", "random", "file", "content"]

pre_process_query(query)

Processes a set of query tokens, removing non-character data, stop words and whitespace

## Parameters

- query: List of String based query tokens

## Example

iex> ContentIndexer.Services.PreProcess.pre_process_query(["this", "is", "just", "meaningless"])
      ["just", "meaningless"]