ExTextSplitter.Native (ex_text_splitter v0.1.0)

ExTextSplitter

This package provides bindings to the text-splitter Rust crate. By default, only the text_splitter function is available, but you can enable additional features through configuration:

# this will enable all features
config :ex_text_splitter,
  features: ["markdown", "tiktoken-rs"]

This can also be configured using Mix.install:

Mix.install(
    [:ex_text_splitter],
    config: [ex_text_splitter: [features: ["markdown"]]]
)

Installation

If available in Hex, the package can be installed by adding ex_text_splitter to your list of dependencies in mix.exs:

def deps do
  [
    {:ex_text_splitter, "~> 0.1.0"}
  ]
end

Documentation can be generated with ExDoc and published on HexDocs. Once published, the docs can be found at https://hexdocs.pm/ex_text_splitter.

Summary

Functions

text_splitter("your text here", options), where options is a keyword list of optional parameters

Functions

markdown_splitter(arg1, arg2)

text_splitter(arg1, arg2)

text_splitter("your text here", options), where options is a keyword list with these optional parameters:

max_tokens: integer, min_tokens: integer, trim_chunks: bool, tokenizer: "cl100k_base" | "p50k_base" | "r50k_base" | "p50k_edit"

With text_splitter, a token is a single character; with tokenizer_text_splitter, a token is a real tokenizer token.
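
The difference between the two token definitions can be sketched as follows. This is a minimal illustration, not a confirmed usage pattern: it assumes the "tiktoken-rs" feature is enabled, that tokenizer_text_splitter accepts the same keyword options as text_splitter, and that both functions return something inspectable. The return shape is not documented here, so the sketch only prints the results.

```elixir
# Sketch under stated assumptions; the return shape is not documented above,
# so results are inspected rather than pattern-matched.
text = "Elixir bindings for a Rust text splitter. Each chunk stays under a size limit."

# Character-based splitting: max_tokens counts single characters here.
char_chunks =
  ExTextSplitter.Native.text_splitter(text, max_tokens: 40, trim_chunks: true)

IO.inspect(char_chunks, label: "character-based chunks")

# Token-based splitting: max_tokens counts tokenizer tokens.
# Assumption: the "tiktoken-rs" feature is enabled and the :tokenizer
# option is honored by this function.
token_chunks =
  ExTextSplitter.Native.tokenizer_text_splitter(text,
    max_tokens: 10,
    tokenizer: "cl100k_base"
  )

IO.inspect(token_chunks, label: "token-based chunks")
```

Because one token usually spans several characters, the same max_tokens value would be expected to yield larger chunks from tokenizer_text_splitter than from text_splitter.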

tokenizer_markdown_splitter(arg1, arg2)

tokenizer_text_splitter(arg1, arg2)