ExTextSplitter.Native (ex_text_splitter v0.1.0)
ExTextSplitter
This package provides bindings to the text-splitter crate. By default only the text_splitter function is available, but you can enable additional features in your configuration:
# config/config.exs: this enables all optional features
import Config

config :ex_text_splitter,
  features: ["markdown", "tiktoken-rs"]
The same option can also be set when using Mix.install:
Mix.install(
[:ex_text_splitter],
config: [ex_text_splitter: [features: ["markdown"]]]
)
Installation
If available in Hex, the package can be installed by adding ex_text_splitter to your list of dependencies in mix.exs:
def deps do
  [
    {:ex_text_splitter, "~> 0.1.0"}
  ]
end
Documentation can be generated with ExDoc and published on HexDocs. Once published, the docs can be found at https://hexdocs.pm/ex_text_splitter.
Functions
markdown_splitter(arg1, arg2)
text_splitter(arg1, arg2)
text_splitter("your text here", options), where options is a keyword list with these optional parameters:
  max_tokens: integer
  min_tokens: integer
  trim_chunks: boolean
  tokenizer: "cl100k_base" | "p50k_base" | "r50k_base" | "p50k_edit"
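The documented option types can be checked in plain Elixir before the options reach the native code. This is a minimal sketch, not part of ex_text_splitter: SplitterOpts and validate/1 are hypothetical names, the checks cover only the types listed above, and treating any other key as an error is an assumption of this sketch.

```elixir
defmodule SplitterOpts do
  # Hypothetical helper (not part of ex_text_splitter): checks an options
  # keyword list against the types documented for text_splitter/2.
  @tokenizers ["cl100k_base", "p50k_base", "r50k_base", "p50k_edit"]

  # Returns {:ok, opts} when every option is valid, otherwise
  # {:error, {key, value}} for the first offending option.
  def validate(opts) when is_list(opts) do
    Enum.reduce_while(opts, {:ok, opts}, fn {key, value}, acc ->
      if valid?(key, value), do: {:cont, acc}, else: {:halt, {:error, {key, value}}}
    end)
  end

  defp valid?(:max_tokens, v), do: is_integer(v)
  defp valid?(:min_tokens, v), do: is_integer(v)
  defp valid?(:trim_chunks, v), do: is_boolean(v)
  defp valid?(:tokenizer, v), do: v in @tokenizers
  # Assumption: unknown keys are rejected rather than silently ignored.
  defp valid?(_key, _v), do: false
end
```

For example, SplitterOpts.validate(max_tokens: 200, trim_chunks: true) returns {:ok, [max_tokens: 200, trim_chunks: true]}, which can then be passed on as ExTextSplitter.Native.text_splitter("your text here", opts), while SplitterOpts.validate(tokenizer: "gpt2") returns {:error, {:tokenizer, "gpt2"}}.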
With text_splitter, a token is a single character; with tokenizer_text_splitter, a token is a real token produced by the selected tokenizer.
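What a size limit counted in characters means can be pictured with a naive pure-Elixir sketch. Assumptions: CharChunks is a hypothetical module; one "token" is taken to be one Unicode codepoint (the unit of a Rust char), so the sketch uses String.codepoints/1 rather than String.length/1, which counts graphemes; and unlike the text-splitter crate, which prefers semantic boundaries, it simply cuts every max_tokens characters. It illustrates the unit of measurement only, not how the library chooses split points.

```elixir
defmodule CharChunks do
  # Hypothetical illustration, NOT the text-splitter algorithm: it cuts the
  # text every `max_tokens` codepoints, ignoring word or sentence boundaries.

  # One "token" = one Unicode codepoint (what Rust calls a char).
  def token_count(text), do: text |> String.codepoints() |> length()

  def split(text, max_tokens, trim_chunks \\ true)
      when is_integer(max_tokens) and max_tokens > 0 do
    text
    |> String.codepoints()
    |> Enum.chunk_every(max_tokens)
    |> Enum.map(&Enum.join/1)
    |> maybe_trim(trim_chunks)
  end

  # In this sketch, trimming also drops chunks that become empty.
  defp maybe_trim(chunks, true) do
    chunks
    |> Enum.map(&String.trim/1)
    |> Enum.reject(&(&1 == ""))
  end

  defp maybe_trim(chunks, false), do: chunks
end
```

For example, CharChunks.split("hello world", 5) gives ["hello", "worl", "d"]: the second chunk loses its leading space because trimming is on. Note that CharChunks.token_count("e\u0301") is 2 even though String.length/1 reports a single grapheme.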