View Source Sufx
This library provides simple fuzzy matching of strings given a search pattern. Patterns are a suite of characters without any special meaning, simply put, a string.
This resembles VS Code fuzzy finder, with a very basic scoring algorithm.
The implemenation is based on a suffix tree, though this library is not a generic suffix tree or suffix trie implementation.
Example
A tree is defined by a map of tokens, where each key is a string and each value any term of your chosing. Here for instance we are building a tree to help find general domain topics. The values are composed of a domain (programming, physics, …) and the topic name. The search will be performed on the topic names so we use them for keys.
tree =
%{
"algorithm" => {:programming, "algorithm"},
"wavelength" => {:physics, "wavelength"},
"allegory" => {:philosophy, "allegory"},
"novel" => {:litterature, "novel"}
}
|> Sufx.new()
|> Sufx.compress()
Now the tree is ready to use, we can search, for instance with the "alg"
string.
results = Sufx.search(tree, "alg")
IO.inspect(results, label: "results")
The code above would print the following results:
results: [
physics: "wavelength",
philosophy: "allegory",
programming: "algorithm"
]
The matches were computed like so:
- wavelength
- allegory
- algorithm
The library supports a simplistic scoring mechanism based on the length of the matched patterns. With the following code:
results =
tree
|> Sufx.search_score("alg")
|> Sufx.sort_by_score()
IO.inspect(results, label: "results")
We would get the following results:
results: [
{{:programming, "algorithm"}, 3},
{{:philosophy, "allegory"}, 2},
{{:physics, "wavelength"}, 1}
]
When building the tree, the key does not have to be part of the value. For instance to match user posts in a database, where the user has posts like these:
documents =
[
%{id: 1001, title: "Collectible card game are great!"},
%{id: 1002, title: "Discussion on suffixes"},
%{id: 1003, title: "Cats for the greater good"},
%{id: 1004, title: "Cats considered harmul!"}
]
A tree could be built and searched like so:
tree =
documents
|> Enum.reduce(Sufx.new(), fn post, tree ->
Sufx.insert(tree, post.title, post.id)
end)
|> Sufx.compress()
results = Sufx.search_score(tree, "cgg")
As we did not include the searchable strings in our values, but just the post IDs, this is what we expect:
results: [{1001, 1}, {1003, 1}]
And Sufx.search_score(tree, "CCG")
would yield [{1001, 1}]
.
Installation
If available in Hex, the package can be installed
by adding sufx
to your list of dependencies in mix.exs
:
def deps do
[
{:sufx, "~> 0.1.0"},
]
end
Documentation can be generated with ExDoc and published on HexDocs. Once published, the docs can be found at https://hexdocs.pm/sufx.