Sufx

This library provides simple fuzzy matching of strings against a search pattern. Patterns have no special syntax: a pattern is simply a string, and none of its characters has a special meaning.

This resembles the VS Code fuzzy finder, with a very basic scoring algorithm.

The implementation is based on a suffix tree, though this library is not a generic suffix tree or suffix trie implementation.
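To give an idea of how a suffix structure can support this kind of matching, here is a minimal sketch of a naive, uncompressed suffix trie built from nested maps. It is not Sufx's internal representation: the module name `SuffixTrieSketch`, the `:values` node key and the case folding are assumptions made for illustration. Every suffix of every key is inserted, each node remembers the values whose keys pass through it, and a pattern matches when its characters can be found along a path in order, possibly with gaps.

```elixir
defmodule SuffixTrieSketch do
  # Hypothetical illustration, not part of Sufx. A node is a map from a
  # grapheme to a child node, plus a :values entry holding the set of
  # values whose keys contain the path leading to this node.

  def new(entries) do
    Enum.reduce(entries, %{}, fn {key, value}, trie ->
      chars = key |> String.downcase() |> String.graphemes()

      # Insert every suffix of the key, one per starting position.
      suffixes = for i <- 0..(length(chars) - 1)//1, do: Enum.drop(chars, i)
      Enum.reduce(suffixes, trie, fn suffix, acc -> insert(acc, suffix, value) end)
    end)
  end

  # Returns the set of values whose keys contain the graphemes of
  # `pattern` in order (a subsequence match), ignoring case.
  def search(trie, pattern) do
    chars = pattern |> String.downcase() |> String.graphemes()

    trie
    |> match_nodes(chars)
    |> Enum.reduce(MapSet.new(), &MapSet.union/2)
  end

  defp insert(node, [], value), do: add_value(node, value)

  defp insert(node, [c | rest], value) do
    child = Map.get(node, c, %{})

    node
    |> add_value(value)
    |> Map.put(c, insert(child, rest, value))
  end

  defp add_value(node, value) do
    Map.update(node, :values, MapSet.new([value]), &MapSet.put(&1, value))
  end

  # Whole pattern consumed: every value stored here has a key that
  # contains the matched path, hence the pattern as a subsequence.
  defp match_nodes(node, []), do: [Map.get(node, :values, MapSet.new())]

  defp match_nodes(node, [c | rest]) do
    node
    |> descendants_via(c)
    |> Enum.flat_map(&match_nodes(&1, rest))
  end

  # All nodes reached through an edge labelled `c`, at any depth below
  # `node`: searching deeper than one level is what allows gaps.
  defp descendants_via(node, c) do
    Enum.flat_map(node, fn
      {:values, _} -> []
      {^c, child} -> [child | descendants_via(child, c)]
      {_, child} -> descendants_via(child, c)
    end)
  end
end
```

A real suffix tree would also merge chains of single-child nodes to save memory; this sketch skips that and favours readability over speed.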

Example

A tree is built from a map of tokens, where each key is a string and each value is any term of your choosing. For instance, here we build a tree to help find general domain topics. Each value is made of a domain (programming, physics, …) and a topic name. The search is performed on the topic names, so we use them as keys.

tree =
  %{
    "algorithm" => {:programming, "algorithm"},
    "wavelength" => {:physics, "wavelength"},
    "allegory" => {:philosophy, "allegory"},
    "novel" => {:literature, "novel"}
  }
  |> Sufx.new()
  |> Sufx.compress()

The tree is now ready to use. We can search it, for instance with the string "alg":

results = Sufx.search(tree, "alg")
IO.inspect(results, label: "results")

The code above would print the following results:

results: [
  physics: "wavelength",
  philosophy: "allegory",
  programming: "algorithm"
]

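Each key matches because the characters of "alg" appear in it in order, though not necessarily next to each other (matched characters in brackets):

  • w[a]ve[l]en[g]th
  • [al]le[g]ory
  • [alg]orithm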
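The rule behind these matches, that the characters of the pattern must appear in the key in order with any number of other characters in between, can be sketched as a small recursive function over graphemes. `FuzzyMatchSketch.matches?/2` is a hypothetical name: this illustrates the matching rule only, not how Sufx walks its tree.

```elixir
defmodule FuzzyMatchSketch do
  # Hypothetical illustration, not part of Sufx: true when every
  # grapheme of `pattern` occurs in `key`, in the same order.
  def matches?(pattern, key) do
    subsequence?(String.graphemes(pattern), String.graphemes(key))
  end

  # Pattern exhausted: everything was found.
  defp subsequence?([], _key), do: true
  # Key exhausted before the pattern: no match.
  defp subsequence?(_pattern, []), do: false
  # Same grapheme: consume it from both lists.
  defp subsequence?([c | pattern], [c | key]), do: subsequence?(pattern, key)
  # Different grapheme: skip it in the key and keep looking.
  defp subsequence?(pattern, [_ | key]), do: subsequence?(pattern, key)
end

["wavelength", "allegory", "algorithm", "novel"]
|> Enum.filter(&FuzzyMatchSketch.matches?("alg", &1))
# => ["wavelength", "allegory", "algorithm"]
```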

The library also supports a simplistic scoring mechanism based on the length of the matched parts of the pattern. With the following code:

results =
  tree
  |> Sufx.search_score("alg")
  |> Sufx.sort_by_score()

IO.inspect(results, label: "results")

We would get the following results:

results: [
  {{:programming, "algorithm"}, 3},
  {{:philosophy, "allegory"}, 2},
  {{:physics, "wavelength"}, 1}
]
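The scores above (3, 2 and 1) are consistent with one simple rule: the length of the longest run of pattern characters matched contiguously in the key, that is "alg" in algorithm, "al" in allegory and single letters in wavelength. This section does not spell out Sufx's exact formula, so the following is only a sketch of that assumed rule. `ScoreSketch.score/2` is a hypothetical name, case folding is an assumption, and the brute-force search over pattern chunks is chosen for clarity rather than speed.

```elixir
defmodule ScoreSketch do
  # Hypothetical illustration, not Sufx's actual scoring code.
  # Returns nil when `pattern` does not match `key` in order; otherwise
  # the length of the longest chunk of `pattern` that can be matched as
  # one contiguous run while the rest of the pattern still matches in
  # order before and after it. Case is ignored (an assumption).
  def score(pattern, key) do
    p = pattern |> String.downcase() |> String.graphemes()
    k = key |> String.downcase() |> String.graphemes()

    scores =
      for start <- 0..(length(p) - 1)//1,
          len <- 1..(length(p) - start)//1,
          pos <- 0..(length(k) - len)//1,
          # The chunk of the pattern occurs contiguously in the key...
          Enum.slice(k, pos, len) == Enum.slice(p, start, len),
          # ...the pattern before the chunk matches earlier in the key...
          subsequence?(Enum.take(p, start), Enum.take(k, pos)),
          # ...and the pattern after the chunk matches later in the key.
          subsequence?(Enum.drop(p, start + len), Enum.drop(k, pos + len)) do
        len
      end

    if scores == [], do: nil, else: Enum.max(scores)
  end

  defp subsequence?([], _), do: true
  defp subsequence?(_, []), do: false
  defp subsequence?([c | p], [c | k]), do: subsequence?(p, k)
  defp subsequence?(p, [_ | k]), do: subsequence?(p, k)
end

["wavelength", "allegory", "algorithm", "novel"]
|> Enum.map(&{&1, ScoreSketch.score("alg", &1)})
|> Enum.reject(fn {_key, score} -> is_nil(score) end)
|> Enum.sort_by(fn {_key, score} -> score end, :desc)
# => [{"algorithm", 3}, {"allegory", 2}, {"wavelength", 1}]
```

The final sort plays the same role as Sufx.sort_by_score/1 in the example above: highest scores first.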

When building the tree, the key does not have to be part of the value. For instance, to match a user's posts stored in a database, where the posts look like these:

documents =
  [
    %{id: 1001, title: "Collectible card games are great!"},
    %{id: 1002, title: "Discussion on suffixes"},
    %{id: 1003, title: "Cats for the greater good"},
    %{id: 1004, title: "Cats considered harmful!"}
  ]

A tree could be built and searched like so:

tree =
  documents
  |> Enum.reduce(Sufx.new(), fn post, tree ->
    Sufx.insert(tree, post.title, post.id)
  end)
  |> Sufx.compress()

results = Sufx.search_score(tree, "cgg")
IO.inspect(results, label: "results")

Since the values hold only the post IDs and not the searchable strings, the results contain IDs:

results: [{1001, 1}, {1003, 1}]

Matching appears to ignore case: Sufx.search_score(tree, "CCG") yields [{1001, 1}], since only the first title contains two c's followed by a g.
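A quick way to sanity-check expectations like these without building a tree is a linear scan that keeps the ID of every post whose title contains the pattern's characters in order. The module `PostScanSketch` is hypothetical and not part of Sufx; it lowercases both strings so that its output agrees with the results above, and it returns bare IDs without scores.

```elixir
defmodule PostScanSketch do
  # Hypothetical reference helper, not part of Sufx: returns the IDs of
  # the posts whose title contains the graphemes of `pattern` in order,
  # comparing both strings in lowercase.
  def search(posts, pattern) do
    wanted = normalize(pattern)

    for post <- posts, subsequence?(wanted, normalize(post.title)), do: post.id
  end

  defp normalize(string), do: string |> String.downcase() |> String.graphemes()

  defp subsequence?([], _), do: true
  defp subsequence?(_, []), do: false
  defp subsequence?([c | p], [c | t]), do: subsequence?(p, t)
  defp subsequence?(p, [_ | t]), do: subsequence?(p, t)
end

documents = [
  %{id: 1001, title: "Collectible card games are great!"},
  %{id: 1002, title: "Discussion on suffixes"},
  %{id: 1003, title: "Cats for the greater good"},
  %{id: 1004, title: "Cats considered harmful!"}
]

PostScanSketch.search(documents, "cgg") # => [1001, 1003]
PostScanSketch.search(documents, "CCG") # => [1001]
```

A scan like this costs one pass over every title per query, which is the work an index such as a suffix tree aims to avoid.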

Installation

If available in Hex, the package can be installed by adding sufx to your list of dependencies in mix.exs:

def deps do
  [
    {:sufx, "~> 0.1.0"},
  ]
end

Documentation can be generated with ExDoc and published on HexDocs. Once published, the docs can be found at https://hexdocs.pm/sufx.