Bayesic v0.1.0
A data matcher that uses Bayes’ Theorem to calculate the probability of a given match. It is similar to Naive Bayes, but optimized for cases where you have many possible classifications and a relatively small amount of data per classification.
Matching Words
iex> matcher = Bayesic.Trainer.new()
...> |> Bayesic.train(["once","upon","a","time"], "story")
...> |> Bayesic.train(["tonight","on","the","news"], "news")
...> |> Bayesic.finalize()
iex> Bayesic.classify(matcher, ["once","upon"])
%{"story" => 1.0}
iex> Bayesic.classify(matcher, ["tonight"])
%{"news" => 1.0}
Matching Trigrams
iex> tri = fn(str) -> str |> String.codepoints |> Enum.chunk_every(3, 1, :discard) |> Enum.map(&(Enum.join(&1,""))) end
iex> tri.("teeth")
["tee","eet","eth"]
iex> matcher = Bayesic.Trainer.new()
...> |> Bayesic.train(tri.("triassic"), "old")
...> |> Bayesic.train(tri.("jurassic"), "old")
...> |> Bayesic.train(tri.("modern"), "new")
...> |> Bayesic.train(tri.("hipster"), "new")
...> |> Bayesic.finalize()
iex> Bayesic.classify(matcher, tri.("moder"))
%{"new" => 1.0}
iex> Bayesic.classify(matcher, tri.("jrassic"))
%{"old" => 1.0}
Summary
Functions
Take a list of tokens and provide a map of which classifications it might match, along with a probability for each classification
After you have loaded up your trainer with example data, this function will run some calculations and turn it into a %Bayesic.Matcher{}.
We also do some data pruning at this stage to remove tokens that appear frequently. You can customize how much pruning you want by passing the :pruning_threshold option. Tokens that appear in more than :pruning_threshold of the classifications will be removed. This can speed things up quite a bit, and it usually doesn't hurt your accuracy because we are already weighting the tokens by how unique they are (see Bayes’ Theorem)
Feed some example data to your trainer
Functions
Take a list of tokens and provide a map of which classifications it might match, along with a probability for each classification.
Examples
iex> matcher = Bayesic.Trainer.new()
...> |> Bayesic.train(["once","upon","a","time"], "story")
...> |> Bayesic.train(["tonight","on","the","news"], "news")
...> |> Bayesic.finalize()
iex> Bayesic.classify(matcher, ["once","upon"])
%{"story" => 1.0}
iex> Bayesic.classify(matcher, ["tonight"])
%{"news" => 1.0}
After you have loaded up your trainer with example data, this function will run some calculations and turn it into a %Bayesic.Matcher{}.
We also do some data pruning at this stage to remove tokens that appear frequently. You can customize how much pruning you want by passing the :pruning_threshold option. Tokens that appear in more than :pruning_threshold of the classifications will be removed. This can speed things up quite a bit, and it usually doesn't hurt your accuracy because we are already weighting the tokens by how unique they are (see Bayes’ Theorem).
iex> Bayesic.Trainer.new()
...> |> Bayesic.train([1, 2, 3], "small numbers")
...> |> Bayesic.finalize(pruning_threshold: 0.1)
#Bayesic.Matcher<>
Feed some example data to your trainer.
The classification can be an arbitrary term: maps, strings, Ecto structs, etc.
The tokens should be a list of items you saw in the original data. For example, if you are trying to match user input to a list of movie titles, you might break the movie titles into words ("Jurassic Park" => ["jurassic", "park"]).
Later, when the user is typing a name, you can break the string they have typed into tokens the same way to check for a high-confidence match (see the sketch after the examples below).
Examples
iex> Bayesic.Trainer.new() |> Bayesic.train(["once","upon","a","time"], "story")
#Bayesic.Trainer<>