Markov (markov v2.2.0)

Public API

Example workflow:

# the model is to be stored under /base/directory/model_name
# the model will be created using specified options if not found
{:ok, model} = Markov.load("/base/directory", "model_name", sanitize_tokens: true, store_history: [:train])

# train using four strings
{:ok, _} = Markov.train(model, "hello, world!")
{:ok, _} = Markov.train(model, "example string number two")
{:ok, _} = Markov.train(model, "hello, Elixir!")
{:ok, _} = Markov.train(model, "fourth string")

# generate text
{:ok, text} = Markov.generate_text(model)
IO.inspect(text)

# unload model from RAM
Markov.unload(model)

# these will return errors because the model is unloaded
# Markov.generate_text(model)
# Markov.train(model, "hello, world!")

# load the model again
{:ok, model} = Markov.load("/base/directory", "model_name")

# enable probability shifting and generate text
:ok = Markov.configure(model, shift_probabilities: true)
{:ok, text} = Markov.generate_text(model)
IO.inspect(text)

# print uninteresting stats
model |> Markov.dump_partition(0) |> IO.inspect
model |> Markov.read_log |> IO.inspect

# this will also persist the option we just set
Markov.unload(model)

Summary

Types

model_option()
Model options that can be set during creation in a call to load/3 or with configure/2

tag_query()
If data was tagged when training, you can use tag queries to select only generation paths that match a set of criteria

Functions

configure(model, opts)
Reconfigures an already loaded model. See model_option/0 for a thorough description of the options

dump_partition(model, part_no)
Reads an entire partition for debugging purposes

generate_text(model, tag_query \\ true)
Predicts (generates) a string. Will raise an exception if the model was trained on non-textual tokens at least once

generate_tokens(model, tag_query \\ true)
Predicts (generates) a list of tokens

get_config(model)
Gets the configuration of an already loaded model

load(base_dir, name, create_options \\ [])
Loads an existing model from base_dir/name. If none is found, a new model with the specified options will be created and loaded at that path; if that fails, an error is returned

nuke(model)
Deletes model data. There's no going back :)

read_log(model)
Reads the log file and returns a list of entries in chronological order

train(model, text, tags \\ [:"$none"])
Trains the model using text or a list of tokens

unload(model)
Unloads an already loaded model

Types

log_entry_type()

@type log_entry_type() ::
  :train | :train_deferred | :repart_start | :repart_done | :start | :end | :gen

model_option()

@type model_option() ::
  {:store_history, [log_entry_type()]}
  | {:shift_probabilities, boolean()}
  | {:partition_size, integer()}
  | {:partition_timeout, integer()}
  | {:sanitize_tokens, boolean()}
  | {:order, integer()}

Model options that can be set during creation in a call to load/3 or with configure/2 (see the usage sketch after this list):

  • store_history: determines which events are recorded in the operation log; all of them by default:
    • :train: training requests
    • :train_deferred: training requests that have been deferred until after repartitioning is complete
    • :gen: generation results
    • :repart_start: repartitioning started
    • :repart_done: repartitioning finished
    • :start: model is loaded
    • :end: model is unloaded
  • shift_probabilities: gives less popular generation paths a better chance of being used, which makes the output more original but may produce nonsense; false by default
  • partition_size: approximate number of link entries in one partition; 10k by default
  • partition_timeout: a partition is unloaded from RAM after that many milliseconds of inactivity; 10k by default
  • sanitize_tokens: ignores letter case and punctuation when switching states, but still keeps the output as-is; false by default, cannot be changed once the model is created
  • order: the order of the chain, i.e. how many previous tokens the next one is based on; 2 by default, can never be changed once the model is created
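For illustration, a model might be created with several non-default options at once. This is only a sketch: the model name and the option values below are arbitrary, and per the list above, sanitize_tokens and order can only be set at this point.

# hypothetical name and values, shown only to illustrate the option format;
# sanitize_tokens and order cannot be changed after the model is created
{:ok, model} = Markov.load("/base/directory", "my_model",
  sanitize_tokens: true,
  order: 3,
  partition_size: 5_000,
  partition_timeout: 30_000)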
model_reference()

@opaque model_reference()

tag_query()

@type tag_query() ::
  true
  | {tag_query(), :or, tag_query()}
  | {tag_query(), :score, [{tag_query(), integer()}]}
  | {:not, tag_query()}
  | term()

If data was tagged when training, you can use tag queries to select only generation paths that match a set of criteria

  • true always matches
  • {x, :or, y} matches when either x or y matches
  • {:not, x} matches if x doesn't match, and vice versa
  • {x, :score, y} is only allowed at the top level; the total score counter (initially 0) is increased by score for every element {query, score} of y (a list) that matches; probabilities are then adjusted according to those scores.
  • any other term is treated as a tag (note :"$none", the default tag)

Examples:

# training
iex> Markov.train(model, "hello earth", [
  {:action, :saying_hello}, # <- terms of any type can function as tags
  {:subject_type, :planet},
  {:subject, "earth"},
  :lowercase
])
{:ok, :done}
iex> Markov.train(model, "Hello Elixir", [
  {:action, :saying_hello},
  {:subject_type, :programming_language},
  {:subject, "Elixir"},
  :uppercase
])
{:ok, :done}


# simple generation - both paths have equal probabilities
iex> Markov.generate_text(model)
{:ok, "hello earth"}
iex> Markov.generate_text(model)
{:ok, "hello Elixir"}

# simple tag queries
iex> Markov.generate_text(model, {:subject_type, :planet})
{:ok, "hello earth"}
iex> Markov.generate_text(model, :lowercase)
{:ok, "hello earth"}
iex> Markov.generate_text(model, {:subject_type, :programming_language})
{:ok, "hello Elixir"}
iex> Markov.generate_text(model, :uppercase)
{:ok, "hello Elixir"}

# both possible generation paths were tagged with this tag
iex> Markov.generate_text(model, {:action, :saying_hello})
{:ok, "hello earth"}
iex> Markov.generate_text(model, {:action, :saying_hello})
{:ok, "hello Elixir"}

# both paths match, but "hello Elixir" has a score of 1 and "hello earth"
# has a score of zero; thus, "hello Elixir" has a probability of 2/3, and
# "hello earth" has that of 1/3
iex> Markov.generate_text(model, {true, :score, [:uppercase]})
{:ok, "hello Elixir"}
iex> Markov.generate_text(model, {true, :score, [:uppercase]})
{:ok, "hello earth"}

Functions

configure(model, opts)

@spec configure(model :: model_reference(), opts :: [model_option()]) ::
  :ok | {:error, term()}

Reconfigures an already loaded model. See model_option/0 for a thorough description of the options
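A minimal sketch; the option values are arbitrary, and per model_option/0, sanitize_tokens and order cannot be reconfigured after creation:

# hypothetical values: enable probability shifting and log only training
# requests and generation results from now on
:ok = Markov.configure(model, shift_probabilities: true, store_history: [:train, :gen])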

dump_partition(model, part_no)

@spec dump_partition(model_reference(), integer()) :: [
  {{term(), term()}, %{required(term()) => integer()}}
]

Reads an entire partition for debugging purposes
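For example, mirroring the workflow at the top of this page (which assumes partition numbering starts at 0); the inspect limit is only there to keep the output short:

# dump the first partition and print at most 10 entries of it
model |> Markov.dump_partition(0) |> IO.inspect(limit: 10)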

generate_text(model, tag_query \\ true)

@spec generate_text(model_reference(), tag_query()) ::
  {:ok, binary()} | {:error, term()}

Predicts (generates) a string. Will raise an exception if the model was trained on non-textual tokens at least once

iex> Markov.generate_text(model)
{:ok, "hello world"}

See type tag_query/0 for more info about tags

generate_tokens(model, tag_query \\ true)

@spec generate_tokens(model_reference(), tag_query()) ::
  {:ok, [term()]} | {:error, term()}

Predicts (generates) a list of tokens

iex> Markov.generate_tokens(model)
{:ok, ["hello", "world"]}

See type tag_query/0 for more info about tags

get_config(model)

@spec get_config(model :: model_reference()) ::
  {:ok, [model_option()]} | {:error, term()}

Gets the configuration of an already loaded model
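A small sketch, assuming the returned [model_option()] list is used as an ordinary keyword list:

# read individual options back out of the configuration
{:ok, opts} = Markov.get_config(model)
Keyword.get(opts, :order)           # 2 unless set otherwise at creation
Keyword.get(opts, :sanitize_tokens) # false unless set at creation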

load(base_dir, name, create_options \\ [])

@spec load(base_dir :: String.t(), name :: String.t(), options :: [model_option()]) ::
  {:ok, model_reference()} | {:error, term()}

Loads an existing model from base_dir/name. If none is found, a new model with the specified options will be created and loaded at that path; if that fails, an error is returned
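Since both outcomes are possible, callers may want to match on the result explicitly. A sketch, reusing the path and name from the workflow above:

case Markov.load("/base/directory", "model_name") do
  {:ok, model}     -> Markov.generate_text(model)
  {:error, reason} -> IO.inspect(reason, label: "failed to load model")
end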

nuke(model)

@spec nuke(model :: model_reference()) :: :ok

Deletes model data. There's no going back :)

read_log(model)

@spec read_log(model_reference()) ::
  {:ok, [{DateTime.t(), log_entry_type(), term()}]} | {:error, term()}

Reads the log file and returns a list of entries in chronological order

iex> Markov.read_log(model)
{:ok,
 [
   {~U[2022-10-02 16:59:51.844Z], :start, nil},
   {~U[2022-10-02 16:59:56.705Z], :train, ["hello", "world"]}
 ]}

train(model, text, tags \\ [:"$none"])

@spec train(model_reference(), String.t() | [term()], [term()]) ::
  {:ok, :done | :deferred} | {:error, term()}

Trains the model using text or a list of tokens.

{:ok, _} = Markov.train(model, "Hello, world!")
{:ok, _} = Markov.train(model, "this is a string that's broken down into tokens behind the scenes")
{:ok, _} = Markov.train(model, [
  :this, "is", 'a token', :list, "where",
  {:each_element, :is, {:taken, :as_is}},
  :and, :can_be, :erlang.make_ref(), "<-- any term"
])

Returns the status of the operation:

  • :done - training is complete
  • :deferred - a repartition is currently in progress, this request has been placed in the backlog to be fulfilled after repartitioning is complete
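Both statuses indicate success. A sketch of handling them explicitly:

case Markov.train(model, "hello, world!") do
  {:ok, :done}     -> :trained_now
  {:ok, :deferred} -> :queued        # applied once repartitioning finishes
  {:error, reason} -> {:failed, reason}
end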

See generate_text/2 for more info about tags

unload(model)

@spec unload(model :: model_reference()) :: :ok

Unloads an already loaded model