Penelope v0.5.0 Penelope.ML.CRF.Tagger

The CRF tagger is a thin wrapper over the CRFSuite library for sequence inference. It provides the ability to train sequence models, use them for inference, and import/export them.

Features (Xs) are represented as lists of sequences, where each sequence is itself a list. Each sequence entry can be a string (for simple word-based features), a list of stringable values (list features), or a map (for named features per sequence item).

Labels (Ys) are represented as lists of sequences of strings. Each label must correspond to the entry at the same position in the matching feature sequence.
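As an illustration of the shapes described above, the following sketch builds the same one-sentence data set in each of the three feature forms, together with its labels. The sample sentence, the label names, the map keys (strings here; whether atom keys are also accepted is not stated above), and the `aligned` helper are all made up for this example and are not part of Penelope.

```elixir
# One outer list entry per sequence, one inner entry per token.

# 1. String features: the token itself is the only feature.
x_strings = [["you", "have", "four", "pears"]]

# 2. List features: several stringable values per token.
x_lists = [[["you", 3], ["have", 4], ["four", 4], ["pears", 5]]]

# 3. Map features: named features per token (string keys are an assumption).
x_maps = [
  [
    %{"word" => "you", "length" => 3},
    %{"word" => "have", "length" => 4},
    %{"word" => "four", "length" => 4},
    %{"word" => "pears", "length" => 5}
  ]
]

# Labels: one string per token, in the same order as the features.
y = [["o", "o", "b_num", "o"]]

# Hypothetical helper: true when x and y hold the same number of
# sequences and every sequence pair has the same length.
aligned = fn x, y ->
  length(x) == length(y) and
    Enum.all?(Enum.zip(x, y), fn {xs, ys} -> length(xs) == length(ys) end)
end

# All three feature forms line up with the same label list.
true = Enum.all?([x_strings, x_lists, x_maps], &aligned.(&1, y))
```

Checking alignment before training is a cheap way to catch a dropped token or label early, since a mismatch would otherwise only surface inside the training call.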

Models are compiled from, and exported to, a map containing a binary blob that is maintained by CRFSuite. Training parameters are analogs of those used by the sklearn-crfsuite library. For more information, see:
http://www.chokkan.org/software/crfsuite/
https://sklearn-crfsuite.readthedocs.io/en/latest/

Summary

Functions

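compile(params)
compiles a pre-trained model

export(map)
extracts model parameters from compiled model

fit(context, x, y, options \\ [])
trains a CRF model and returns it as a compiled model

predict_sequence(model, context, x)
predicts a list of target sequences from a list of feature sequences; returns the predicted sequences and their probabilities

transform(model, context, x)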

Functions

compile(params)
compile(params :: map()) :: map()

compiles a pre-trained model

export(map)
export(%{crf: reference()}) :: map()

extracts model parameters from compiled model

These parameters are simple Elixir terms and can later be passed to compile/1 to prepare the model for inference.
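Because the exported parameters are plain terms, one possible way to persist a model is to serialize them with Erlang's external term format and recompile them on load. The sketch below assumes the `penelope` dependency is available; `TaggerStore`, `save/2`, and `load/1` are hypothetical names for this example, and only `export/1` and `compile/1` come from this module.

```elixir
defmodule TaggerStore do
  # Hypothetical helper module, not part of Penelope.
  alias Penelope.ML.CRF.Tagger

  # Export the compiled model to plain terms and write them to disk.
  def save(model, path) do
    params = Tagger.export(model)
    File.write!(path, :erlang.term_to_binary(params))
  end

  # Read the terms back and compile them into a model ready for inference.
  # Only load files you trust: binary_to_term/1 decodes arbitrary terms.
  def load(path) do
    path
    |> File.read!()
    |> :erlang.binary_to_term()
    |> Tagger.compile()
  end
end
```

The compiled model holds a reference, which is only meaningful inside the running VM, so the exported parameters are the part worth storing.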

fit(context, x, y, options \\ [])
fit(
  context :: map(),
  x :: [[String.t() | list() | map()]],
  y :: [[String.t()]],
  options :: keyword()
) :: map()

trains a CRF model and returns it as a compiled model

options:

| key                      | default              |
|--------------------------|----------------------|
| algorithm                | :lbfgs               |
| min_freq                 | 0.0                  |
| all_possible_states      | false                |
| all_possible_transitions | false                |
| c1                       | 0.0                  |
| c2                       | 0.0                  |
| max_iterations           | depends on algorithm |
| num_memories             | 6                    |
| epsilon                  | 1e-5                 |
| period                   | 10                   |
| delta                    | 1e-5                 |
| linesearch               | :more_thuente        |
| max_linesearch           | 20                   |
| calibration_eta          | 0.1                  |
| calibration_rate         | 2.0                  |
| calibration_samples      | 1000                 |
| calibration_candidates   | 10                   |
| calibration_max_trials   | 20                   |
| pa_type                  | 1                    |
| c                        | 1.0                  |
| error_sensitive          | true                 |
| averaging                | true                 |
| variance                 | 1.0                  |
| gamma                    | 1.0                  |
| verbose                  | false                |

algorithms: :lbfgs, :l2sgd, :ap, :pa, :arow

linesearch: :more_thuente, :backtracking, :strong_backtracking

for more information on parameters, see https://sklearn-crfsuite.readthedocs.io/en/latest/api.html
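A minimal training sketch using a few of the options listed above might look like the following. It assumes the `penelope` dependency is available, that an empty map is acceptable as the context (the signature only says it is a map), and that the model returned by `fit/4` can be passed straight to `predict_sequence/3`. `TaggerDemo`, the sample sentences, the label names, and the option values are illustrative only.

```elixir
defmodule TaggerDemo do
  # Hypothetical demo module; calling run/0 requires the penelope dependency.
  alias Penelope.ML.CRF.Tagger

  # Option keys and atoms come from the options list; values are illustrative.
  def options do
    [
      algorithm: :lbfgs,
      c1: 0.1,
      c2: 0.1,
      max_iterations: 100,
      all_possible_transitions: true
    ]
  end

  def run do
    # Two training sequences with one label per token.
    x = [["you", "have", "four", "pears"], ["these", "are", "apples"]]
    y = [["o", "o", "b_num", "o"], ["o", "o", "o"]]

    # An empty context map is an assumption made for this sketch.
    model = Tagger.fit(%{}, x, y, options())

    # Per the typespec: one {labels, probability} tuple per input sequence.
    Tagger.predict_sequence(model, %{}, [["you", "have", "two", "apples"]])
  end
end
```

Any key left out of the keyword list falls back to the default shown in the options list.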

predict_sequence(model, context, x)
predict_sequence(
  %{crf: reference()},
  context :: map(),
  x :: [[String.t() | list() | map()]]
) :: [{[String.t()], float()}]

predicts a list of target sequences from a list of feature sequences; returns one tuple per input sequence, holding the predicted labels and their probability
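To show how the return shape can be consumed, the sketch below pairs each token with its predicted label and drops sequences whose probability falls under a threshold. `PredictionFormat` and `pair/3` are hypothetical names, and the `predictions` value is written by hand in the documented `[{labels, probability}]` shape rather than taken from a real model.

```elixir
defmodule PredictionFormat do
  # Hypothetical helper, not part of Penelope.

  # tokens: the token list for each input sequence
  # predictions: [{labels, probability}], the shape predict_sequence/3 returns
  def pair(tokens, predictions, min_probability \\ 0.0) do
    tokens
    |> Enum.zip(predictions)
    |> Enum.filter(fn {_words, {_labels, p}} -> p >= min_probability end)
    |> Enum.map(fn {words, {labels, _p}} -> Enum.zip(words, labels) end)
  end
end

# Hand-written values in the documented shape, not real model output.
tokens = [["you", "have", "four", "pears"], ["these", "are", "apples"]]
predictions = [{["o", "o", "b_num", "o"], 0.9}, {["o", "o", "o"], 0.4}]

# Keeps only the first sequence, as a list of {token, label} tuples.
confident = PredictionFormat.pair(tokens, predictions, 0.5)
```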

transform(model, context, x)
transform(
  model :: map(),
  context :: map(),
  x :: [[String.t() | list() | map()]]
) :: [[map()]]