Penelope v0.3.0 Penelope.ML.CRF.Tagger

The CRF tagger is a thin wrapper over the CRFsuite library for sequence inference. It provides the ability to train sequence models, use them for inference, and import/export them.

Features (Xs) are represented as lists of sequences, where each sequence is itself a list. Each sequence entry can be a string (for simple word-based features), a list of stringable values (list features), or a map (for named features per sequence item).

Labels (Ys) are represented as lists of sequences of strings. Each label must correspond to an entry in the feature lists.
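The shapes described above can be sketched in plain Elixir. This is an illustration only: the tokens, part-of-speech tags, labels, and the choice of string keys in the maps are all hypothetical, and the `aligned` helper is not part of Penelope.

```elixir
# One sentence expressed in each of the three feature styles.
x_words = [["the", "quick", "fox"]]

x_lists = [[["the", "DT"], ["quick", "JJ"], ["fox", "NN"]]]

x_maps = [
  [
    %{"word" => "the", "pos" => "DT"},
    %{"word" => "quick", "pos" => "JJ"},
    %{"word" => "fox", "pos" => "NN"}
  ]
]

# One label sequence per feature sequence, one label per entry.
y = [["O", "O", "B-ANIMAL"]]

# Hypothetical sanity check: same number of sequences, and each
# label sequence as long as its feature sequence.
aligned = fn x, y ->
  length(x) == length(y) and
    Enum.all?(Enum.zip(x, y), fn {xs, ys} -> length(xs) == length(ys) end)
end

IO.inspect(Enum.map([x_words, x_lists, x_maps], &aligned.(&1, y)))
```

Running a check like this before training can catch misaligned data early, before it reaches the native library.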

Models are compiled from, and exported to, a map containing a binary blob that is maintained by CRFsuite. Training parameters are analogs of those used by the sklearn-crfsuite library. For more information, see http://www.chokkan.org/software/crfsuite/ and https://sklearn-crfsuite.readthedocs.io/en/latest/

Summary

Functions

compile(params)
compiles a pre-trained model

export(map)
extracts model parameters from a compiled model

fit(context, x, y, options \\ [])
trains a CRF model and returns it as a compiled model

predict_sequence(model, context, x)
predicts a list of target sequences from a list of feature sequences; returns the predicted sequences and their probabilities

Functions

compile(params)
compile(params :: map()) :: map()

compiles a pre-trained model

export(map)
export(%{crf: reference()}) :: map()

extracts model parameters from a compiled model

These parameters are simple Elixir terms and can later be passed to compile/1 to prepare the model for inference.
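Because the exported parameters are ordinary values, one way to persist a model is to serialize them with Erlang's external term format. The following is a minimal sketch under stated assumptions: the `ModelStore` module and its function names are hypothetical, Penelope is assumed to be available as a dependency, and storing the parameters via `:erlang.term_to_binary/1` is one option among several rather than a method prescribed by the library.

```elixir
defmodule ModelStore do
  # Hypothetical helper module; not part of Penelope.
  alias Penelope.ML.CRF.Tagger

  # Serialize exported parameters to a binary.
  def encode(params), do: :erlang.term_to_binary(params)

  # Restore parameters from a binary produced by encode/1.
  def decode(binary), do: :erlang.binary_to_term(binary)

  # Export a compiled model and write its parameters to disk.
  def save(model, path) do
    File.write!(path, model |> Tagger.export() |> encode())
  end

  # Read parameters from disk and compile them for inference.
  def load(path) do
    path |> File.read!() |> decode() |> Tagger.compile()
  end
end
```

Note that `:erlang.binary_to_term/1` should only be applied to files from a trusted source.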

fit(context, x, y, options \\ [])
fit(context :: map(), x :: [[String.t() | list() | map()]], y :: [[String.t()]], options :: keyword()) :: map()

trains a CRF model and returns it as a compiled model

options:

| key                      | default              |
|--------------------------|----------------------|
| algorithm                | :lbfgs               |
| min_freq                 | 0.0                  |
| all_possible_states      | false                |
| all_possible_transitions | false                |
| c1                       | 0.0                  |
| c2                       | 0.0                  |
| max_iterations           | depends on algorithm |
| num_memories             | 6                    |
| epsilon                  | 1e-5                 |
| period                   | 10                   |
| delta                    | 1e-5                 |
| linesearch               | :more_thuente        |
| max_linesearch           | 20                   |
| calibration_eta          | 0.1                  |
| calibration_rate         | 2.0                  |
| calibration_samples      | 1000                 |
| calibration_candidates   | 10                   |
| calibration_max_trials   | 20                   |
| pa_type                  | 1                    |
| c                        | 1.0                  |
| error_sensitive          | true                 |
| averaging                | true                 |
| variance                 | 1.0                  |
| gamma                    | 1.0                  |

algorithms: :lbfgs, :l2sgd, :ap, :pa, :arow

linesearch: :more_thuente, :backtracking, :strong_backtracking

for more information on parameters, see https://sklearn-crfsuite.readthedocs.io/en/latest/api.html
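As an illustration of how the options above might be passed, here is a hedged sketch. The `TrainExample` module is hypothetical, the option values are arbitrary rather than recommended settings, and passing an empty map as the context is an assumption, since this page does not describe what the context must contain. Penelope is assumed to be available as a dependency.

```elixir
defmodule TrainExample do
  # Hypothetical wrapper; not part of Penelope.
  alias Penelope.ML.CRF.Tagger

  # Keys come from the options table; values are illustrative only.
  # In CRFsuite, c1 and c2 are the L1 and L2 regularization coefficients.
  def options do
    [algorithm: :lbfgs, c1: 0.1, c2: 0.1, max_iterations: 50]
  end

  # x and y follow the feature/label shapes described in the module docs.
  # The empty context map is an assumption.
  def train(x, y) do
    Tagger.fit(%{}, x, y, options())
  end
end
```

Any option left out of the keyword list falls back to the default listed in the table.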

predict_sequence(model, context, x)
predict_sequence(%{crf: reference()}, context :: map(), x :: [[String.t() | list() | map()]]) :: [{[String.t()], float()}]

predicts a list of target sequences from a list of feature sequences; returns the predicted sequences and their probabilities
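Based on the typespec, each result is a `{labels, probability}` tuple, one per input sequence. A minimal sketch of consuming that result follows. The `PredictExample` module and its functions are hypothetical, the empty context map is an assumption, and the pairing step assumes the simple word-based feature style where each entry is a string token.

```elixir
defmodule PredictExample do
  # Hypothetical helper module; not part of Penelope.
  alias Penelope.ML.CRF.Tagger

  # Pair each token with its predicted label for a single sequence.
  def label_tokens(tokens, {labels, probability}) do
    %{tagged: Enum.zip(tokens, labels), probability: probability}
  end

  # Predict labels for every sequence in x and pair them with the inputs.
  # The empty context map is an assumption.
  def tag(model, x) do
    predictions = Tagger.predict_sequence(model, %{}, x)

    x
    |> Enum.zip(predictions)
    |> Enum.map(fn {tokens, prediction} -> label_tokens(tokens, prediction) end)
  end
end
```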

transform(model, context, x)
transform(model :: map(), context :: map(), x :: [[String.t() | list() | map()]]) :: [[map()]]