Penelope v0.4.0 Penelope.ML.CRF.Tagger
The CRF tagger is a thin wrapper over the CRFsuite library for sequence inference. It provides the ability to train sequence models, use them for inference, and import/export them.
Features (Xs) are represented as lists of sequences, where each sequence is itself a list. Each sequence entry can be a string (for simple word-based features), a list of stringable values (list features), or a map (for named features per sequence item).
Labels (Ys) are represented as lists of sequences of strings. Each label must correspond to an entry in the feature lists.
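The shapes described above can be sketched with plain Elixir literals. This is an illustrative sketch only: the feature values, the string map keys, the label names, and the `check_alignment` helper are all made up for the example and are not part of the library.

```elixir
# One sequence of two tokens, encoded three different ways.

# String entries: one simple word-based feature per token.
x_strings = [["hello", "world"]]

# List entries: several stringable values per token.
x_lists = [[["hello", 5, true], ["world", 5, false]]]

# Map entries: named features per token (string keys are an assumption).
x_maps = [
  [
    %{"word" => "hello", "first" => true},
    %{"word" => "world", "first" => false}
  ]
]

# Labels: one string per sequence entry (label names are hypothetical).
y = [["GREETING", "NOUN"]]

# Hypothetical helper: true when there is one label sequence per feature
# sequence and each label sequence has the same length as its features.
check_alignment = fn x, y ->
  length(x) == length(y) and
    Enum.all?(Enum.zip(x, y), fn {xs, ys} -> length(xs) == length(ys) end)
end

IO.inspect(check_alignment.(x_strings, y))
IO.inspect(check_alignment.(x_lists, y))
IO.inspect(check_alignment.(x_maps, y))
```

Running a check like this before training can surface a mismatched label sequence early, instead of leaving it to fail inside the native library.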
Models are compiled from, and exported to, a map containing a binary blob that is maintained by CRFsuite. Training parameters are analogs of those used by the sklearn-crfsuite library. For more information, see:
http://www.chokkan.org/software/crfsuite/
https://sklearn-crfsuite.readthedocs.io/en/latest/
Summary
Functions
compiles a pre-trained model
extracts model parameters from compiled model
trains a CRF model and returns it as a compiled model
predicts a list of target sequences from a list of feature sequences; returns the predicted sequences and their probabilities
Functions
compiles a pre-trained model
extracts model parameters from compiled model
These parameters are simple Elixir objects and can later be passed to `compile` to prepare the model for inference.
trains a CRF model and returns it as a compiled model
options:
|key                       |default             |
|--------------------------|--------------------|
|`algorithm`               |`:lbfgs`            |
|`min_freq`                |0.0                 |
|`all_possible_states`     |false               |
|`all_possible_transitions`|false               |
|`c1`                      |0.0                 |
|`c2`                      |0.0                 |
|`max_iterations`          |depends on algorithm|
|`num_memories`            |6                   |
|`epsilon`                 |1e-5                |
|`period`                  |10                  |
|`delta`                   |1e-5                |
|`linesearch`              |`:more_thuente`     |
|`max_linesearch`          |20                  |
|`calibration_eta`         |0.1                 |
|`calibration_rate`        |2.0                 |
|`calibration_samples`     |1000                |
|`calibration_candidates`  |10                  |
|`calibration_max_trials`  |20                  |
|`pa_type`                 |1                   |
|`c`                       |1.0                 |
|`error_sensitive`         |true                |
|`averaging`               |true                |
|`variance`                |1.0                 |
|`gamma`                   |1.0                 |
algorithms: `:lbfgs`, `:l2sgd`, `:ap`, `:pa`, `:arow`
linesearch: `:more_thuente`, `:backtracking`, `:strong_backtracking`
for more information on parameters, see https://sklearn-crfsuite.readthedocs.io/en/latest/api.html
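As a rough illustration of how training options might be assembled, the sketch below merges a few overrides onto defaults taken from the table above. It assumes options are passed as a keyword list, which this page does not state explicitly; the override values are arbitrary, and the variable names are hypothetical. Note that Elixir float literals need a fractional part, so the table's 1e-5 is written as `1.0e-5` in code.

```elixir
# A subset of the documented defaults.
defaults = [
  algorithm: :lbfgs,
  c1: 0.0,
  c2: 0.0,
  num_memories: 6,
  epsilon: 1.0e-5
]

# Hypothetical overrides: add L1 (c1) and L2 (c2) regularization and
# cap the number of iterations.
overrides = [c1: 0.1, c2: 0.01, max_iterations: 100]

# Keyword.merge/2 keeps the defaults for keys that are not overridden.
options = Keyword.merge(defaults, overrides)

# Guard against typos by checking the algorithm against the documented set.
algorithms = [:lbfgs, :l2sgd, :ap, :pa, :arow]

unless Keyword.fetch!(options, :algorithm) in algorithms do
  raise ArgumentError, "unknown algorithm"
end

IO.inspect(options)
```

Keeping the defaults in one place and merging overrides on top makes it easier to see which parameters differ from the library's documented values.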
predicts a list of target sequences from a list of feature sequences; returns the predicted sequences and their probabilities
transform(model :: map(), context :: map(), x :: [[String.t() | list() | map()]]) :: [[map()]]