Penelope v0.5.0 Penelope.NLP.POSTagger

The part-of-speech tagger transforms a tokenized sentence into a list of {token, pos_tag} tuples. The tagger does not tokenize its input, so callers must use the same tokenization scheme for training and for tagging to get the best results.
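One way to keep tokenization consistent is to route both the training and the tagging code paths through a single function. The sketch below is only an illustration: the `TokenizerSketch` module and its whitespace-splitting rule are assumptions made for this example, not part of Penelope.

```elixir
defmodule TokenizerSketch do
  # Hypothetical helper, not part of Penelope: a single tokenization rule
  # shared by the code that prepares training data and the code that tags.
  def tokenize(text) when is_binary(text) do
    # String.split/1 splits on whitespace and drops empty strings.
    String.split(text)
  end
end

# The same function produces the tokens used for training...
train_tokens = TokenizerSketch.tokenize("Judy saw her")

# ...and the tokens later passed to the tagger.
tag_tokens = TokenizerSketch.tokenize("Judy  saw her")
```

Because both paths call the same function, a change to the tokenization rule applies to training and tagging together.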

Because this tagger does not ship with a pretrained model, it is both language- and tagset-agnostic, though the default feature set (see POSFeaturizer) was designed for English.

See POSTaggerTrainer.train/2 for an example of how to train a new POS tagger model.

Summary

Functions

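compile(params)
Imports parameters from a serialized model.

export(model)
Exports a runtime model to a serializable data structure.

fit(context, x, y, featurizers \\ [pos_featurizer: []])
Fits the tagger model. Custom featurizers may be supplied.

tag(model, context, tokens)
Attaches part-of-speech tags to a list of tokens.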

Types

model()
model() :: %{pos_tagger: [{atom(), any()}]}

Functions

compile(params)
compile(params :: map()) :: model()

Imports parameters from a serialized model.

export(model)
export(model :: model()) :: map()

Exports a runtime model to a serializable data structure.
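Since export/1 returns a plain map and compile/1 accepts one, a model can in principle be persisted between the two calls. The sketch below shows one possible round trip using Erlang's external term format. The `ModelStoreSketch` module, the file name, and the placeholder map are assumptions for illustration; the actual contents of an exported model are not described here.

```elixir
defmodule ModelStoreSketch do
  # Hypothetical helper, not part of Penelope.

  # Writes an exported parameter map to disk in Erlang's external term format.
  def save(params, path) when is_map(params) do
    File.write!(path, :erlang.term_to_binary(params))
  end

  # Reads the parameter map back. Only use this on files you trust:
  # decoding untrusted binaries with binary_to_term is unsafe.
  def load(path) do
    path |> File.read!() |> :erlang.binary_to_term()
  end
end

# Placeholder standing in for the map returned by export/1.
params = %{"placeholder" => [1, 2, 3]}
path = Path.join(System.tmp_dir!(), "pos_tagger_params.bin")

ModelStoreSketch.save(params, path)
restored = ModelStoreSketch.load(path)
```

With a real model, the map passed to `save` would come from export/1, and the map returned by `load` would be handed to compile/1.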

fit(context, x, y, featurizers \\ [pos_featurizer: []])
fit(
  context :: map(),
  x :: [tokens :: [String.t()]],
  y :: [tags :: [String.t()]],
  featurizers :: [{atom() | String.t(), [any()]}]
) :: model()

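Fits the tagger model on the token sequences x and their corresponding tag sequences y. Custom featurizers may be supplied; the default is [pos_featurizer: []].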
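Because x and y are parallel lists of sentences, it can help to check that they line up before fitting. The sketch below is a hypothetical pre-flight check, not part of Penelope. It assumes that each sentence needs exactly one tag per token, which is typical for sequence labeling but is not stated above.

```elixir
defmodule TrainingDataSketch do
  # Hypothetical helper, not part of Penelope.

  # The number of token lists must equal the number of tag lists.
  def validate(x, y) when length(x) != length(y) do
    {:error, :sentence_count_mismatch}
  end

  # Each sentence is assumed to need exactly one tag per token.
  def validate(x, y) do
    mismatch =
      x
      |> Enum.zip(y)
      |> Enum.find_index(fn {tokens, tags} -> length(tokens) != length(tags) end)

    case mismatch do
      nil -> :ok
      index -> {:error, {:length_mismatch, index}}
    end
  end
end

x = [["Judy", "saw", "her"]]
y = [["NNP", "VBD", "PRP$"]]
result = TrainingDataSketch.validate(x, y)

# With Penelope available, validated data would then be passed to fit/4, e.g.
#   model = Penelope.NLP.POSTagger.fit(%{}, x, y)
```

On a mismatch, the returned index identifies the first sentence whose token and tag counts differ.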

tag(model, context, tokens)
tag(model :: model(), context :: map(), tokens :: [String.t()]) :: [
  {String.t(), String.t()}
]

Attaches part-of-speech tags to a list of tokens.

Example:

iex> POSTagger.tag(model, %{}, ["Judy", "saw", "her"])
[{"Judy", "NNP"}, {"saw", "VBD"}, {"her", "PRP$"}]
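The returned list of {token, tag} tuples can be processed with ordinary Elixir pattern matching. The sketch below reuses the output shown above as a literal, so it runs without a trained model. It assumes Penn Treebank-style tags, where verb tags begin with "VB"; other tagsets would need a different pattern.

```elixir
# Output copied from the example above, used here as plain data.
tagged = [{"Judy", "NNP"}, {"saw", "VBD"}, {"her", "PRP$"}]

# Keep only tokens whose tag starts with "VB" (assumed Penn Treebank verbs).
# Tuples that do not match the pattern are skipped by the comprehension.
verbs = for {token, "VB" <> _rest} <- tagged, do: token

# Group tokens by their tag.
by_tag =
  Enum.group_by(
    tagged,
    fn {_token, tag} -> tag end,
    fn {token, _tag} -> token end
  )
```

Here `verbs` is `["saw"]`, and `by_tag` maps each tag to the list of tokens that received it.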