Penelope v0.5.0 Penelope.ML.Text.POSFeaturizer

The POS featurizer converts a list of lists of tokens into nested lists containing feature maps relevant to POS tagging for each token.

Features used for the POS tagger are largely inspired by "A Maximum Entropy Model for Part-Of-Speech Tagging" (Ratnaparkhi, 1996). The following is an example feature map for an individual token:

token_list = ["it", "is", "a", "little-known", "fact"]
token = "little-known"
%{
  "has_hyphen" => true,
  "has_digit" => false,
  "has_cap" => false,
  "pre_1" => "l",
  "pre_2" => "li",
  "pre_3" => "lit",
  "pre_4" => "litt",
  "suff_1" => "n",
  "suff_2" => "wn",
  "suff_3" => "own",
  "suff_4" => "nown",
  "tok_-2" => "is",
  "tok_-1" => "a",
  "tok_0" => "little-known",
  "tok_1" => "fact",
  "tok_2" => "",
}
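
The following is a minimal sketch of how a feature map like the one above could be computed for a single token. It is not Penelope's implementation: the module `POSFeatureSketch` and its `features/2` function are hypothetical names, and the handling of tokens shorter than four characters and the ASCII-only capital check are assumptions made for illustration.

```elixir
defmodule POSFeatureSketch do
  # Hypothetical helper, not part of Penelope: builds a feature map for the
  # token at `index` in `tokens`, using the same keys as the example above.
  def features(tokens, index) do
    token = Enum.at(tokens, index)
    len = String.length(token)

    # Surface-shape features (assumption: ASCII digits and capitals only).
    shape = %{
      "has_hyphen" => String.contains?(token, "-"),
      "has_digit" => String.match?(token, ~r/[0-9]/),
      "has_cap" => String.match?(token, ~r/[A-Z]/)
    }

    # Prefixes and suffixes of length 1 to 4. For tokens shorter than n,
    # the whole token is used (an assumption of this sketch).
    affixes =
      1..4
      |> Enum.flat_map(fn n ->
        [
          {"pre_#{n}", String.slice(token, 0, n)},
          {"suff_#{n}", String.slice(token, max(len - n, 0), n)}
        ]
      end)
      |> Map.new()

    # Context window of two tokens on each side; positions outside the
    # sentence are represented by the empty string.
    window =
      for offset <- -2..2, into: %{} do
        position = index + offset
        neighbor = if position >= 0, do: Enum.at(tokens, position, ""), else: ""
        {"tok_#{offset}", neighbor}
      end

    shape |> Map.merge(affixes) |> Map.merge(window)
  end
end

tokens = ["it", "is", "a", "little-known", "fact"]
features = POSFeatureSketch.features(tokens, 3)
IO.inspect(features["suff_3"])
# prints "own"
```

The explicit `position >= 0` check matters because `Enum.at/3` treats negative indices as offsets from the end of the list, which would otherwise wrap the context window around to the end of the sentence.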

Summary

Functions

transform(model, context, x)

Transforms the token lists into lists of feature maps.

Functions

transform(model, context, x)
transform(model :: map(), context :: map(), x :: [[String.t()]]) :: [map()]

Transforms the token lists into lists of feature maps.
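
To illustrate the input and output shapes, here is a rough sketch that assumes the nested result described in the module overview (one inner list per sentence, one map per token). `TransformShapeSketch` is a hypothetical stand-in, not the library module; it ignores `model` and `context` and emits only a placeholder `"tok_0"` feature.

```elixir
defmodule TransformShapeSketch do
  # Hypothetical stand-in for a featurizer. `model` and `context` are accepted
  # only to mirror the documented arity and are ignored in this sketch.
  def transform(_model, _context, x) do
    # Outer map: one entry per token list (sentence).
    Enum.map(x, fn tokens ->
      # Inner map: one feature map per token (placeholder feature only).
      Enum.map(tokens, fn token -> %{"tok_0" => token} end)
    end)
  end
end

result = TransformShapeSketch.transform(%{}, %{}, [["it", "is"], ["a", "fact"]])
IO.inspect(length(result))
# prints 2
```

The output keeps the structure of the input: two token lists in, two lists of feature maps out, with each inner list the same length as its sentence.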