Penelope v0.3.0 Penelope.ML.Pipeline

The ML pipeline provides the ability to express an inference graph as a data structure, and to fit/export/compile/predict based on the graph. A pipeline is represented as a sequence of stages, each of which is a component module that supports the pipeline interface. This structure is modeled after sklearn’s pipeline.

A pipeline component is either a transformer (supports the transform function) or a predictor (supports one or more predict functions). Components may optionally support the fit/export/compile functions. Below is a spec for each:

fit(context::map, x::[any], y::[any], options::keyword) :: any
transform(model::any, context::map, x::[any]) :: [any]
predict_class(model::any, context::map, x::[any]) :: [any]
predict_probability(model::any, context::map, x::[any]) :: [%{any => float}]
predict_sequence(model::any, context::map, x::[any]) :: [{[any], float}]
export(model::any) :: map
compile(params::map) :: any

fit trains the model component and returns its compiled model. transform transforms an incoming list of samples (a feature matrix or a list of sequences) for further pipeline processing. The predict functions output classes, class probabilities, or sequences for a list of samples. export serializes a model for persistence, and compile deserializes an exported model for inference.
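To make the component interface concrete, here is a minimal sketch of a predictor that implements fit, both class predict functions, export, and compile. MajoritySketch is a hypothetical module written for illustration; it is not part of Penelope, and it ignores the sample features entirely.

```elixir
defmodule MajoritySketch do
  # Hypothetical predictor component; not part of Penelope.

  # fit/4: learn class frequencies from the labels and return the compiled model.
  def fit(_context, _x, y, _options) do
    total = length(y)

    probabilities =
      y
      |> Enum.frequencies()
      |> Map.new(fn {class, count} -> {class, count / total} end)

    %{probabilities: probabilities}
  end

  # predict_probability/3: one class => probability map per sample.
  def predict_probability(model, _context, x) do
    Enum.map(x, fn _sample -> model.probabilities end)
  end

  # predict_class/3: the most frequent training class, repeated for each sample.
  def predict_class(model, _context, x) do
    {class, _p} = Enum.max_by(model.probabilities, fn {_class, p} -> p end)
    Enum.map(x, fn _sample -> class end)
  end

  # export/1: convert the compiled model into a plain, serializable map.
  def export(model) do
    %{"probabilities" => model.probabilities}
  end

  # compile/1: rebuild the compiled model from exported parameters.
  def compile(params) do
    %{probabilities: Map.fetch!(params, "probabilities")}
  end
end

model = MajoritySketch.fit(%{}, ["x1", "x2", "x3"], ["a", "b", "a"], [])

# Round trip: an exported model should compile back into an equivalent model.
restored = model |> MajoritySketch.export() |> MajoritySketch.compile()
classes = MajoritySketch.predict_class(restored, %{}, ["x4", "x5"])
```

The string keys in the exported map are a design choice of this sketch, made so the result would survive a format such as JSON; the source does not say what key style Penelope components use.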

Compiled models are generally maps, but they can be any data structure. The context parameter is a user-supplied map, which can be used to thread a per-inference runtime parameter into a model (see the context_featurizer component for an example).
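The following sketch shows one way a transformer might read a per-inference value from the context. ContextFlagSketch, the :is_weekend key, and the default value are all hypothetical; this is not a reproduction of the context_featurizer component.

```elixir
defmodule ContextFlagSketch do
  # Hypothetical transformer; not part of Penelope.
  # Appends a per-inference value from the context to every feature vector.
  def transform(model, context, x) do
    value = Map.get(context, model.key, model.default)
    Enum.map(x, fn features -> features ++ [value] end)
  end
end

# The model only records which context key to read and what to use if it is absent.
model = %{key: :is_weekend, default: 0.0}

with_flag = ContextFlagSketch.transform(model, %{is_weekend: 1.0}, [[1.0, 2.0], [3.0]])
without_flag = ContextFlagSketch.transform(model, %{}, [[1.0, 2.0]])
```

Because the value comes from the context rather than the model, the same compiled model can be reused across calls that carry different runtime parameters.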

Some components may not need custom fit/compile/export logic. For these components, the pipeline automatically compiles the fit options as a map.
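As a rough illustration, a stateless component can define transform alone. The sketch below assumes that "compiles the fit options as a map" amounts to converting the keyword list with Map.new/1; DowncaseSketch and the :locale option are hypothetical.

```elixir
defmodule DowncaseSketch do
  # Hypothetical stateless transformer: only transform/3 is defined.
  def transform(_model, _context, x), do: Enum.map(x, &String.downcase/1)
end

# Assumption: the default model is the fit options converted to a map.
options = [locale: "en"]
default_model = Map.new(options)

output = DowncaseSketch.transform(default_model, %{}, ["Big Bear", "BABY"])
```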

The pipeline uses the registry module for component name resolution. Names may be aliases or module atoms.

The following is an example of a simple classification pipeline. It uses the token count vectorizer to count the total number of tokens in each sample string as a feature value.

pipeline = [
  {"ptb_tokenizer", []},
  {"count_vectorizer", []},
  {"svm_classifier", [kernel: :rbf, c: 2.0]}
]
x_train = [
  "big daddy bear",
  "momma bear",
  "baby",
  "big bear daddy",
  "your momma",
  "lilbear"
]
y_train = ["c", "b", "a", "c", "b", "a"]

Penelope.ML.Pipeline.fit(%{}, x_train, y_train, pipeline)

Summary

Functions

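call_maybe(module, function, args, default)
calls a function on a module if it is supported, with a default fallback

compile(params)
imports parameters from a serialized model

export(model)
exports a runtime model to a serializable data structure

fit(context, x, y, stages)
transforms and fits each stage of the pipeline

predict_class(model, context, x)
class prediction

predict_probability(model, context, x)
class probability prediction

predict_sequence(model, context, x)
performs a sequence-to-sequence inference, returning the output sequences and sequence probabilities for each sample

transform(model, context, x)
transforms a list of samples through the pipeline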

Functions

call_maybe(module, function, args, default)
call_maybe(module :: atom(), function :: atom(), args :: [any()], default :: function()) :: any()

calls a function on a module if it is supported, with a default fallback
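A helper with this behavior could be sketched as follows. This is illustrative only and is not Penelope's source; in particular, it assumes that default is a zero-arity function, which the spec above does not state.

```elixir
defmodule CallMaybeSketch do
  # Illustrative only; not Penelope's source.
  # Assumption: `default` is a zero-arity function.
  def call_maybe(module, function, args, default) do
    # Make sure the module is loaded so function_exported?/3 is reliable.
    Code.ensure_loaded(module)

    if function_exported?(module, function, length(args)) do
      apply(module, function, args)
    else
      default.()
    end
  end
end

# String.upcase/1 exists, so it is called with the given arguments.
upcased = CallMaybeSketch.call_maybe(String, :upcase, ["abc"], fn -> :unsupported end)

# No such function exists, so the default is used instead.
fallback = CallMaybeSketch.call_maybe(String, :no_such_function, [], fn -> :unsupported end)
```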

compile(params)
compile(params :: [map()]) :: [{atom(), any()}]

imports parameters from a serialized model

export(model)
export(model :: [{atom(), any()}]) :: [map()]

exports a runtime model to a serializable data structure

fit(context, x, y, stages)
fit(context :: map(), x :: [any()], y :: [any()], stages :: [{String.t() | atom(), any()}]) :: [{atom(), any()}]

transforms and fits each stage of the pipeline

A stage is a tuple of {name, options}, where name is a registered name or module atom, and options are the parameters to the component’s fit function.
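The "transforms and fits each stage" behavior can be pictured as a fold over the stages, where each fitted transformer rewrites the samples before the next stage is fit. The sketch below is an assumption about that general shape, not Penelope's implementation: it accepts module atoms only (no registry lookup), and SplitSketch, LengthSketch, and MiniPipelineSketch are hypothetical modules.

```elixir
defmodule SplitSketch do
  # Hypothetical stateless transformer: splits strings on whitespace.
  def transform(_model, _context, x), do: Enum.map(x, &String.split/1)
end

defmodule LengthSketch do
  # Hypothetical trainable transformer: scales token counts by the longest sample seen in fit.
  def fit(_context, x, _y, _options) do
    %{max: x |> Enum.map(&length/1) |> Enum.max()}
  end

  def transform(model, _context, x) do
    Enum.map(x, fn tokens -> [length(tokens) / model.max] end)
  end
end

defmodule MiniPipelineSketch do
  # Illustrative fold over stages; not Penelope's implementation.
  def fit(context, x, y, stages) do
    {fitted, _x} =
      Enum.map_reduce(stages, x, fn {module, options}, x ->
        model =
          if function_exported?(module, :fit, 4) do
            module.fit(context, x, y, options)
          else
            # Assumed default: the fit options as a map.
            Map.new(options)
          end

        # Transformers rewrite the samples for the next stage; other stages pass them through.
        x =
          if function_exported?(module, :transform, 3) do
            module.transform(model, context, x)
          else
            x
          end

        {{module, model}, x}
      end)

    fitted
  end

  # Runs already-fitted transformer stages in order.
  def transform(fitted, context, x) do
    Enum.reduce(fitted, x, fn {module, model}, x -> module.transform(model, context, x) end)
  end
end

stages = [{SplitSketch, []}, {LengthSketch, []}]
x_train = ["big daddy bear", "momma bear", "baby", "your momma"]
y_train = ["c", "b", "a", "b"]

fitted = MiniPipelineSketch.fit(%{}, x_train, y_train, stages)
features = MiniPipelineSketch.transform(fitted, %{}, ["big bear", "baby"])
```

The fitted result is a list of {module, model} tuples, which matches the [{atom(), any()}] return type in the spec above.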

predict_class(model, context, x)
predict_class(model :: [{atom(), any()}], context :: map(), x :: [any()]) :: [any()]

class prediction

This function returns a list containing the predicted class (one of the classes in the model) for each sample.

predict_probability(model, context, x)
predict_probability(model :: [{atom(), any()}], context :: map(), x :: [any()]) :: [%{optional(any()) => float()}]

class probability prediction

This function predicts the probability of each class (in a map) for each sample.
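As an illustration of the result shape, the sketch below takes a hand-written list of class probability maps (hypothetical values, not real model output) and picks the most probable class for each sample.

```elixir
# Hypothetical output of predict_probability for two samples.
probabilities = [
  %{"a" => 0.7, "b" => 0.2, "c" => 0.1},
  %{"a" => 0.1, "b" => 0.3, "c" => 0.6}
]

# Pick the most probable class and its probability for each sample.
best =
  Enum.map(probabilities, fn distribution ->
    Enum.max_by(distribution, fn {_class, p} -> p end)
  end)
```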

predict_sequence(model, context, x)
predict_sequence(model :: [{atom(), any()}], context :: map(), x :: [[any()]]) :: [{[any()], float()}]

performs a sequence-to-sequence inference, returning the output sequences and sequence probabilities for each sample

transform(model, context, x)
transform(model :: [{atom(), any()}], context :: map(), x :: [any()]) :: [any()]

transforms a list of samples through the pipeline