Penelope v0.1.1 API Reference

Modules

The SVM classifier uses libsvm for multi-class classification. It supports training a model, compiling and extracting model parameters to and from Erlang data structures, and predicting classes or probabilities

This is the vector library used by the ML modules. It provides an interface to an efficient binary representation of 32-bit floating point values. Math is done via the BLAS interface, wrapped in a NIF module
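
To illustrate what a packed binary of 32-bit floats looks like, here is a minimal Python sketch. It is not the Penelope API: the function names are hypothetical, and the little-endian byte order is an assumption made only so the example is deterministic.

```python
import struct


def pack_vector(values):
    """Pack a list of numbers into a binary of 32-bit floats (4 bytes each).

    Little-endian byte order ("<") is an assumption for this sketch.
    """
    return struct.pack(f"<{len(values)}f", *values)


def unpack_vector(blob):
    """Recover the list of floats from a binary produced by pack_vector."""
    count = len(blob) // 4  # each 32-bit float occupies 4 bytes
    return list(struct.unpack(f"<{count}f", blob))


blob = pack_vector([1.0, 2.5, -3.0])
restored = unpack_vector(blob)
```

Each element takes exactly four bytes, so a vector of n elements is a binary of 4n bytes that can be passed to native code without per-element conversion.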

This module represents a word2vec-style vectorset, compiled into a set of hash-partitioned DETS files. Each record is a tuple consisting of the term (word) and a set of weights (vector). This module also supports parsing the standard text representation of word vectors via the compile function
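
As a rough illustration of the two ideas above (parsing a text record and choosing a hash partition), here is a Python sketch. It is not the Penelope implementation: the function names are hypothetical, the record layout is assumed to be a term followed by whitespace-separated weights on one line, and CRC32 stands in for whatever hash function the library actually uses.

```python
import zlib


def parse_line(line):
    """Parse one text record into a (term, weights) tuple.

    Assumes the layout "term w1 w2 ... wn" on a single line.
    """
    term, *weights = line.split()
    return term, [float(w) for w in weights]


def partition_for(term, partitions):
    """Map a term to a partition number in the range [0, partitions).

    CRC32 is an assumption here; any stable hash would serve the same purpose.
    """
    return zlib.crc32(term.encode("utf-8")) % partitions


record = parse_line("cat 0.5 -1.0 2.0\n")
bucket = partition_for(record[0], 4)
```

Because the hash is stable, a lookup for a term only needs to open the one partition file that the same function selected when the record was written.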

This module vectorizes a list of tokens using word vectors. Token vectors are retrieved from the word2vec index (see index.ex). These are combined into a single document vector by taking their vector mean
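
The vector mean can be sketched in a few lines of Python. This is an illustration of the technique, not the library's code: the function name and the dictionary-based index are hypothetical, and skipping unknown tokens and returning a zero vector for an empty result are assumptions.

```python
def document_vector(tokens, index, size):
    """Average the vectors of the tokens found in the index.

    index maps a token to a list of `size` floats. Tokens missing from the
    index are skipped (an assumption for this sketch).
    """
    vectors = [index[token] for token in tokens if token in index]
    if not vectors:
        # No known tokens: fall back to a zero vector (also an assumption).
        return [0.0] * size
    # zip(*vectors) groups the values of each dimension across all tokens.
    return [sum(column) / len(vectors) for column in zip(*vectors)]


index = {"a": [1.0, 2.0], "b": [3.0, 4.0]}
mean = document_vector(["a", "b", "unknown"], index, 2)
```

The result always has the same dimensionality as the word vectors, regardless of how many tokens the document contains.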

NIF wrapper module

The tokenization scheme used for the creation of the Penn Treebank corpus. See ftp://ftp.cis.upenn.edu/pub/treebank/public_html/tokenization.html
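
For readers unfamiliar with the scheme, the Python sketch below shows two of its best-known conventions: punctuation becomes its own token, and contractions are split so that "don't" yields "do" and "n't". It is a simplified illustration with a hypothetical function name, not the library's tokenizer, and it omits other rules of the full scheme such as quote normalization.

```python
import re


def ptb_tokenize(text):
    """Tokenize text using a small subset of Penn Treebank conventions."""
    # Put spaces around common punctuation so each mark becomes a token.
    text = re.sub(r'([,;:?!()"])', r" \1 ", text)
    # Split off a sentence-final period.
    text = re.sub(r"\.\s*$", " .", text)
    # Split the negative contraction: "don't" -> "do n't".
    text = re.sub(r"(\w)(n't)\b", r"\1 \2", text, flags=re.IGNORECASE)
    # Split clitics: "It's" -> "It 's", "they'll" -> "they 'll".
    text = re.sub(r"(\w)('(?:s|re|ve|ll|d|m))\b", r"\1 \2", text,
                  flags=re.IGNORECASE)
    return text.split()


tokens = ptb_tokenize("They don't know, do they?")
```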

The behaviour implemented by all tokenizers

Exceptions

DETS index processing error

Mix Tasks

This task compiles a word vector text file into a set of DETS indexes