Penelope v0.5.0 Penelope.ML.Word2vec.Index

This module represents a word2vec-style vector set, compiled into a set of hash-partitioned DETS files. Each record is a tuple consisting of the term (word), an integer id, and a set of weights (vector). This module also supports parsing the standard text representation of word vectors via the compile!/2 function.

On disk, the following files are created:

  /header.dets   index header (version, metadata)
  /_.dets        partition file (one per partition)
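A minimal sketch of how hash partitioning can route each term to exactly one partition file. It assumes `:erlang.phash2/2` as the hash function; Penelope's actual hashing scheme is not stated on this page, and `PartitionSketch` is a hypothetical module used only for illustration.

```elixir
defmodule PartitionSketch do
  # Hypothetical helper: maps a term to a partition number in 0..partitions-1.
  # :erlang.phash2/2 is deterministic, so the same term always lands in the
  # same partition. Penelope's real hash function is an assumption here.
  def partition_for(term, partitions)
      when is_binary(term) and is_integer(partitions) and partitions > 0 do
    :erlang.phash2(term, partitions)
  end

  # Groups {term, id, vector} records by the partition they would be written to.
  def group_by_partition(records, partitions) do
    Enum.group_by(records, fn {term, _id, _vector} -> partition_for(term, partitions) end)
  end
end
```

Because the partition is a pure function of the term, a lookup only needs to open and read one partition file rather than scanning all of them.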

Summary

Functions

closes the index

inserts word vectors from a text file into a word2vec index

creates a new word2vec index

retrieves a term by its id

inserts a word vector tuple into a word2vec index

searches for a term in the word2vec index

opens an existing word2vec index at the specified path

parses and inserts a single word vector text line into a word2vec index

parses a word vector line: “ …”

Types

t()
t() :: %Penelope.ML.Word2vec.Index{
  header: atom(),
  name: atom(),
  partitions: pos_integer(),
  tables: [atom()],
  vector_size: pos_integer(),
  version: pos_integer()
}
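For illustration, a hypothetical stand-in struct with the same field names as the documented t(). It is not the library's own definition. The `consistent?/1` check assumes that `tables` holds one DETS table per partition, which is inferred from create!/3 creating one file per partition.

```elixir
defmodule IndexSketch do
  # Hypothetical mirror of %Penelope.ML.Word2vec.Index{}: same field names
  # as the documented t(), but not Penelope's actual module.
  defstruct [:header, :name, :partitions, :tables, :vector_size, :version]

  @type t :: %__MODULE__{
          header: atom(),
          name: atom(),
          partitions: pos_integer(),
          tables: [atom()],
          vector_size: pos_integer(),
          version: pos_integer()
        }

  # Assumed invariant: exactly one table name per partition.
  @spec consistent?(t()) :: boolean()
  def consistent?(%__MODULE__{partitions: partitions, tables: tables}) when is_list(tables) do
    is_integer(partitions) and partitions > 0 and length(tables) == partitions
  end
end
```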

Functions

closes the index

compile!(index, path)
compile!(index :: Penelope.ML.Word2vec.Index.t(), path :: String.t()) :: :ok

inserts word vectors from a text file into a word2vec index

the index must have been opened using create!/3
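A minimal sketch of the reading loop a compile step like this might use: stream the text file line by line, number the lines starting at 1 (ids are pos_integer() in the specs), and pass each `{line, id}` pair on. That pair has the same shape as the argument of parse_insert!/2. The `insert_fun` callback is a hypothetical stand-in, and the sketch assumes the file has no leading "count dimensions" header line.

```elixir
defmodule CompileSketch do
  # Hypothetical sketch: reads `path` lazily, skips blank lines, and calls
  # `insert_fun` with {line, id} pairs, ids starting at 1.
  # Assumes no header line at the top of the file.
  def compile(path, insert_fun) when is_function(insert_fun, 1) do
    path
    |> File.stream!()
    |> Stream.map(&String.trim_trailing/1)
    |> Stream.reject(&(&1 == ""))
    |> Stream.with_index(1)
    |> Enum.each(insert_fun)

    :ok
  end
end
```

Streaming keeps memory use flat even for vector files with millions of lines, since only one line is held at a time.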

create!(path, name, options \\ [])
create!(path :: String.t(), name :: String.t(),
  partitions: pos_integer(),
  size_hint: pos_integer(),
  vector_size: pos_integer()
) :: Penelope.ML.Word2vec.Index.t()

creates a new word2vec index

files will be created as /_.dets, one per partition
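A sketch of the setup step, assuming each partition is backed by its own DETS `:set` table opened with `:dets.open_file/2`. The table and file names below (`<name>_sketch_<n>.dets`) are hypothetical and chosen only for illustration; they are not Penelope's naming scheme.

```elixir
defmodule CreateSketch do
  # Hypothetical sketch: opens (creating if needed) one DETS set table per
  # partition under `dir` and returns the table names as atoms.
  def create(dir, name, partitions) when is_integer(partitions) and partitions > 0 do
    File.mkdir_p!(dir)

    for part <- 0..(partitions - 1) do
      table = String.to_atom("#{name}_sketch_#{part}")
      file = dir |> Path.join("#{name}_sketch_#{part}.dets") |> String.to_charlist()
      {:ok, ^table} = :dets.open_file(table, file: file, type: :set)
      table
    end
  end

  # Closes every table opened by create/3.
  def close(tables), do: Enum.each(tables, &:dets.close/1)
end
```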

fetch!(index, id)
fetch!(index :: Penelope.ML.Word2vec.Index.t(), id :: pos_integer()) ::
  String.t() | nil

retrieves a term by its id

if found, returns the term string; otherwise, returns nil
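One possible way to resolve an id back to its term, assuming records are stored as `{term, id, vector}` tuples keyed by term. Under that assumption, finding a term by id requires matching on the id position in every partition, which is a full scan. Penelope may maintain a dedicated id index instead, so treat `FetchSketch` as a hypothetical illustration only.

```elixir
defmodule FetchSketch do
  # Hypothetical sketch: scans each DETS table for a record whose second
  # element equals `id` and returns its term, or nil if none matches.
  def fetch(tables, id) when is_integer(id) and id > 0 do
    Enum.find_value(tables, fn table ->
      case :dets.match(table, {:"$1", id, :_}) do
        [[term] | _] -> term
        [] -> nil
      end
    end)
  end
end
```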

insert!(index, record)
insert!(
  index :: Penelope.ML.Word2vec.Index.t(),
  record :: {String.t(), pos_integer(), Penelope.ML.Vector.t()}
) :: :ok

inserts a word vector tuple into a word2vec index

searches for a term in the word2vec index

if found, returns the id and word vector (without the term); otherwise, returns nil
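A sketch of the insert and search paths together: a record goes into the partition chosen by hashing its term, and a search re-hashes the term so it only reads that one partition. The `:erlang.phash2/2` routing is an assumption, and `InsertLookupSketch` is a hypothetical module; the `{id, vector}` result shape follows the description above.

```elixir
defmodule InsertLookupSketch do
  # Hypothetical sketch of insert: stores the {term, id, vector} record in
  # the partition selected by hashing the term (assumed phash2 routing).
  def insert(tables, {term, id, _vector} = record)
      when is_binary(term) and is_integer(id) and id > 0 do
    :ok = :dets.insert(table_for(tables, term), record)
  end

  # Hypothetical sketch of the search: returns {id, vector} without the
  # term, or nil when the term is absent.
  def lookup(tables, term) when is_binary(term) do
    case :dets.lookup(table_for(tables, term), term) do
      [{^term, id, vector}] -> {id, vector}
      [] -> nil
    end
  end

  defp table_for(tables, term), do: Enum.at(tables, :erlang.phash2(term, length(tables)))
end
```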

open!(path, options \\ [])
open!(path :: String.t(), [{:cache_size, pos_integer()}]) ::
  Penelope.ML.Word2vec.Index.t()

opens an existing word2vec index at the specified path
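A sketch of reopening existing partition files read-only, so that opening an index never silently creates empty tables. The caller supplies `{table_name, path}` pairs because this sketch does not assume Penelope's file naming. The documented `:cache_size` option is not modeled, and `OpenSketch` is hypothetical.

```elixir
defmodule OpenSketch do
  # Hypothetical sketch: reopens each existing DETS file with read-only
  # access and returns the table names.
  def open(files) do
    for {table, path} <- files do
      # Fail loudly on a missing file instead of treating the index as empty.
      if !File.exists?(path) do
        raise ArgumentError, "index file not found: #{path}"
      end

      {:ok, ^table} = :dets.open_file(table, file: String.to_charlist(path), access: :read)
      table
    end
  end
end
```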

parse_insert!(index, arg)
parse_insert!(
  index :: Penelope.ML.Word2vec.Index.t(),
  {line :: String.t(), id :: pos_integer()}
) :: {String.t(), pos_integer(), Penelope.ML.Vector.t()}

parses and inserts a single word vector text line into a word2vec index
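A sketch of the parse-then-insert step for one `{line, id}` pair: split the line into a term and weights, attach the id, hand the `{term, id, vector}` record to an insert function, and return the record, matching the documented return shape. Here `insert_fun` stands in for insert!/2 and `vector_size` for the index's `vector_size` field; the length check and the use of plain float lists for Penelope.ML.Vector.t() are assumptions.

```elixir
defmodule ParseInsertSketch do
  # Hypothetical sketch: parses "term w1 w2 ...", checks the weight count
  # against `vector_size`, inserts the record, and returns it.
  def parse_insert(insert_fun, vector_size, {line, id})
      when is_function(insert_fun, 1) and is_integer(id) and id > 0 do
    [term | weights] = String.split(line)
    vector = Enum.map(weights, &to_weight/1)

    if length(vector) != vector_size do
      raise ArgumentError,
            "expected #{vector_size} weights for #{inspect(term)}, got #{length(vector)}"
    end

    record = {term, id, vector}
    :ok = insert_fun.(record)
    record
  end

  # Accepts only a complete number such as "0.25" or "-1".
  defp to_weight(text) do
    case Float.parse(text) do
      {value, ""} -> value
      _ -> raise ArgumentError, "invalid weight: #{inspect(text)}"
    end
  end
end
```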

parse_line!(line)
parse_line!(line :: String.t()) :: {String.t(), Penelope.ML.Vector.t()}

parses a word vector line: “ …”
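A minimal sketch of parsing one line of the text format: the first whitespace-separated token is the term and the remaining tokens are its weights. It returns the weights as a list of floats, which is an assumption; Penelope.ML.Vector.t() may use a different representation. `ParseLineSketch` is hypothetical.

```elixir
defmodule ParseLineSketch do
  # Hypothetical sketch: returns {term, weights} or raises ArgumentError
  # when the line has no weights or a weight is not a number.
  def parse_line!(line) when is_binary(line) do
    case String.split(line) do
      [term | [_ | _] = weights] -> {term, Enum.map(weights, &parse_weight!/1)}
      _ -> raise ArgumentError, "expected a term followed by weights: #{inspect(line)}"
    end
  end

  defp parse_weight!(text) do
    case Float.parse(text) do
      {value, ""} -> value
      _ -> raise ArgumentError, "invalid weight: #{inspect(text)}"
    end
  end
end
```

For example, `ParseLineSketch.parse_line!("king 0.5 -1 2.0\n")` yields `{"king", [0.5, -1.0, 2.0]}`; `String.split/1` discards the trailing newline.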