Penelope v0.4.0 Penelope.ML.Word2vec.Index
This module represents a word2vec-style vector set, compiled into a set of hash-partitioned DETS files. Each record is a tuple consisting of the term (word) and a set of weights (vector). This module also supports parsing the standard text representation of word vectors via the compile! function.
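As a rough illustration of hash partitioning over DETS, the sketch below spreads {term, id, vector} records across several DETS tables. It is not Penelope's implementation: the module name PartitionedDetsSketch, the file naming scheme, the use of :erlang.phash2 to pick a partition, and the plain list of floats standing in for a vector are all assumptions made for the example.

```elixir
defmodule PartitionedDetsSketch do
  # Hypothetical helper, not part of Penelope: opens one DETS table per
  # partition. The "<name>_<i>.dets" file names are an assumed scheme.
  def open(dir, name, partitions) when partitions > 0 do
    for i <- 0..(partitions - 1) do
      table = :"#{name}_#{i}"
      file = dir |> Path.join("#{name}_#{i}.dets") |> String.to_charlist()
      {:ok, ^table} = :dets.open_file(table, file: file, type: :set)
      table
    end
  end

  # Assumption: the partition is chosen by hashing the term.
  def table_for(tables, term) do
    Enum.at(tables, :erlang.phash2(term, length(tables)))
  end

  # Stores a {term, id, vector} record in the partition that owns the term.
  def insert(tables, {term, _id, _vector} = record) do
    :ok = :dets.insert(table_for(tables, term), record)
  end

  # Returns {id, vector} when the term is present, otherwise nil.
  def lookup(tables, term) do
    case :dets.lookup(table_for(tables, term), term) do
      [{^term, id, vector}] -> {id, vector}
      [] -> nil
    end
  end

  def close(tables), do: Enum.each(tables, &:dets.close/1)
end

dir = Path.join(System.tmp_dir!(), "dets_sketch_#{System.unique_integer([:positive])}")
File.mkdir_p!(dir)

tables = PartitionedDetsSketch.open(dir, "sketch", 4)
PartitionedDetsSketch.insert(tables, {"hello", 1, [0.1, 0.2, 0.3]})
IO.inspect(PartitionedDetsSketch.lookup(tables, "hello"))
PartitionedDetsSketch.close(tables)
```

Because the partition is derived from the term alone, a lookup only has to touch one table, which is the usual motivation for partitioning a large vocabulary this way.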
On disk, the following files are created:
Types
t() :: %Penelope.ML.Word2vec.Index{header: atom(), name: atom(), partitions: pos_integer(), tables: [atom()], vector_size: pos_integer(), version: pos_integer()}
Functions
closes the index
compile!(index :: Penelope.ML.Word2vec.Index.t(), path :: String.t()) :: :ok
inserts word vectors from a text file into a word2vec index
the index must have been created using create!
create!(path :: String.t(), name :: String.t(), [partitions: pos_integer(), size_hint: pos_integer(), vector_size: pos_integer()]) :: Penelope.ML.Word2vec.Index.t()
creates a new word2vec index
files will be created as
fetch!(index :: Penelope.ML.Word2vec.Index.t(), id :: pos_integer()) :: String.t() | nil
retrieves a term by its id
if found, returns the term string; otherwise, returns nil
insert!(index :: Penelope.ML.Word2vec.Index.t(), record :: {String.t(), pos_integer(), Penelope.ML.Vector.t()}) :: :ok
inserts a word vector tuple into a word2vec index
lookup!(index :: Penelope.ML.Word2vec.Index.t(), term :: String.t()) :: {integer(), Penelope.ML.Vector.t()}
searches for a term in the word2vec index
if found, returns the id and word vector (no term); otherwise, returns nil
open!(path :: String.t(), [{:cache_size, pos_integer()}]) :: Penelope.ML.Word2vec.Index.t()
opens an existing word2vec index at the specified path
parse_insert!(index :: Penelope.ML.Word2vec.Index.t(), {line :: String.t(), id :: pos_integer()}) :: {String.t(), pos_integer(), Penelope.ML.Vector.t()}
parses and inserts a single word vector text line into a word2vec index
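The {line, id} argument has the same shape as the pairs produced by Stream.with_index, so a sequence of text lines can be numbered and handed over one at a time. The sketch below illustrates that pairing with a hypothetical stand-in, ParseInsertSketch.parse_insert!, which takes a storage callback instead of an index. The 1-based ids, the callback, and the list-of-floats vector are assumptions for the example, not Penelope's behavior.

```elixir
defmodule ParseInsertSketch do
  # Hypothetical stand-in for parse_insert!: parses one {line, id} pair,
  # stores the resulting record via `store_fun`, and returns the record.
  def parse_insert!(store_fun, {line, id}) do
    [term | weights] = String.split(line)

    vector =
      Enum.map(weights, fn weight ->
        # A weight with trailing garbage fails this match and raises.
        {value, ""} = Float.parse(weight)
        value
      end)

    record = {term, id, vector}
    :ok = store_fun.(record)
    record
  end
end

# A no-op store keeps the sketch self-contained.
store = fn _record -> :ok end

records =
  ["cat 0.1 0.2", "dog 0.3 0.4"]
  |> Stream.with_index(1)
  |> Enum.map(&ParseInsertSketch.parse_insert!(store, &1))

IO.inspect(records)
```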
parse_line!(line :: String.t()) :: {String.t(), Penelope.ML.Vector.t()}
parses a word vector line into a term and its vector
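A minimal sketch of this kind of line parsing follows, assuming the common word2vec text layout of a term followed by whitespace-separated weights. LineParserSketch is a hypothetical module, and it returns a plain list of floats in place of Penelope.ML.Vector.t().

```elixir
defmodule LineParserSketch do
  # Hypothetical parser: splits on whitespace, treats the first field as
  # the term and the remaining fields as weights.
  def parse_line!(line) do
    [term | weights] = String.split(line)
    {term, Enum.map(weights, &parse_weight!/1)}
  end

  # Accepts integers and floats in text form; anything else raises.
  defp parse_weight!(text) do
    case Float.parse(text) do
      {value, ""} -> value
      _ -> raise ArgumentError, "invalid weight: #{inspect(text)}"
    end
  end
end

IO.inspect(LineParserSketch.parse_line!("king 0.5 -1.25 3"))
# prints {"king", [0.5, -1.25, 3.0]}
```

An empty line fails the list match and raises, which is in keeping with the bang-style name.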