Penelope v0.2.3 Penelope.ML.Word2vec.Index

This module represents a word2vec-style vector set, compiled into a set of hash-partitioned DETS files. Each record is a tuple consisting of the term (word) and a set of weights (vector). This module also supports parsing the standard text representation of word vectors via the compile!/2 function.

On disk, the following files are created:

/header.dets: index header (version, metadata)
/_.dets: partition file
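To illustrate what hash partitioning means here, the following is a minimal sketch of how a term could be routed to one of N partition files. It assumes Erlang's `:erlang.phash2/2` as the hash function; the `PartitionSketch` module and its function names are hypothetical, and Penelope's actual routing may differ.

```elixir
defmodule PartitionSketch do
  # Hypothetical helper, not part of Penelope: maps a term to a partition
  # number in 0..partitions-1 using Erlang's portable term hash.
  def partition_for(term, partitions) when is_binary(term) and partitions > 0 do
    :erlang.phash2(term, partitions)
  end

  # Groups a list of terms by the partition each would be routed to.
  def group(terms, partitions) do
    Enum.group_by(terms, &partition_for(&1, partitions))
  end
end
```

Because the hash is deterministic, a lookup only needs to open the single partition a term hashes to, rather than scanning every file.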

Summary

Functions

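close(index)
closes the index

compile!(index, path)
inserts word vectors from a text file into a word2vec index

create!(path, name, options \\ [])
creates a new word2vec index

fetch!(index, id)
retrieves a term by its id

insert!(index, record)
inserts a word vector tuple into a word2vec index

lookup!(index, term)
searches for a term in the word2vec index

open!(path, options \\ [])
opens an existing word2vec index at the specified path

parse_insert!(index, arg)
parses and inserts a single word vector text line into a word2vec index

parse_line!(line)
parses a word vector line: “ …”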

Types

t()
t() :: %Penelope.ML.Word2vec.Index{header: atom(), name: atom(), partitions: pos_integer(), tables: [atom()], vector_size: pos_integer(), version: pos_integer()}

Functions

close(index)
close(index :: Penelope.ML.Word2vec.Index.t()) :: :ok

closes the index

compile!(index, path)
compile!(index :: Penelope.ML.Word2vec.Index.t(), path :: String.t()) :: :ok

inserts word vectors from a text file into a word2vec index

the index must have been opened using create!/3
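As a rough sketch of the build workflow described above, the wrapper below creates an index, compiles a text vector file into it, and closes it afterwards. The `IndexBuildSketch` module, its `build` function, and any paths passed to it are hypothetical; only the `create!`, `compile!`, and `close` calls come from the signatures documented on this page, and the snippet needs the Penelope dependency to actually run.

```elixir
defmodule IndexBuildSketch do
  alias Penelope.ML.Word2vec.Index

  # Hypothetical wrapper: creates an index, loads a text vector file into it,
  # and always closes the index, even if compilation raises.
  def build(index_path, name, vectors_path, options \\ []) do
    index = Index.create!(index_path, name, options)

    try do
      Index.compile!(index, vectors_path)
    after
      Index.close(index)
    end
  end
end
```

A call might look like `IndexBuildSketch.build("/tmp/w2v", "demo", "vectors.txt", partitions: 4, vector_size: 300)`, where the option keys are those listed in the create! spec and the values are illustrative.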

create!(path, name, options \\ [])
create!(path :: String.t(), name :: String.t(), [partitions: pos_integer(), size_hint: pos_integer(), vector_size: pos_integer()]) :: Penelope.ML.Word2vec.Index.t()

creates a new word2vec index

files will be created as /_.dets, one per partition

fetch!(index, id)
fetch!(index :: Penelope.ML.Word2vec.Index.t(), id :: pos_integer()) ::
  String.t() |
  nil

retrieves a term by its id

if found, returns the term string; otherwise, returns nil

insert!(index, record)
insert!(index :: Penelope.ML.Word2vec.Index.t(), record :: {String.t(), pos_integer(), Penelope.ML.Vector.t()}) :: :ok

inserts a word vector tuple into a word2vec index

lookup!(index, term)
lookup!(index :: Penelope.ML.Word2vec.Index.t(), term :: String.t()) :: {integer(), Penelope.ML.Vector.t()}

searches for a term in the word2vec index

if found, returns the id and word vector (no term); otherwise, returns nil
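Since lookup! is documented to return nil for a missing term, callers need to handle both outcomes. The sketch below shows one way to do that; the `IndexQuerySketch` module and its `vector_for` function are hypothetical, the `open!`, `lookup!`, and `close` calls follow the signatures on this page, and the snippet needs the Penelope dependency to actually run.

```elixir
defmodule IndexQuerySketch do
  alias Penelope.ML.Word2vec.Index

  # Hypothetical helper: opens an existing index, looks up one term, and
  # closes the index again. Returns {:ok, id, vector}, or :error when the
  # term is absent.
  def vector_for(index_path, term) do
    index = Index.open!(index_path)

    try do
      case Index.lookup!(index, term) do
        nil -> :error
        {id, vector} -> {:ok, id, vector}
      end
    after
      Index.close(index)
    end
  end
end
```

Opening and closing the index on every call keeps the example short; a long-running process would more likely open the index once and reuse it.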

open!(path, options \\ [])
open!(path :: String.t(), [{:cache_size, pos_integer()}]) :: Penelope.ML.Word2vec.Index.t()

opens an existing word2vec index at the specified path

parse_insert!(index, arg)
parse_insert!(index :: Penelope.ML.Word2vec.Index.t(), {line :: String.t(), id :: pos_integer()}) :: {String.t(), pos_integer(), Penelope.ML.Vector.t()}

parses and inserts a single word vector text line into a word2vec index

parse_line!(line)
parse_line!(line :: String.t()) :: {String.t(), Penelope.ML.Vector.t()}

parses a word vector line: “ …”
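For orientation, here is a minimal stand-alone sketch of this kind of line parsing. It assumes the common word2vec text layout of a term followed by whitespace-separated decimal weights, which this page does not spell out, and it returns the weights as a plain list of floats rather than a Penelope.ML.Vector.t(). The `ParseLineSketch` module is hypothetical and is not Penelope's implementation.

```elixir
defmodule ParseLineSketch do
  # Hypothetical stand-in for parse_line!/1. Assumes the line holds a term
  # followed by whitespace-separated decimal weights.
  def parse_line!(line) when is_binary(line) do
    # An empty line yields [] here and raises a MatchError.
    [term | weights] = String.split(line)
    {term, Enum.map(weights, &parse_weight!/1)}
  end

  # Float.parse/1 accepts inputs such as "1", "0.25", and "-1.5e-3";
  # anything that is not fully consumed as a number raises.
  defp parse_weight!(text) do
    case Float.parse(text) do
      {value, ""} -> value
      _ -> raise ArgumentError, "invalid weight: #{inspect(text)}"
    end
  end
end
```

Under these assumptions, `ParseLineSketch.parse_line!("king 0.5 -1.25 2")` returns `{"king", [0.5, -1.25, 2.0]}`.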