Tsne (t-SNE v0.1.2)

t-distributed stochastic neighbor embedding (t-SNE) is a statistical method for visualizing high-dimensional data in two or three dimensions.

This library provides bindings to fast exact and Barnes-Hut implementations of t-SNE in Rust using the bhtsne crate.
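A minimal usage sketch follows. The input layout (a list of rows, each a list of floats) and the shape of the returned embedding are assumptions for illustration, not guarantees from this documentation:

```elixir
# Hypothetical input: three 4-dimensional observations.
data = [
  [1.0, 2.0, 3.0, 4.0],
  [1.1, 2.1, 3.1, 4.1],
  [9.0, 8.0, 7.0, 6.0]
]

# With default options, :embedding_dimensions is 2, so each input row
# should map to one 2-dimensional embedded point.
embedding = Tsne.barnes_hut(data)
```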


Functions

barnes_hut(data, opts \\ [])

Barnes-Hut t-SNE.

Barnes-Hut is a tree-based algorithm for accelerating t-SNE. It runs in O(N log N) time, while the exact algorithm runs in O(N^2) time.

Options

  • :embedding_dimensions (integer/0) - Dimension of the embedded space. The default value is 2.

  • :learning_rate (float/0) - The learning rate for t-SNE is usually in the range [10.0, 1000.0]. If the learning rate is too high, the data may look like a ‘ball’, with any point approximately equidistant from its nearest neighbours. If the learning rate is too low, most points may look compressed in a dense cloud with few outliers. If the cost function gets stuck in a bad local minimum, increasing the learning rate may help. The default value is 200.0.

  • :epochs (integer/0) - Maximum number of iterations for the optimization. Should be at least 250. The default value is 1000.

  • :perplexity (float/0) - The perplexity is related to the number of nearest neighbors that is used in other manifold learning algorithms. Larger datasets usually require a larger perplexity. Consider selecting a value between 5 and 50. Different values can result in significantly different results. The perplexity must be less than the number of samples. The default value is 30.0.

  • :final_momentum (float/0) - The value for momentum after the initial early exaggeration phase. See :momentum for more info. The default value is 0.8.

  • :momentum (float/0) - Gradient descent with momentum keeps an exponentially decaying sum of updates from previous iterations, speeding up convergence. In the early stages of the optimization this is typically set to a lower value (0.5 in most implementations), since points move around quite a bit in this phase, and it is increased after the initial early exaggeration phase (typically to 0.8, see :final_momentum) to speed up convergence. The default value is 0.5.

  • :metric - The distance metric to use. Must be either :euclidean or :cosine. The default value is :euclidean.

  • :theta (float/0) - The tradeoff parameter between accuracy (0) and speed (1). The default value is 0.5.
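Overriding defaults might look like the sketch below. The option names and defaults come from the list above; the shape of `data` and of the returned embedding are assumptions for illustration:

```elixir
# A tighter theta trades speed for accuracy; :cosine swaps the distance metric.
embedding =
  Tsne.barnes_hut(data,
    embedding_dimensions: 3,
    perplexity: 10.0,
    learning_rate: 150.0,
    theta: 0.3,
    metric: :cosine
  )
```

Note that :perplexity must be less than the number of rows in `data`, so small inputs need a correspondingly small perplexity.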

Exact t-SNE.

Options

  • :embedding_dimensions (integer/0) - Dimension of the embedded space. The default value is 2.

  • :learning_rate (float/0) - The learning rate for t-SNE is usually in the range [10.0, 1000.0]. If the learning rate is too high, the data may look like a ‘ball’, with any point approximately equidistant from its nearest neighbours. If the learning rate is too low, most points may look compressed in a dense cloud with few outliers. If the cost function gets stuck in a bad local minimum, increasing the learning rate may help. The default value is 200.0.

  • :epochs (integer/0) - Maximum number of iterations for the optimization. Should be at least 250. The default value is 1000.

  • :perplexity (float/0) - The perplexity is related to the number of nearest neighbors that is used in other manifold learning algorithms. Larger datasets usually require a larger perplexity. Consider selecting a value between 5 and 50. Different values can result in significantly different results. The perplexity must be less than the number of samples. The default value is 30.0.

  • :final_momentum (float/0) - The value for momentum after the initial early exaggeration phase. See :momentum for more info. The default value is 0.8.

  • :momentum (float/0) - Gradient descent with momentum keeps an exponentially decaying sum of updates from previous iterations, speeding up convergence. In the early stages of the optimization this is typically set to a lower value (0.5 in most implementations), since points move around quite a bit in this phase, and it is increased after the initial early exaggeration phase (typically to 0.8, see :final_momentum) to speed up convergence. The default value is 0.5.

  • :metric - The distance metric to use. Must be either :euclidean or :cosine. The default value is :euclidean.

  • :theta (float/0) - The tradeoff parameter between accuracy (0) and speed (1). The default value is 0.5.