Scholar.Preprocessing (Scholar v0.2.1)

Set of functions for preprocessing data.

Summary

Functions

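binarize(tensor, opts \\ [])

Converts a tensor into binary values based on the given threshold.

max_abs_scale(tensor, opts \\ [])

Scales a tensor by dividing each sample in the batch by the maximum absolute value in the batch.

min_max_scale(tensor, opts \\ [])

Transforms a tensor by scaling each batch to the given range.

normalize(tensor, opts \\ [])

Normalizes samples individually to unit norm.

one_hot_encode(tensor, opts \\ [])

Encodes labels as a one-hot numeric tensor.

ordinal_encode(tensor, opts \\ [])

Encodes a tensor's values into integers from range 0 to :num_classes - 1.

standard_scale(tensor, opts \\ [])

Standardizes the tensor by removing the mean and scaling to unit variance.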

Functions

binarize(tensor, opts \\ [])

Converts a tensor into binary values based on the given threshold.

Options

  • :type - Type of the resultant tensor. The default value is :f32.

  • :threshold - Feature values below or equal to this are replaced by 0, above it by 1. The default value is 0.

Examples

iex> Scholar.Preprocessing.binarize(Nx.tensor([[1.0, -1.0, 2.0], [2.0, 0.0, 0.0], [0.0, 1.0, -1.0]]))
#Nx.Tensor<
  f32[3][3]
  [
    [1.0, 0.0, 1.0],
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0]
  ]
>

iex> Scholar.Preprocessing.binarize(Nx.tensor([[1.0, -1.0, 2.0], [2.0, 0.0, 0.0], [0.0, 1.0, -1.0]]), threshold: 1.3, type: {:u, 8})
#Nx.Tensor<
  u8[3][3]
  [
    [0, 0, 1],
    [1, 0, 0],
    [0, 0, 0]
  ]
>
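
The thresholding rule described above can be sketched in plain Elixir on a flat list. This is only an illustration of the rule (values less than or equal to the threshold map to 0, values above it map to 1); `BinarizeSketch` is a hypothetical module, not part of Scholar, and it ignores the :type option.

```elixir
defmodule BinarizeSketch do
  # Hypothetical helper, not part of Scholar: thresholds a flat list of numbers.
  # Values strictly greater than `threshold` become 1, all others become 0.
  def binarize(values, threshold \\ 0) do
    Enum.map(values, fn x -> if x > threshold, do: 1, else: 0 end)
  end
end

BinarizeSketch.binarize([1.0, -1.0, 2.0])
# => [1, 0, 1]

BinarizeSketch.binarize([1.0, -1.0, 2.0], 1.3)
# => [0, 0, 1]
```

Both results agree with the first rows of the two examples above.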

max_abs_scale(tensor, opts \\ [])

Scales a tensor by dividing each sample in the batch by the maximum absolute value in the batch.

Options

  • :axes - Axes to calculate the maximum absolute value over. By default the maximum absolute value is calculated over the whole tensor.

Examples

iex> Scholar.Preprocessing.max_abs_scale(Nx.tensor([1, 2, 3]))
#Nx.Tensor<
  f32[3]
  [0.3333333432674408, 0.6666666865348816, 1.0]
>

iex> Scholar.Preprocessing.max_abs_scale(Nx.tensor([[1, -1, 2], [3, 0, 0], [0, 1, -1], [2, 3, 1]]), axes: [0])
#Nx.Tensor<
  f32[4][3]
  [
    [0.3333333432674408, -0.3333333432674408, 1.0],
    [1.0, 0.0, 0.0],
    [0.0, 0.3333333432674408, -0.5],
    [0.6666666865348816, 1.0, 0.5]
  ]
>

iex> Scholar.Preprocessing.max_abs_scale(42)
#Nx.Tensor<
  f32
  1.0
>
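
As a rough illustration of the scaling rule, the sketch below divides every element of a flat list by the largest absolute value in that list. `MaxAbsSketch` is a hypothetical module, not part of Scholar, and its handling of an all-zero input is an assumption made only to avoid division by zero.

```elixir
defmodule MaxAbsSketch do
  # Hypothetical helper, not part of Scholar: divides every element of a flat
  # list by the largest absolute value found in that list.
  def max_abs_scale(values) do
    max_abs = values |> Enum.map(&abs/1) |> Enum.max()
    # Assumption: an all-zero input is left unscaled to avoid dividing by zero.
    divisor = if max_abs == 0, do: 1, else: max_abs
    Enum.map(values, fn x -> x / divisor end)
  end
end

MaxAbsSketch.max_abs_scale([2, 0, -1, 1])
# => [1.0, 0.0, -0.5, 0.5]
```

The input here is the third column of the axes: [0] example above, and the result matches that column of the output.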

min_max_scale(tensor, opts \\ [])

Transforms a tensor by scaling each batch to the given range.

Options

  • :axes - Axes to calculate the minimum and maximum over. By default they are calculated over the whole tensor.

  • :min - The lower boundary of the desired range of transformed data. The default value is 0.

  • :max - The upper boundary of the desired range of transformed data. The default value is 1.

Examples

iex> Scholar.Preprocessing.min_max_scale(Nx.tensor([1, 2, 3]))
#Nx.Tensor<
  f32[3]
  [0.0, 0.5, 1.0]
>

iex> Scholar.Preprocessing.min_max_scale(Nx.tensor([[1, -1, 2], [3, 0, 0], [0, 1, -1], [2, 3, 1]]), axes: [0])
#Nx.Tensor<
  f32[4][3]
  [
    [0.3333333432674408, 0.0, 1.0],
    [1.0, 0.25, 0.3333333432674408],
    [0.0, 0.5, 0.0],
    [0.6666666865348816, 1.0, 0.6666666865348816]
  ]
>

iex> Scholar.Preprocessing.min_max_scale(Nx.tensor([[1, -1, 2], [3, 0, 0], [0, 1, -1], [2, 3, 1]]), axes: [0], min: 1, max: 3)
#Nx.Tensor<
  f32[4][3]
  [
    [1.6666667461395264, 1.0, 3.0],
    [3.0, 1.5, 1.6666667461395264],
    [1.0, 2.0, 1.0],
    [2.3333334922790527, 3.0, 2.3333334922790527]
  ]
>

iex> Scholar.Preprocessing.min_max_scale(42)
#Nx.Tensor<
  f32
  0.0
>
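
The examples above are consistent with the usual min-max formula, written here as a sketch rather than as the library's confirmed implementation. The symbols are assumptions for illustration: $x_{\min}$ and $x_{\max}$ are the smallest and largest values along the chosen axes, $a$ is the :min option, and $b$ is the :max option.

$$ x' = \frac{x - x_{\min}}{x_{\max} - x_{\min}} \cdot (b - a) + a $$

For instance, the second column of the example input is $[-1, 0, 1, 3]$, so $x_{\min} = -1$ and $x_{\max} = 3$. With min: 1 and max: 3, the value $0$ becomes

$$ \frac{0 - (-1)}{3 - (-1)} \cdot (3 - 1) + 1 = 0.25 \cdot 2 + 1 = 1.5 $$

which matches the corresponding entry in the third example.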

normalize(tensor, opts \\ [])

Normalizes samples individually to unit norm.

Zero tensors cannot be normalized, so they are left unchanged.

Options

  • :axes - Axes to calculate the norm over. By default the norm is calculated over the whole tensor.

  • :norm - The norm to use to normalize each non-zero sample. Possible options are :euclidean, :manhattan, and :chebyshev. The default value is :euclidean.

Examples

iex> Scholar.Preprocessing.normalize(Nx.tensor([[0, 0, 0], [3, 4, 5], [-2, 4, 3]]), axes: [1])
#Nx.Tensor<
  f32[3][3]
  [
    [0.0, 0.0, 0.0],
    [0.4242640733718872, 0.5656854510307312, 0.7071067690849304],
    [-0.3713906705379486, 0.7427813410758972, 0.5570860505104065]
  ]
>

iex> Scholar.Preprocessing.normalize(Nx.tensor([[0, 0, 0], [3, 4, 5], [-2, 4, 3]]))
#Nx.Tensor<
  f32[3][3]
  [
    [0.0, 0.0, 0.0],
    [0.3375263810157776, 0.4500351846218109, 0.5625439882278442],
    [-0.22501759231090546, 0.4500351846218109, 0.3375263810157776]
  ]
>
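
To make the three :norm options concrete, the sketch below normalizes a single sample held in a flat list. `NormalizeSketch` is a hypothetical module, not part of Scholar; it assumes the standard definitions of the Euclidean, Manhattan, and Chebyshev norms and returns a zero sample unchanged, as described above.

```elixir
defmodule NormalizeSketch do
  # Hypothetical helper, not part of Scholar: normalizes one sample (a flat list).
  def normalize(sample, norm \\ :euclidean) do
    size = norm_of(sample, norm)
    # A zero sample has norm 0 and is returned unchanged.
    if size == 0, do: sample, else: Enum.map(sample, fn x -> x / size end)
  end

  # Euclidean norm: square root of the sum of squares.
  defp norm_of(sample, :euclidean) do
    sample |> Enum.map(&(&1 * &1)) |> Enum.sum() |> :math.sqrt()
  end

  # Manhattan norm: sum of absolute values.
  defp norm_of(sample, :manhattan) do
    sample |> Enum.map(&abs/1) |> Enum.sum()
  end

  # Chebyshev norm: largest absolute value.
  defp norm_of(sample, :chebyshev) do
    sample |> Enum.map(&abs/1) |> Enum.max()
  end
end

NormalizeSketch.normalize([3, 4])
# => [0.6, 0.8]

NormalizeSketch.normalize([1, -3, 4], :manhattan)
# => [0.125, -0.375, 0.5]

NormalizeSketch.normalize([1, -3, 4], :chebyshev)
# => [0.25, -0.75, 1.0]

NormalizeSketch.normalize([0, 0, 0])
# => [0, 0, 0]
```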

one_hot_encode(tensor, opts \\ [])

Encodes labels as a one-hot numeric tensor.

Labels must be integers from 0 to :num_classes - 1. If the data does not meet this condition, use ordinal_encode/2 first.

Options

  • :num_classes (pos_integer/0) - Required. Number of classes to be encoded.

Examples

iex> Scholar.Preprocessing.one_hot_encode(Nx.tensor([2, 0, 3, 2, 1, 1, 0]), num_classes: 4)
#Nx.Tensor<
  u8[7][4]
  [
    [0, 0, 1, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 1],
    [0, 0, 1, 0],
    [0, 1, 0, 0],
    [0, 1, 0, 0],
    [1, 0, 0, 0]
  ]
>
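
The encoding can be pictured as one row per label, with a single 1 in the column whose index equals the label. The sketch below shows that idea on plain lists; `OneHotSketch` is a hypothetical module, not part of Scholar, and it does no validation of the labels.

```elixir
defmodule OneHotSketch do
  # Hypothetical helper, not part of Scholar: builds one row per label, with a
  # single 1 in the column whose index equals the label and 0 elsewhere.
  def one_hot_encode(labels, num_classes) do
    Enum.map(labels, fn label ->
      Enum.map(0..(num_classes - 1), fn class ->
        if class == label, do: 1, else: 0
      end)
    end)
  end
end

OneHotSketch.one_hot_encode([2, 0, 3], 4)
# => [[0, 0, 1, 0], [1, 0, 0, 0], [0, 0, 0, 1]]
```

These rows match the first three rows of the example above.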

ordinal_encode(tensor, opts \\ [])

Encodes a tensor's values into integers from range 0 to :num_classes - 1.

Options

  • :num_classes (pos_integer/0) - Required. Number of classes to be encoded.

Examples

iex> Scholar.Preprocessing.ordinal_encode(Nx.tensor([3, 2, 4, 56, 2, 4, 2]), num_classes: 4)
#Nx.Tensor<
  s64[7]
  [1, 0, 2, 3, 0, 2, 0]
>
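
The example above is consistent with assigning codes by the ascending order of the distinct values. The sketch below shows that mapping on a plain list; `OrdinalSketch` is a hypothetical module, not part of Scholar, and the ascending-order rule is an assumption inferred from the example rather than a documented guarantee.

```elixir
defmodule OrdinalSketch do
  # Hypothetical helper, not part of Scholar: maps each distinct value to its
  # position in the sorted list of distinct values.
  def ordinal_encode(values) do
    codes =
      values
      |> Enum.uniq()
      |> Enum.sort()
      |> Enum.with_index()
      |> Map.new()

    Enum.map(values, &Map.fetch!(codes, &1))
  end
end

OrdinalSketch.ordinal_encode([3, 2, 4, 56, 2, 4, 2])
# => [1, 0, 2, 3, 0, 2, 0]
```

The resulting codes lie in the range 0 to 3, so they are valid input for one_hot_encode/2 with num_classes: 4.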

standard_scale(tensor, opts \\ [])

Standardizes the tensor by removing the mean and scaling to unit variance.

Formula for input tensor $x$:

$$ z = \frac{x - \mu}{\sigma} $$

where $\mu$ is the mean of the samples and $\sigma$ is the standard deviation. Standardization can be helpful in cases where the data follows a Gaussian distribution (or Normal distribution) without outliers.

Options

  • :axes - Axes to calculate the mean and standard deviation over. By default they are calculated over the whole tensor.

Examples

iex> Scholar.Preprocessing.standard_scale(Nx.tensor([1, 2, 3]))
#Nx.Tensor<
  f32[3]
  [-1.2247447967529297, 0.0, 1.2247447967529297]
>

iex> Scholar.Preprocessing.standard_scale(Nx.tensor([[1, -1, 2], [2, 0, 0], [0, 1, -1]]))
#Nx.Tensor<
  f32[3][3]
  [
    [0.5212860703468323, -1.3553436994552612, 1.4596009254455566],
    [1.4596009254455566, -0.4170288145542145, -0.4170288145542145],
    [-0.4170288145542145, 0.5212860703468323, -1.3553436994552612]
  ]
>

iex> Scholar.Preprocessing.standard_scale(Nx.tensor([[1, -1, 2], [2, 0, 0], [0, 1, -1]]), axes: [1])
#Nx.Tensor<
  f32[3][3]
  [
    [0.26726120710372925, -1.3363062143325806, 1.069044828414917],
    [1.4142135381698608, -0.7071068286895752, -0.7071068286895752],
    [0.0, 1.2247447967529297, -1.2247447967529297]
  ]
>

iex> Scholar.Preprocessing.standard_scale(42)
#Nx.Tensor<
  f32
  42.0
>
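
As a worked check of the formula against the first example, take the input $[1, 2, 3]$. The output shown there is reproduced when $\sigma$ is the population standard deviation (dividing by $N$, not $N - 1$); this is inferred from the numbers, not stated by the documentation.

$$ \mu = \frac{1 + 2 + 3}{3} = 2, \qquad \sigma = \sqrt{\frac{(1 - 2)^2 + (2 - 2)^2 + (3 - 2)^2}{3}} = \sqrt{\frac{2}{3}} \approx 0.8165 $$

$$ z = \left[ \frac{1 - 2}{0.8165},\ \frac{2 - 2}{0.8165},\ \frac{3 - 2}{0.8165} \right] \approx [-1.2247,\ 0,\ 1.2247] $$

Dividing by $N - 1$ instead would give $\sigma = 1$ and $z = [-1, 0, 1]$, which does not match the example.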