Scholar.Preprocessing (Scholar v0.2.1)
Set of functions for preprocessing data.
Summary
Functions
binarize - Converts a tensor into binary values based on the given threshold.
max_abs_scale - Scales a tensor by dividing each sample in the batch by the maximum absolute value in the batch.
min_max_scale - Transform a tensor by scaling each batch to the given range.
normalize - Normalize samples individually to unit norm.
one_hot_encode - Encode labels as a one-hot numeric tensor.
ordinal_encode - Encodes a tensor's values into integers in the range 0 to :num_classes - 1.
standard_scale - Standardizes the tensor by removing the mean and scaling to unit variance.
Functions
binarize
Converts a tensor into binary values based on the given threshold.
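Written as a formula, with $t$ denoting the :threshold option (the symbols $x$, $y$, and $t$ are notation introduced here for illustration), each element is mapped as:
$$ y = \begin{cases} 1 & \text{if } x > t \\ 0 & \text{if } x \le t \end{cases} $$
For instance, with $t = 1.3$ the row $[1.0, -1.0, 2.0]$ becomes $[0, 0, 1]$, since only $2.0$ exceeds the threshold.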
Options
:type - Type of the resultant tensor. The default value is :f32.
:threshold - Feature values below or equal to this are replaced by 0, above it by 1. The default value is 0.
Examples
iex> Scholar.Preprocessing.binarize(Nx.tensor([[1.0, -1.0, 2.0], [2.0, 0.0, 0.0], [0.0, 1.0, -1.0]]))
#Nx.Tensor<
f32[3][3]
[
[1.0, 0.0, 1.0],
[1.0, 0.0, 0.0],
[0.0, 1.0, 0.0]
]
>
iex> Scholar.Preprocessing.binarize(Nx.tensor([[1.0, -1.0, 2.0], [2.0, 0.0, 0.0], [0.0, 1.0, -1.0]]), threshold: 1.3, type: {:u, 8})
#Nx.Tensor<
u8[3][3]
[
[0, 0, 1],
[1, 0, 0],
[0, 0, 0]
]
>
max_abs_scale
Scales a tensor by dividing each sample in the batch by the maximum absolute value in the batch.
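As a sketch of the computation (the symbol $x'$ is notation introduced here, and the maximum is taken over the axes given by :axes, or over the whole tensor by default):
$$ x' = \frac{x}{\max |x|} $$
For the input $[1, 2, 3]$, $\max |x| = 3$, so the result is $[1/3, 2/3, 1]$. The scalar example returning 1.0 is consistent with this, since $42 / 42 = 1$.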
Options
:axes - Axes to compute the maximum absolute value over. By default, it is computed over the whole tensor.
Examples
iex> Scholar.Preprocessing.max_abs_scale(Nx.tensor([1, 2, 3]))
#Nx.Tensor<
f32[3]
[0.3333333432674408, 0.6666666865348816, 1.0]
>
iex> Scholar.Preprocessing.max_abs_scale(Nx.tensor([[1, -1, 2], [3, 0, 0], [0, 1, -1], [2, 3, 1]]), axes: [0])
#Nx.Tensor<
f32[4][3]
[
[0.3333333432674408, -0.3333333432674408, 1.0],
[1.0, 0.0, 0.0],
[0.0, 0.3333333432674408, -0.5],
[0.6666666865348816, 1.0, 0.5]
]
>
iex> Scholar.Preprocessing.max_abs_scale(42)
#Nx.Tensor<
f32
1.0
>
min_max_scale
Transform a tensor by scaling each batch to the given range.
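A sketch of the usual min-max formula, which agrees with the examples below (the symbols $x_{\min}$, $x_{\max}$, $a$, and $b$ are notation introduced here; $a$ and $b$ stand for the :min and :max options):
$$ x' = \frac{x - x_{\min}}{x_{\max} - x_{\min}} (b - a) + a $$
For example, for a column with values $1, 3, 0, 2$ and a target range of $[1, 3]$, $x_{\min} = 0$ and $x_{\max} = 3$, so the value $1$ maps to $\frac{1 - 0}{3 - 0} \cdot 2 + 1 \approx 1.667$. The formula is undefined when $x_{\max} = x_{\min}$; the scalar example below returns 0.0 in that case.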
Options
:axes - Axes to compute the minimum and maximum over. By default, they are computed over the whole tensor.
:min - The lower boundary of the desired range of transformed data. The default value is 0.
:max - The upper boundary of the desired range of transformed data. The default value is 1.
Examples
iex> Scholar.Preprocessing.min_max_scale(Nx.tensor([1, 2, 3]))
#Nx.Tensor<
f32[3]
[0.0, 0.5, 1.0]
>
iex> Scholar.Preprocessing.min_max_scale(Nx.tensor([[1, -1, 2], [3, 0, 0], [0, 1, -1], [2, 3, 1]]), axes: [0])
#Nx.Tensor<
f32[4][3]
[
[0.3333333432674408, 0.0, 1.0],
[1.0, 0.25, 0.3333333432674408],
[0.0, 0.5, 0.0],
[0.6666666865348816, 1.0, 0.6666666865348816]
]
>
iex> Scholar.Preprocessing.min_max_scale(Nx.tensor([[1, -1, 2], [3, 0, 0], [0, 1, -1], [2, 3, 1]]), axes: [0], min: 1, max: 3)
#Nx.Tensor<
f32[4][3]
[
[1.6666667461395264, 1.0, 3.0],
[3.0, 1.5, 1.6666667461395264],
[1.0, 2.0, 1.0],
[2.3333334922790527, 3.0, 2.3333334922790527]
]
>
iex> Scholar.Preprocessing.min_max_scale(42)
#Nx.Tensor<
f32
0.0
>
normalize
Normalize samples individually to unit norm.
Zero tensors cannot be normalized, so they are left unchanged.
Options
:axes - Axes to compute the norm over. By default, the norm is computed over the whole tensor.
:norm - The norm to use to normalize each non-zero sample. Possible options are :euclidean, :manhattan, and :chebyshev. The default value is :euclidean.
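For reference, the three norms are conventionally defined as follows for a sample $x$ with elements $x_i$ (standard definitions, stated here as an assumption about what each option computes):
$$ \|x\|_{\text{euclidean}} = \sqrt{\sum_i x_i^2}, \qquad \|x\|_{\text{manhattan}} = \sum_i |x_i|, \qquad \|x\|_{\text{chebyshev}} = \max_i |x_i| $$
Each non-zero sample is then divided by its norm. With the default Euclidean norm, the sample $[3, 4, 5]$ has norm $\sqrt{50} \approx 7.071$, giving approximately $[0.424, 0.566, 0.707]$.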
Examples
iex> Scholar.Preprocessing.normalize(Nx.tensor([[0, 0, 0], [3, 4, 5], [-2, 4, 3]]), axes: [1])
#Nx.Tensor<
f32[3][3]
[
[0.0, 0.0, 0.0],
[0.4242640733718872, 0.5656854510307312, 0.7071067690849304],
[-0.3713906705379486, 0.7427813410758972, 0.5570860505104065]
]
>
iex> Scholar.Preprocessing.normalize(Nx.tensor([[0, 0, 0], [3, 4, 5], [-2, 4, 3]]))
#Nx.Tensor<
f32[3][3]
[
[0.0, 0.0, 0.0],
[0.3375263810157776, 0.4500351846218109, 0.5625439882278442],
[-0.22501759231090546, 0.4500351846218109, 0.3375263810157776]
]
>
one_hot_encode
Encode labels as a one-hot numeric tensor.
Labels must be integers from 0 to :num_classes - 1. If the data does not meet this condition, use ordinal_encode/2 first.
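A minimal sketch of that two-step workflow, for labels that are arbitrary integers rather than 0 to :num_classes - 1. The variable names are illustrative, and the snippet assumes Nx and Scholar are already available (for example inside a Mix project or an IEx session).

```elixir
# Assumption: Nx and Scholar are already loaded as dependencies.

# Raw labels are not in the range 0..num_classes - 1.
labels = Nx.tensor([3, 2, 4, 56, 2, 4, 2])

# Step 1: map the 4 distinct values to the integers 0..3.
encoded = Scholar.Preprocessing.ordinal_encode(labels, num_classes: 4)

# Step 2: expand each integer label into a one-hot row.
one_hot = Scholar.Preprocessing.one_hot_encode(encoded, num_classes: 4)

# One row per label, one column per class.
IO.inspect(Nx.shape(one_hot))
```

The resulting tensor should have one row for each of the seven labels and one column for each of the four classes.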
Options
:num_classes (pos_integer/0) - Required. Number of classes to be encoded.
Examples
iex> Scholar.Preprocessing.one_hot_encode(Nx.tensor([2, 0, 3, 2, 1, 1, 0]), num_classes: 4)
#Nx.Tensor<
u8[7][4]
[
[0, 0, 1, 0],
[1, 0, 0, 0],
[0, 0, 0, 1],
[0, 0, 1, 0],
[0, 1, 0, 0],
[0, 1, 0, 0],
[1, 0, 0, 0]
]
>
ordinal_encode
Encodes a tensor's values into integers in the range 0 to :num_classes - 1.
Options
:num_classes (pos_integer/0) - Required. Number of classes to be encoded.
Examples
iex> Scholar.Preprocessing.ordinal_encode(Nx.tensor([3, 2, 4, 56, 2, 4, 2]), num_classes: 4)
#Nx.Tensor<
s64[7]
[1, 0, 2, 3, 0, 2, 0]
>
standard_scale
Standardizes the tensor by removing the mean and scaling to unit variance.
Formula for input tensor $x$:
$$ z = \frac{x - \mu}{\sigma} $$
where $\mu$ is the mean of the samples and $\sigma$ is the standard deviation. Standardization can be helpful when the data follows a Gaussian (normal) distribution without outliers.
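As a worked check against the first example below, take $x = [1, 2, 3]$. The numbers assume the population standard deviation (dividing by $N$), which is what the example output is consistent with:
$$ \mu = \frac{1 + 2 + 3}{3} = 2, \qquad \sigma = \sqrt{\frac{(1-2)^2 + (2-2)^2 + (3-2)^2}{3}} = \sqrt{\tfrac{2}{3}} \approx 0.8165 $$
$$ z = \left[ \frac{1 - 2}{0.8165},\ \frac{2 - 2}{0.8165},\ \frac{3 - 2}{0.8165} \right] \approx [-1.2247,\ 0,\ 1.2247] $$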
Options
:axes - Axes to compute the mean and standard deviation over. By default, they are computed over the whole tensor.
Examples
iex> Scholar.Preprocessing.standard_scale(Nx.tensor([1, 2, 3]))
#Nx.Tensor<
f32[3]
[-1.2247447967529297, 0.0, 1.2247447967529297]
>
iex> Scholar.Preprocessing.standard_scale(Nx.tensor([[1, -1, 2], [2, 0, 0], [0, 1, -1]]))
#Nx.Tensor<
f32[3][3]
[
[0.5212860703468323, -1.3553436994552612, 1.4596009254455566],
[1.4596009254455566, -0.4170288145542145, -0.4170288145542145],
[-0.4170288145542145, 0.5212860703468323, -1.3553436994552612]
]
>
iex> Scholar.Preprocessing.standard_scale(Nx.tensor([[1, -1, 2], [2, 0, 0], [0, 1, -1]]), axes: [1])
#Nx.Tensor<
f32[3][3]
[
[0.26726120710372925, -1.3363062143325806, 1.069044828414917],
[1.4142135381698608, -0.7071068286895752, -0.7071068286895752],
[0.0, 1.2247447967529297, -1.2247447967529297]
]
>
iex> Scholar.Preprocessing.standard_scale(42)
#Nx.Tensor<
f32
42.0
>