ExTorch.NN
(extorch v0.4.0)
Neural network layer creation and operations.
This module provides functions to create PyTorch nn.Module layers and
run forward passes on them. Layers are created eagerly (not JIT-compiled)
and support autograd for training.
Example
linear = ExTorch.NN.linear(784, 128)
relu = ExTorch.NN.relu()
input = ExTorch.randn({1, 784})
output = input |> ExTorch.NN.forward(linear) |> ExTorch.NN.forward(relu)
Summary
Functions
Applies a 1D adaptive average pooling over an input signal.
Applies a 2D adaptive average pooling over an input signal.
Applies a 1D average pooling over an input signal.
Applies a 2D average pooling over an input signal.
Applies Batch Normalization over a 2D or 3D input (a mini-batch of 1D inputs with optional additional channel dimension).
Applies Batch Normalization over a 4D input (a mini-batch of 2D inputs with additional channel dimension).
Applies a 1D convolution over an input signal composed of several input planes.
Applies a 2D convolution over an input signal composed of several input planes.
Applies a 3D convolution over an input signal composed of several input planes.
Applies a 1D transposed convolution operator (sometimes called "deconvolution").
Applies a 2D transposed convolution operator (sometimes called "deconvolution").
Copy parameter values from a list of {name, tensor} tuples into a layer.
During training, randomly zeroes some of the elements of the input tensor
with probability p using samples from a Bernoulli distribution. Each
channel will be zeroed out independently on every forward call.
Applies element-wise: $ELU(x) = \max(0, x) + \min(0, \alpha * (e^x - 1))$
A simple lookup table that stores embeddings of a fixed dictionary and size.
Set a layer to evaluation mode.
Flattens a contiguous range of dims into a tensor.
Run the forward pass of a layer on an input tensor.
Applies the Gaussian Error Linear Units function: $GELU(x) = x \Phi(x)$
Applies Group Normalization over a mini-batch of inputs.
Applies a multi-layer gated recurrent unit (GRU) RNN to an input sequence.
Applies Instance Normalization over a 3D input (a mini-batch of 1D inputs).
Applies Instance Normalization over a 4D input (a mini-batch of 2D inputs).
Applies Layer Normalization over a mini-batch of inputs.
Applies element-wise: $LeakyReLU(x) = \max(0, x) + negative\_slope * \min(0, x)$
Applies a linear transformation to the incoming data: $y = xA^T + b$.
Applies the LogSoftmax function: $LogSoftmax(x_i) = \log\left(\frac{e^{x_i}}{\sum_j e^{x_j}}\right)$
Applies a multi-layer long short-term memory (LSTM) RNN to an input sequence.
Applies a 1D max pooling over an input signal composed of several input planes.
Applies a 2D max pooling over an input signal composed of several input planes.
Applies the Mish function, element-wise.
Allows the model to jointly attend to information from different representation subspaces.
Get named parameters of a layer.
Applies element-wise: $PReLU(x) = \max(0, x) + a * \min(0, x)$
Applies the rectified linear unit function element-wise: $ReLU(x) = \max(0, x)$.
Applies the element-wise Sigmoid function: $Sigmoid(x) = \frac{1}{1 + e^{-x}}$
Applies the Sigmoid Linear Unit (SiLU) function, element-wise. Also known as the swish function.
Applies the Softmax function to an n-dimensional input tensor, rescaling the elements so that they lie in the range [0, 1] and sum to 1.
Applies the element-wise Tanh function: $Tanh(x) = \frac{e^x - e^{-x}}{e^x + e^{-x}}$
Move a layer to a different device.
Set a layer to training mode.
Unflattens a tensor dim, expanding it to a desired shape.
Functions
@spec adaptive_avg_pool1d(integer()) :: ExTorch.NN.Layer.t()
Applies a 1D adaptive average pooling over an input signal.
The output size is output_size regardless of input size.
Args
- output_size (integer) - the target output size.
Shape
- Input: {N, C, L_in}
- Output: {N, C, output_size}
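Examples
A minimal usage sketch (the input shape is illustrative; the last dimension is pooled down to output_size):
iex> m = ExTorch.NN.adaptive_avg_pool1d(5)
iex> input = ExTorch.randn({1, 16, 50})
iex> output = ExTorch.NN.forward(input, m)
iex> output.size
{1, 16, 5}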
@spec adaptive_avg_pool2d(integer(), integer()) :: ExTorch.NN.Layer.t()
Applies a 2D adaptive average pooling over an input signal.
The output spatial size is {output_h, output_w} regardless of input size.
Args
- output_h (integer) - the target output height.
- output_w (integer) - the target output width.
Shape
- Input: {N, C, H_in, W_in}
- Output: {N, C, output_h, output_w}
Examples
iex> m = ExTorch.NN.adaptive_avg_pool2d(1, 1)
iex> input = ExTorch.randn({1, 64, 7, 7})
iex> output = ExTorch.NN.forward(input, m)
iex> output.size
{1, 64, 1, 1}
@spec avg_pool1d(integer(), keyword()) :: ExTorch.NN.Layer.t()
Applies a 1D average pooling over an input signal.
Args
- kernel_size (integer) - the size of the sliding window.
- opts (keyword) - optional: :stride (default: kernel_size), :padding (default: 0), :ceil_mode (default: false), :count_include_pad (default: true).
Shape
- Input: {N, C, L_in}
- Output: {N, C, L_out}
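Examples
A minimal usage sketch (with kernel_size 2 and the default stride of kernel_size, an input of length 10 pools down to 5):
iex> m = ExTorch.NN.avg_pool1d(2)
iex> input = ExTorch.randn({1, 4, 10})
iex> output = ExTorch.NN.forward(input, m)
iex> output.size
{1, 4, 5}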
@spec avg_pool2d(integer(), keyword()) :: ExTorch.NN.Layer.t()
Applies a 2D average pooling over an input signal.
Args
- kernel_size (integer) - the size of the sliding window.
- opts (keyword) - optional: :stride (default: kernel_size), :padding (default: 0), :ceil_mode (default: false), :count_include_pad (default: true).
Shape
- Input: {N, C, H_in, W_in}
- Output: {N, C, H_out, W_out}
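Examples
A minimal usage sketch (kernel_size 2 with the default stride halves each spatial dimension):
iex> m = ExTorch.NN.avg_pool2d(2)
iex> input = ExTorch.randn({1, 3, 8, 8})
iex> output = ExTorch.NN.forward(input, m)
iex> output.size
{1, 3, 4, 4}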
@spec batch_norm1d(integer(), keyword()) :: ExTorch.NN.Layer.t()
Applies Batch Normalization over a 2D or 3D input (a mini-batch of 1D inputs with optional additional channel dimension).
$y = \frac{x - E[x]}{\sqrt{Var[x] + \epsilon}} * \gamma + \beta$
The mean and standard-deviation are calculated per-dimension over the mini-batches and $\gamma$ and $\beta$ are learnable parameter vectors of size C (where C is the number of features).
Args
- num_features (integer) - C from an expected input of size {N, C} or {N, C, L}.
- opts (keyword) - optional arguments:
  - :eps (float) - value added to the denominator for numerical stability. Default: 1.0e-5.
  - :momentum (float) - the value used for the running_mean and running_var computation. Default: 0.1.
  - :affine (boolean) - if true, has learnable affine parameters. Default: true.
  - :track_running_stats (boolean) - if true, tracks running mean and variance. Default: true.
Shape
- Input: {N, C} or {N, C, L}
- Output: same shape as input
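Examples
A minimal usage sketch (num_features must match C; the output shape is unchanged):
iex> m = ExTorch.NN.batch_norm1d(100)
iex> input = ExTorch.randn({20, 100})
iex> output = ExTorch.NN.forward(input, m)
iex> output.size
{20, 100}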
@spec batch_norm2d(integer(), keyword()) :: ExTorch.NN.Layer.t()
Applies Batch Normalization over a 4D input (a mini-batch of 2D inputs with additional channel dimension).
$y = \frac{x - E[x]}{\sqrt{Var[x] + \epsilon}} * \gamma + \beta$
Args
- num_features (integer) - C from an expected input of size {N, C, H, W}.
- opts (keyword) - same as batch_norm1d/2.
Shape
- Input: {N, C, H, W}
- Output: same shape as input
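Examples
A minimal usage sketch (normalization preserves the input shape):
iex> m = ExTorch.NN.batch_norm2d(16)
iex> input = ExTorch.randn({4, 16, 8, 8})
iex> output = ExTorch.NN.forward(input, m)
iex> output.size
{4, 16, 8, 8}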
@spec conv1d(integer(), integer(), integer(), keyword()) :: ExTorch.NN.Layer.t()
Applies a 1D convolution over an input signal composed of several input planes.
In the simplest case, the output value of the layer with input size {N, C_in, L}
and output {N, C_out, L_out} can be described as:
$out(N_i, C_{out_j}) = bias(C_{out_j}) + \sum_{k=0}^{C_{in}-1} weight(C_{out_j}, k) \star input(N_i, k)$
Args
- in_channels (integer) - number of channels in the input signal.
- out_channels (integer) - number of channels produced by the convolution.
- kernel_size (integer) - size of the convolving kernel.
- opts (keyword) - optional arguments:
  - :stride (integer) - stride of the convolution. Default: 1.
  - :padding (integer) - zero-padding added to both sides of the input. Default: 0.
  - :dilation (integer) - spacing between kernel elements. Default: 1.
  - :groups (integer) - number of blocked connections from input to output channels. Default: 1.
  - :bias (boolean) - if false, the layer will not learn an additive bias. Default: true.
Shape
- Input: {N, C_in, L_in} or {C_in, L_in}
- Output: {N, C_out, L_out} or {C_out, L_out}
Examples
iex> m = ExTorch.NN.conv1d(16, 33, 3, stride: 2)
iex> input = ExTorch.randn({20, 16, 50})
iex> output = ExTorch.NN.forward(input, m)
iex> output.size
{20, 33, 24}
@spec conv2d(integer(), integer(), integer(), keyword()) :: ExTorch.NN.Layer.t()
Applies a 2D convolution over an input signal composed of several input planes.
In the simplest case, the output value of the layer with input size {N, C_in, H, W}
and output {N, C_out, H_out, W_out} can be described as:
$out(N_i, C_{out_j}) = bias(C_{out_j}) + \sum_{k=0}^{C_{in}-1} weight(C_{out_j}, k) \star input(N_i, k)$
Args
- in_channels (integer) - number of channels in the input image.
- out_channels (integer) - number of channels produced by the convolution.
- kernel_size (integer) - size of the convolving kernel.
- opts (keyword) - optional arguments:
  - :stride (integer) - stride of the convolution. Default: 1.
  - :padding (integer) - zero-padding added to both sides of the input. Default: 0.
  - :dilation (integer) - spacing between kernel elements. Default: 1.
  - :groups (integer) - number of blocked connections from input to output channels. Default: 1.
  - :bias (boolean) - if false, the layer will not learn an additive bias. Default: true.
Shape
- Input: {N, C_in, H_in, W_in} or {C_in, H_in, W_in}
- Output: {N, C_out, H_out, W_out} or {C_out, H_out, W_out},
  where $H_{out} = \lfloor\frac{H_{in} + 2 \times padding - dilation \times (kernel\_size - 1) - 1}{stride} + 1\rfloor$
Examples
iex> m = ExTorch.NN.conv2d(3, 16, 3, padding: 1)
iex> input = ExTorch.randn({1, 3, 32, 32})
iex> output = ExTorch.NN.forward(input, m)
iex> output.size
{1, 16, 32, 32}
@spec conv3d(integer(), integer(), integer(), keyword()) :: ExTorch.NN.Layer.t()
Applies a 3D convolution over an input signal composed of several input planes.
Args
- in_channels (integer) - number of channels in the input volume.
- out_channels (integer) - number of channels produced by the convolution.
- kernel_size (integer) - size of the convolving kernel.
- opts (keyword) - optional: :stride (default: 1), :padding (default: 0), :dilation (default: 1), :groups (default: 1), :bias (default: true).
Shape
- Input: {N, C_in, D_in, H_in, W_in}
- Output: {N, C_out, D_out, H_out, W_out}
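Examples
A minimal usage sketch (with kernel_size 3 and padding: 1 the spatial dimensions are preserved):
iex> m = ExTorch.NN.conv3d(3, 8, 3, padding: 1)
iex> input = ExTorch.randn({1, 3, 10, 16, 16})
iex> output = ExTorch.NN.forward(input, m)
iex> output.size
{1, 8, 10, 16, 16}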
@spec conv_transpose1d(integer(), integer(), integer(), keyword()) :: ExTorch.NN.Layer.t()
Applies a 1D transposed convolution operator (sometimes called "deconvolution").
Args
- in_channels (integer) - number of channels in the input signal.
- out_channels (integer) - number of channels produced by the convolution.
- kernel_size (integer) - size of the convolving kernel.
- opts (keyword) - optional: :stride (default: 1), :padding (default: 0), :output_padding (default: 0), :groups (default: 1), :bias (default: true), :dilation (default: 1).
Shape
- Input: {N, C_in, L_in}
- Output: {N, C_out, L_out}, where $L_{out} = (L_{in} - 1) \times stride - 2 \times padding + dilation \times (kernel\_size - 1) + output\_padding + 1$
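Examples
A minimal usage sketch (the expected output length follows from the formula above: (10 - 1) * 2 + 1 * (3 - 1) + 0 + 1 = 21):
iex> m = ExTorch.NN.conv_transpose1d(16, 8, 3, stride: 2)
iex> input = ExTorch.randn({1, 16, 10})
iex> output = ExTorch.NN.forward(input, m)
iex> output.size
{1, 8, 21}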
@spec conv_transpose2d(integer(), integer(), integer(), keyword()) :: ExTorch.NN.Layer.t()
Applies a 2D transposed convolution operator (sometimes called "deconvolution").
Args
- in_channels (integer) - number of channels in the input image.
- out_channels (integer) - number of channels produced by the convolution.
- kernel_size (integer) - size of the convolving kernel.
- opts (keyword) - optional: :stride (default: 1), :padding (default: 0), :output_padding (default: 0), :groups (default: 1), :bias (default: true), :dilation (default: 1).
Shape
- Input: {N, C_in, H_in, W_in}
- Output: {N, C_out, H_out, W_out}
Examples
iex> m = ExTorch.NN.conv_transpose2d(16, 3, 3, stride: 2, padding: 1, output_padding: 1)
iex> input = ExTorch.randn({1, 16, 4, 4})
iex> output = ExTorch.NN.forward(input, m)
iex> output.size
{1, 3, 8, 8}
@spec copy_parameters(ExTorch.NN.Layer.t(), [{String.t(), ExTorch.Tensor.t()}]) :: :ok
Copy parameter values from a list of {name, tensor} tuples into a layer.
This enables loading pre-trained weights from a JIT model or another layer. Parameter names must match. The copy is performed in-place under a no-grad guard.
Args
- layer (ExTorch.NN.Layer) - destination layer.
- params ([{String.t(), ExTorch.Tensor.t()}]) - source parameters.
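Examples
An illustrative sketch copying the weights of one linear layer into another of the same shape (the parameter names match because the layer types and sizes are identical):
iex> src = ExTorch.NN.linear(10, 5)
iex> dst = ExTorch.NN.linear(10, 5)
iex> ExTorch.NN.copy_parameters(dst, ExTorch.NN.parameters(src))
:ok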
@spec dropout(keyword()) :: ExTorch.NN.Layer.t()
During training, randomly zeroes some of the elements of the input tensor
with probability p using samples from a Bernoulli distribution. Each
channel will be zeroed out independently on every forward call.
Furthermore, the outputs are scaled by a factor of $\frac{1}{1-p}$ during training. This means that during evaluation the module simply computes an identity function.
Args
- opts (keyword) - optional arguments:
  - :p (float) - probability of an element to be zeroed. Default: 0.5.
  - :inplace (boolean) - if true, will do this operation in-place. Default: false.
Shape
- Input: {*} (any shape)
- Output: same shape as input
Examples
iex> m = ExTorch.NN.dropout(p: 0.2)
#NN.Layer<Dropout>
@spec elu(keyword()) :: ExTorch.NN.Layer.t()
Applies element-wise: $ELU(x) = \max(0, x) + \min(0, \alpha * (e^x - 1))$
Args
- opts (keyword) - optional:
  - :alpha (float) - the $\alpha$ value for the ELU formulation. Default: 1.0.
  - :inplace (boolean) - do the operation in-place. Default: false.
Shape
- Input: {*}
- Output: same shape as input
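Examples
A minimal usage sketch (element-wise, so the shape is preserved):
iex> m = ExTorch.NN.elu(alpha: 0.5)
iex> input = ExTorch.randn({2, 3})
iex> output = ExTorch.NN.forward(input, m)
iex> output.size
{2, 3}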
@spec embedding(integer(), integer(), keyword()) :: ExTorch.NN.Layer.t()
A simple lookup table that stores embeddings of a fixed dictionary and size.
This module is often used to store word embeddings and retrieve them using indices. The input to the module is a list of indices, and the output is the corresponding word embeddings.
Args
- num_embeddings (integer) - size of the embedding dictionary.
- embedding_dim (integer) - the size of each embedding vector.
- opts (keyword) - optional arguments:
  - :padding_idx (integer or nil) - if specified, the entries at padding_idx do not contribute to the gradient. Default: nil.
Shape
- Input: {*} (integer tensor of arbitrary shape)
- Output: {*, embedding_dim}
Variables
- weight - the learnable weights of shape {num_embeddings, embedding_dim}.
Examples
iex> m = ExTorch.NN.embedding(10, 3)
iex> input = ExTorch.tensor([1, 2, 4, 5], dtype: :int64)
iex> output = ExTorch.NN.forward(input, m)
iex> output.size
{4, 3}
@spec eval(ExTorch.NN.Layer.t()) :: :ok
Set a layer to evaluation mode.
This has an effect only on certain modules, such as Dropout and BatchNorm,
which behave differently during training and evaluation.
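Examples
A minimal usage sketch (train/1 is the inverse; both return :ok per their specs):
iex> m = ExTorch.NN.dropout(p: 0.5)
iex> ExTorch.NN.eval(m)
:ok
iex> ExTorch.NN.train(m)
:ok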
@spec flatten(keyword()) :: ExTorch.NN.Layer.t()
Flattens a contiguous range of dims into a tensor.
Args
- opts (keyword) - optional:
  - :start_dim (integer) - first dim to flatten. Default: 1.
  - :end_dim (integer) - last dim to flatten. Default: -1.
Shape
- Input: {*, S_start, ..., S_end, *}
- Output: {*, product(S_start...S_end), *}
Examples
iex> m = ExTorch.NN.flatten()
iex> input = ExTorch.randn({2, 3, 4, 5})
iex> output = ExTorch.NN.forward(input, m)
iex> output.size
{2, 60}
@spec forward(ExTorch.Tensor.t(), ExTorch.NN.Layer.t()) :: ExTorch.Tensor.t()
Run the forward pass of a layer on an input tensor.
Applies the layer's computation to the input and returns the output tensor. This is the primary way to use layers created by the factory functions in this module.
Args
- input (ExTorch.Tensor) - input tensor.
- layer (ExTorch.NN.Layer) - the layer to apply.
Returns
The output tensor.
Examples
iex> m = ExTorch.NN.linear(10, 5)
iex> input = ExTorch.randn({1, 10})
iex> output = ExTorch.NN.forward(input, m)
iex> output.size
{1, 5}
@spec gelu() :: ExTorch.NN.Layer.t()
Applies the Gaussian Error Linear Units function: $GELU(x) = x \Phi(x)$
where $\Phi(x)$ is the cumulative distribution function for the Gaussian distribution.
Shape
- Input: {*}
- Output: same shape as input
@spec group_norm(integer(), integer(), keyword()) :: ExTorch.NN.Layer.t()
Applies Group Normalization over a mini-batch of inputs.
$y = \frac{x - E[x]}{\sqrt{Var[x] + \epsilon}} * \gamma + \beta$
The input channels are separated into num_groups groups, each containing
num_channels / num_groups channels. Mean and standard-deviation are
calculated separately over each group.
Args
- num_groups (integer) - number of groups to separate the channels into.
- num_channels (integer) - number of channels expected in input.
- opts (keyword) - optional: :eps (default: 1.0e-5), :affine (default: true).
Shape
- Input: {N, C, *} where C = num_channels
- Output: same shape as input
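Examples
A minimal usage sketch (16 channels split into 4 groups; the output shape matches the input):
iex> m = ExTorch.NN.group_norm(4, 16)
iex> input = ExTorch.randn({2, 16, 10, 10})
iex> output = ExTorch.NN.forward(input, m)
iex> output.size
{2, 16, 10, 10}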
@spec gru(integer(), integer(), keyword()) :: ExTorch.NN.Layer.t()
Applies a multi-layer gated recurrent unit (GRU) RNN to an input sequence.
For each element in the input sequence, each layer computes:
$r_t = \sigma(W_{ir} x_t + b_{ir} + W_{hr} h_{t-1} + b_{hr})$
$z_t = \sigma(W_{iz} x_t + b_{iz} + W_{hz} h_{t-1} + b_{hz})$
$n_t = \tanh(W_{in} x_t + b_{in} + r_t \odot (W_{hn} h_{t-1} + b_{hn}))$
$h_t = (1 - z_t) \odot n_t + z_t \odot h_{t-1}$
Args
- input_size (integer) - the number of expected features in the input.
- hidden_size (integer) - the number of features in the hidden state.
- opts (keyword) - same options as lstm/3.
Shape
- Input: {L, N, H_in}, or {N, L, H_in} when batch_first: true
- Output: {L, N, D * H_out}, where D = 2 if bidirectional, else 1
@spec instance_norm1d(integer(), keyword()) :: ExTorch.NN.Layer.t()
Applies Instance Normalization over a 3D input (a mini-batch of 1D inputs).
$y = \frac{x - E[x]}{\sqrt{Var[x] + \epsilon}} * \gamma + \beta$
The mean and standard-deviation are calculated per-instance per-channel.
Args
- num_features (integer) - C from an expected input of size {N, C, L}.
- opts (keyword) - optional: :eps (default: 1.0e-5), :momentum (default: 0.1), :affine (default: false), :track_running_stats (default: false).
Shape
- Input: {N, C, L}
- Output: same shape as input
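Examples
A minimal usage sketch (normalization preserves the input shape):
iex> m = ExTorch.NN.instance_norm1d(16)
iex> input = ExTorch.randn({2, 16, 50})
iex> output = ExTorch.NN.forward(input, m)
iex> output.size
{2, 16, 50}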
@spec instance_norm2d(integer(), keyword()) :: ExTorch.NN.Layer.t()
Applies Instance Normalization over a 4D input (a mini-batch of 2D inputs).
Args
- num_features (integer) - C from an expected input of size {N, C, H, W}.
- opts (keyword) - same as instance_norm1d/2.
Shape
- Input: {N, C, H, W}
- Output: same shape as input
@spec layer_norm([integer()] | tuple(), keyword()) :: ExTorch.NN.Layer.t()
Applies Layer Normalization over a mini-batch of inputs.
$y = \frac{x - E[x]}{\sqrt{Var[x] + \epsilon}} * \gamma + \beta$
The mean and standard-deviation are calculated over the last D dimensions,
where D is the dimension of normalized_shape.
Args
- normalized_shape ([integer] or tuple) - input shape from an expected input.
- opts (keyword) - optional arguments:
  - :eps (float) - value added to the denominator for numerical stability. Default: 1.0e-5.
  - :elementwise_affine (boolean) - if true, has learnable per-element affine parameters. Default: true.
Shape
- Input: {N, *} where * matches normalized_shape
- Output: same shape as input
Examples
iex> m = ExTorch.NN.layer_norm([10])
iex> input = ExTorch.randn({3, 10})
iex> output = ExTorch.NN.forward(input, m)
iex> output.size
{3, 10}
@spec leaky_relu(keyword()) :: ExTorch.NN.Layer.t()
Applies element-wise: $LeakyReLU(x) = \max(0, x) + negative\_slope * \min(0, x)$
Args
- opts (keyword) - optional:
  - :negative_slope (float) - controls the angle of the negative slope. Default: 0.01.
  - :inplace (boolean) - do the operation in-place. Default: false.
Shape
- Input: {*}
- Output: same shape as input
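Examples
A minimal usage sketch (element-wise, so the shape is preserved):
iex> m = ExTorch.NN.leaky_relu(negative_slope: 0.2)
iex> input = ExTorch.randn({2, 3})
iex> output = ExTorch.NN.forward(input, m)
iex> output.size
{2, 3}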
@spec linear(integer(), integer(), keyword()) :: ExTorch.NN.Layer.t()
Applies a linear transformation to the incoming data: $y = xA^T + b$.
This layer implements a fully connected layer with in_features inputs
and out_features outputs.
Args
- in_features (integer) - size of each input sample.
- out_features (integer) - size of each output sample.
- opts (keyword) - optional arguments:
  - :bias (boolean) - if false, the layer will not learn an additive bias. Default: true.
Shape
- Input: {*, H_in} where * means any number of dimensions including none and H_in = in_features
- Output: {*, H_out} where H_out = out_features
Variables
- weight - the learnable weights of shape {out_features, in_features}.
- bias - the learnable bias of shape {out_features}.
Examples
iex> m = ExTorch.NN.linear(20, 30)
iex> input = ExTorch.randn({128, 20})
iex> output = ExTorch.NN.forward(input, m)
iex> output.size
{128, 30}
@spec log_softmax(integer()) :: ExTorch.NN.Layer.t()
Applies the LogSoftmax function: $LogSoftmax(x_i) = \log\left(\frac{e^{x_i}}{\sum_j e^{x_j}}\right)$
Args
- dim (integer) - dimension along which LogSoftmax will be computed.
Shape
- Input: {*}
- Output: same shape as input
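Examples
A minimal usage sketch (log-probabilities are computed along dim 1; the shape is preserved):
iex> m = ExTorch.NN.log_softmax(1)
iex> input = ExTorch.randn({2, 5})
iex> output = ExTorch.NN.forward(input, m)
iex> output.size
{2, 5}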
@spec lstm(integer(), integer(), keyword()) :: ExTorch.NN.Layer.t()
Applies a multi-layer long short-term memory (LSTM) RNN to an input sequence.
For each element in the input sequence, each layer computes the following function:
$i_t = \sigma(W_{ii} x_t + b_{ii} + W_{hi} h_{t-1} + b_{hi})$
$f_t = \sigma(W_{if} x_t + b_{if} + W_{hf} h_{t-1} + b_{hf})$
$g_t = \tanh(W_{ig} x_t + b_{ig} + W_{hg} h_{t-1} + b_{hg})$
$o_t = \sigma(W_{io} x_t + b_{io} + W_{ho} h_{t-1} + b_{ho})$
$c_t = f_t \odot c_{t-1} + i_t \odot g_t$
$h_t = o_t \odot \tanh(c_t)$
Args
- input_size (integer) - the number of expected features in the input.
- hidden_size (integer) - the number of features in the hidden state.
- opts (keyword) - optional arguments:
  - :num_layers (integer) - number of recurrent layers. Default: 1.
  - :bias (boolean) - if false, the layer does not use bias weights. Default: true.
  - :batch_first (boolean) - if true, input/output tensors are {batch, seq, feature}. Default: false.
  - :dropout (float) - dropout probability on outputs of each layer except the last. Default: 0.0.
  - :bidirectional (boolean) - if true, becomes a bidirectional LSTM. Default: false.
Shape
- Input: {L, N, H_in}, or {N, L, H_in} when batch_first: true
- Output: {L, N, D * H_out}, where D = 2 if bidirectional, else 1
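Examples
A construction-only sketch; how sequence inputs and hidden/cell states are passed to and returned from the forward pass is not shown here:
iex> m = ExTorch.NN.lstm(10, 20, num_layers: 2, batch_first: true)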
@spec max_pool1d(integer(), keyword()) :: ExTorch.NN.Layer.t()
Applies a 1D max pooling over an input signal composed of several input planes.
Args
- kernel_size (integer) - the size of the sliding window.
- opts (keyword) - optional arguments:
  - :stride (integer) - stride of the sliding window. Default: kernel_size.
  - :padding (integer) - implicit zero padding on both sides. Default: 0.
  - :dilation (integer) - stride between elements within the sliding window. Default: 1.
  - :ceil_mode (boolean) - use ceil instead of floor to compute output shape. Default: false.
Shape
- Input: {N, C, L_in} or {C, L_in}
- Output: {N, C, L_out} or {C, L_out}, where $L_{out} = \lfloor\frac{L_{in} + 2 \times padding - dilation \times (kernel\_size - 1) - 1}{stride} + 1\rfloor$
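Examples
A minimal usage sketch (kernel_size 2 with the default stride halves the length: floor((10 - 1 - 1) / 2 + 1) = 5):
iex> m = ExTorch.NN.max_pool1d(2)
iex> input = ExTorch.randn({1, 4, 10})
iex> output = ExTorch.NN.forward(input, m)
iex> output.size
{1, 4, 5}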
@spec max_pool2d(integer(), keyword()) :: ExTorch.NN.Layer.t()
Applies a 2D max pooling over an input signal composed of several input planes.
Args
- kernel_size (integer) - the size of the sliding window.
- opts (keyword) - optional arguments:
  - :stride (integer) - stride of the sliding window. Default: kernel_size.
  - :padding (integer) - implicit zero padding on both sides. Default: 0.
  - :dilation (integer) - stride between elements within the sliding window. Default: 1.
  - :ceil_mode (boolean) - use ceil instead of floor to compute output shape. Default: false.
Shape
- Input: {N, C, H_in, W_in} or {C, H_in, W_in}
- Output: {N, C, H_out, W_out} or {C, H_out, W_out}
Examples
iex> m = ExTorch.NN.max_pool2d(2)
iex> input = ExTorch.randn({1, 1, 8, 8})
iex> output = ExTorch.NN.forward(input, m)
iex> output.size
{1, 1, 4, 4}
@spec mish() :: ExTorch.NN.Layer.t()
Applies the Mish function, element-wise.
$Mish(x) = x * \tanh(Softplus(x))$
Shape
- Input: {*}
- Output: same shape as input
@spec multihead_attention(integer(), integer(), keyword()) :: ExTorch.NN.Layer.t()
Allows the model to jointly attend to information from different representation subspaces.
$MultiHead(Q, K, V) = Concat(head_1, ..., head_h) W^O$
where $head_i = Attention(Q W_i^Q, K W_i^K, V W_i^V)$.
Args
- embed_dim (integer) - total dimension of the model.
- num_heads (integer) - number of parallel attention heads. embed_dim will be split across num_heads (i.e. each head will have dimension embed_dim / num_heads).
- opts (keyword) - optional arguments:
  - :dropout (float) - dropout probability on attention weights. Default: 0.0.
  - :bias (boolean) - if false, input/output projection layers will not learn an additive bias. Default: true.
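Examples
A construction-only sketch (embed_dim 512 split across 8 heads of dimension 64); how the query, key and value tensors are passed to the forward pass is not shown here:
iex> m = ExTorch.NN.multihead_attention(512, 8, dropout: 0.1)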
@spec parameters(ExTorch.NN.Layer.t()) :: [{String.t(), ExTorch.Tensor.t()}]
Get named parameters of a layer.
Returns a list of {name, tensor} tuples containing the learnable
parameters of the layer.
Examples
iex> m = ExTorch.NN.linear(10, 5)
iex> params = ExTorch.NN.parameters(m)
iex> length(params)
2
@spec prelu(keyword()) :: ExTorch.NN.Layer.t()
Applies element-wise: $PReLU(x) = \max(0, x) + a * \min(0, x)$
where $a$ is a learnable parameter.
Args
- opts (keyword) - optional:
  - :num_parameters (integer) - number of $a$ to learn. Default: 1.
Variables
- weight - the learnable parameter $a$ of shape {num_parameters}.
Shape
- Input: {*}
- Output: same shape as input
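Examples
A minimal usage sketch (element-wise, so the shape is preserved; with the default num_parameters of 1 a single $a$ is shared across channels):
iex> m = ExTorch.NN.prelu()
iex> input = ExTorch.randn({2, 3})
iex> output = ExTorch.NN.forward(input, m)
iex> output.size
{2, 3}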
@spec relu(keyword()) :: ExTorch.NN.Layer.t()
Applies the rectified linear unit function element-wise: $ReLU(x) = \max(0, x)$.
Args
- opts (keyword) - optional:
  - :inplace (boolean) - can optionally do the operation in-place. Default: false.
Shape
- Input: {*} (any shape)
- Output: same shape as input
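Examples
A minimal usage sketch (element-wise, so the shape is preserved):
iex> m = ExTorch.NN.relu()
iex> input = ExTorch.randn({2, 4})
iex> output = ExTorch.NN.forward(input, m)
iex> output.size
{2, 4}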
@spec sigmoid() :: ExTorch.NN.Layer.t()
Applies the element-wise Sigmoid function: $Sigmoid(x) = \frac{1}{1 + e^{-x}}$
Shape
- Input: {*}
- Output: same shape as input
@spec silu() :: ExTorch.NN.Layer.t()
Applies the Sigmoid Linear Unit (SiLU) function, element-wise. Also known as the swish function.
$SiLU(x) = x * \sigma(x)$ where $\sigma(x)$ is the logistic sigmoid.
Shape
- Input: {*}
- Output: same shape as input
@spec softmax(integer()) :: ExTorch.NN.Layer.t()
Applies the Softmax function to an n-dimensional input tensor, rescaling the elements so that they lie in the range [0, 1] and sum to 1.
$Softmax(x_i) = \frac{e^{x_i}}{\sum_j e^{x_j}}$
Args
- dim (integer) - dimension along which Softmax will be computed.
Shape
- Input: {*}
- Output: same shape as input
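Examples
A minimal usage sketch (probabilities are computed along dim 1; the shape is preserved):
iex> m = ExTorch.NN.softmax(1)
iex> input = ExTorch.randn({2, 5})
iex> output = ExTorch.NN.forward(input, m)
iex> output.size
{2, 5}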
@spec tanh() :: ExTorch.NN.Layer.t()
Applies the element-wise Tanh function: $Tanh(x) = \frac{e^x - e^{-x}}{e^x + e^{-x}}$
Shape
- Input: {*}
- Output: same shape as input
@spec to(ExTorch.NN.Layer.t(), ExTorch.Device.device()) :: ExTorch.NN.Layer.t()
Move a layer to a different device.
Args
- layer (ExTorch.NN.Layer) - the layer to move.
- device (ExTorch.Device) - target device (e.g., :cpu, {:cuda, 0}).
Returns
A new layer on the target device.
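Examples
An illustrative sketch (moving to :cpu always works; {:cuda, 0} requires an available CUDA device):
iex> m = ExTorch.NN.linear(10, 5)
iex> m = ExTorch.NN.to(m, :cpu)
iex> output = ExTorch.NN.forward(ExTorch.randn({1, 10}), m)
iex> output.size
{1, 5}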
@spec train(ExTorch.NN.Layer.t()) :: :ok
Set a layer to training mode.
See eval/1 for the inverse operation.
@spec unflatten(integer(), [integer()] | tuple()) :: ExTorch.NN.Layer.t()
Unflattens a tensor dim, expanding it to a desired shape.
Args
- dim (integer) - dimension to unflatten.
- sizes ([integer] or tuple) - new shape of the unflattened dimension.
Shape
- Input: {*, S_dim, *} where S_dim = product(sizes)
- Output: {*, sizes[0], sizes[1], ..., *}
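Examples
A minimal usage sketch (dim 1 of size 6 is expanded into {2, 3}):
iex> m = ExTorch.NN.unflatten(1, {2, 3})
iex> input = ExTorch.randn({4, 6})
iex> output = ExTorch.NN.forward(input, m)
iex> output.size
{4, 2, 3}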