Emily.Quantization.Layers (emily v0.3.5)


Defn-traceable quantized layer op for use inside Axon graphs.

quantized_dense/4 is a drop-in replacement for Axon.Layers.dense/4 that takes a %Emily.QuantizedWeight{} kernel instead of a plain tensor. See Emily.Quantization for the defn-integration trade-offs; the qwen3_quantized notebook walks through a concrete Axon.rewrite_nodes/2-based graph rewrite that swaps every :dense node for a layer calling this op.
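A minimal sketch of that kind of rewrite (not the notebook's code): `model` and `qw` are assumed to exist, and a single pre-quantized kernel is closed over for brevity where the notebook resolves one %QuantizedWeight{} per dense layer. The rewriter shape (list of Axon inputs plus the original output) follows the Axon.rewrite_nodes/2 documentation.

    # Build a replacement for one :dense node: a custom Axon layer whose op
    # calls quantized_dense/4 on the activation, closing over the quantized
    # kernel `qw`.
    swap_dense = fn %Emily.QuantizedWeight{} = qw ->
      fn [%Axon{} = x], _original_output ->
        Axon.layer(
          fn input, _opts -> Emily.Quantization.Layers.quantized_dense(input, qw) end,
          [x],
          op_name: :quantized_dense
        )
      end
    end

    # Swap every :dense node; leave everything else untouched.
    quantized_model =
      Axon.rewrite_nodes(model, fn
        %Axon.Node{op: :dense} -> swap_dense.(qw)
        _ -> :skip
      end)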

Summary

Functions

Axon layer op: x @ W (+ bias) where W is a %QuantizedWeight{}.

Functions

quantized_dense(input, kernel, bias \\ 0, opts \\ [])

Axon layer op: x @ W (+ bias) where W is a %QuantizedWeight{}.

Mirrors the signature of Axon.Quantization.Layers.weight_only_quantized_dense/4:

  • input — activation tensor, shape (..., in).
  • kernel — a %QuantizedWeight{}. The stored layout is determined by kernel.transpose (illustrated in the sketch after this list):
    • transpose: false (the AWQ / Axon-native layout) — packed representation of a [in, out] weight; the layer computes Nx.dot(x, dense).
    • transpose: true (the MLX / PyTorch-native layout, i.e. fresh output of QuantizedWeight.from_dense/2 on a [out, in] weight) — packed representation of a [out, in] weight; the layer computes Nx.dot(x, Nx.transpose(dense)).
  • bias — either an Nx.Tensor, a number, or a keyword list (in which case it's treated as opts and bias defaults to 0). Matches Axon.Quantization.Layers.weight_only_quantized_dense/4's signature for drop-in use under Axon.layer/3.
  • opts — reserved for Axon-layer metadata; not used by this implementation directly (all state lives on the %QuantizedWeight{}).
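
To make the two kernel.transpose layouts concrete, here is a rough sketch of the transpose: true case (a fresh from_dense/2 output). It is not a doctest: the quantized result matches the float reference only up to quantization error.

    # MLX / PyTorch-native [out, in] kernel: out = 4, in = 128.
    w_out_in = Nx.iota({4, 128}, backend: Emily.Backend, type: :f32)
    x = Nx.iota({2, 128}, backend: Emily.Backend, type: :f32)

    # from_dense/2 keeps the [out, in] layout, so kernel.transpose is true and
    # the layer computes Nx.dot(x, Nx.transpose(dequantized_kernel)).
    qw = Emily.QuantizedWeight.from_dense(w_out_in)
    y = Emily.Quantization.Layers.quantized_dense(x, qw)   # shape {2, 4}

    # Float reference the layer approximates (equal only up to quantization error).
    y_ref = Nx.dot(x, Nx.transpose(w_out_in))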

Examples

iex> w = Nx.iota({4, 128}, backend: Emily.Backend, type: :f32)
iex> qw = Emily.QuantizedWeight.from_dense(w)
iex> x = Nx.iota({2, 128}, backend: Emily.Backend, type: :f32)
iex> y = Emily.Quantization.Layers.quantized_dense(x, qw)
iex> Nx.shape(y)
{2, 4}