Defn-traceable quantized layer op for use inside Axon graphs.

`quantized_dense/4` is the drop-in replacement for `Axon.Layers.dense/4`
when the kernel is a `%Emily.QuantizedWeight{}`. See `Emily.Quantization`
for the defn-integration trade-offs; the `qwen3_quantized` notebook walks
through a concrete `Axon.rewrite_nodes/2`-based graph rewrite that swaps
every `:dense` node for a layer calling this op, along the lines of the
sketch below.
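A minimal sketch of that rewrite, assuming a `kernels` map (hypothetical,
keyed here by node id) holding a prefetched `%Emily.QuantizedWeight{}` per
dense layer, and a `model` already in scope. The rewriter contract (a
node-matching fun returning either `:skip` or a
`fn inputs, original -> replacement end` callback) follows
`Axon.rewrite_nodes/2`'s documented usage:

```elixir
# Hypothetical: `kernels` maps a dense node's id to its %Emily.QuantizedWeight{}.
swap_dense = fn qw ->
  fn [%Axon{} = x], _original_output ->
    # Axon.layer/3 wraps a numerical fun taking the inputs plus a trailing
    # keyword list of layer options.
    Axon.layer(
      fn input, _opts -> Emily.Quantization.Layers.quantized_dense(input, qw) end,
      [x],
      op_name: :quantized_dense
    )
  end
end

model =
  Axon.rewrite_nodes(model, fn
    %Axon.Node{op: :dense} = node -> swap_dense.(Map.fetch!(kernels, node.id))
    _other -> :skip
  end)
```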
Summary
Functions
Axon layer op: `x @ W (+ bias)` where `W` is a `%QuantizedWeight{}`.
Functions
Axon layer op: `x @ W (+ bias)` where `W` is a `%QuantizedWeight{}`.
Mirrors the signature of `Axon.Quantization.Layers.weight_only_quantized_dense/4`:

  * `input` - activation tensor, shape `(..., in)`.

  * `kernel` - `%QuantizedWeight{}`. The stored layout is determined by
    `kernel.transpose` (both dot paths are illustrated in plain Nx after
    this list):

    * `transpose: false` (the AWQ / Axon-native layout) - packed
      representation of a `[in, out]` weight; the layer computes
      `Nx.dot(x, dense)`.

    * `transpose: true` (the MLX / PyTorch-native layout, i.e. the fresh
      output of `QuantizedWeight.from_dense/2` on a `[out, in]` weight) -
      packed representation of a `[out, in]` weight; the layer computes
      `Nx.dot(x, Nx.transpose(dense))`.

  * `bias` - either an `Nx.Tensor`, a number, or a keyword list (in which
    case it is treated as `opts` and bias defaults to 0). Matches
    `Axon.Quantization.Layers.weight_only_quantized_dense/4`'s signature
    for drop-in use under `Axon.layer/3`.

  * `opts` - reserved for Axon-layer metadata; not used by this
    implementation directly (all state lives on the `%QuantizedWeight{}`).
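A plain-Nx illustration of the two layouts on unquantized tensors (shapes
here are illustrative, not from the source); `dense` above corresponds to
the dequantized weight:

```elixir
x = Nx.iota({2, 3}, type: :f32)          # activations, shape {batch, in}
w_in_out = Nx.iota({3, 4}, type: :f32)   # [in, out] weight (transpose: false layout)
w_out_in = Nx.transpose(w_in_out)        # [out, in] weight (transpose: true layout)

y_false = Nx.dot(x, w_in_out)               # path taken for transpose: false
y_true = Nx.dot(x, Nx.transpose(w_out_in))  # path taken for transpose: true

# Both paths yield the same {2, 4} result the quantized kernel approximates.
1 = Nx.all(Nx.equal(y_false, y_true)) |> Nx.to_number()
```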
Examples
iex> w = Nx.iota({4, 128}, backend: Emily.Backend, type: :f32)
iex> qw = Emily.QuantizedWeight.from_dense(w)
iex> x = Nx.iota({2, 128}, backend: Emily.Backend, type: :f32)
iex> y = Emily.Quantization.Layers.quantized_dense(x, qw)
iex> Nx.shape(y)
{2, 4}
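A further sketch of the tensor-bias form described above, continuing the
session (illustrative, not a doctest from the source):

iex> b = Nx.iota({4}, backend: Emily.Backend, type: :f32)
iex> yb = Emily.Quantization.Layers.quantized_dense(x, qw, b)
iex> Nx.shape(yb)
{2, 4}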