viva_math/autodiff_reverse
Reverse-mode automatic differentiation via a computation tape.
Forward-mode AD (viva_math/autodiff) is optimal when the input
dimension is small relative to the output (e.g. one-variable
gradients). Reverse-mode AD (“backprop”) is optimal for the opposite
case: scalar output, high-dimensional input — exactly the regime of
neural networks and gradient-based MAP inference.
How it works
- The forward pass builds a directed acyclic graph (the “tape”) that records every elementary operation and its operands.
- The backward pass walks the tape in reverse, applying the chain
rule to accumulate
∂output/∂nodefor every node.
All inputs that share a tape are differentiated in a single backward pass, regardless of how many there are — that’s why reverse-mode AD is O(1) in input dimension for gradient computation.
Limitations of this implementation
- Single-output, scalar gradients only. For Jacobians of vector-valued
functions, use
viva_math/autodiff.jacobian(forward) or call this multiple times. - The tape is a
Dict(Int, ...)for clarity; production tools use linear arrays for cache locality. Adequate for graphs ≤ 10⁴ nodes.
Example
import viva_math/autodiff_reverse as ad
// f(x, y, z) = x² + 2·y·z
let tape = ad.empty_tape()
let #(x, tape) = ad.input(tape, 1.0)
let #(y, tape) = ad.input(tape, 2.0)
let #(z, tape) = ad.input(tape, 3.0)
let #(x_sq, tape) = ad.mul(tape, x, x)
let #(yz, tape) = ad.mul(tape, y, z)
let #(yz2, tape) = ad.scale(tape, yz, 2.0)
let #(out, tape) = ad.add(tape, x_sq, yz2)
let grads = ad.backward(tape, out)
// grads[x] = 2x = 2.0
// grads[y] = 2z = 6.0
// grads[z] = 2y = 4.0
Types
What kind of operation produced this node. Public because it appears in
the body of Node, which is reachable from the public Tape.
pub type Op {
Input
Add(Int, Int)
Sub(Int, Int)
Mul(Int, Int)
Div(Int, Int)
Neg(Int)
Scale(Int, Float)
Exp(Int)
Ln(Int)
Sin(Int)
Cos(Int)
Tanh(Int)
Sigmoid(Int)
Pow(Int, Float)
}
Constructors
-
Input -
Add(Int, Int) -
Sub(Int, Int) -
Mul(Int, Int) -
Div(Int, Int) -
Neg(Int) -
Scale(Int, Float) -
Exp(Int) -
Ln(Int) -
Sin(Int) -
Cos(Int) -
Tanh(Int) -
Sigmoid(Int) -
Pow(Int, Float)
Values
pub fn backward(tape: Tape, output: Int) -> dict.Dict(Int, Float)
Compute ∂output/∂node for every node, given a scalar output node.
Returns a Dict mapping each NodeId to its accumulated gradient. The
output node itself has gradient 1.0.
pub fn empty_tape() -> Tape
pub fn grad_of(grads: dict.Dict(Int, Float), input: Int) -> Float
Extract gradient ∂output/∂input from the backward result.
pub fn gradients(
inputs: List(Float),
build: fn(Tape, List(Int)) -> #(Int, Tape),
) -> List(Float)
Convenience: gradient of a scalar function with respect to many inputs.
Runs the function on a fresh tape, performs the backward pass, and returns the gradient list aligned with the input list.