viva_math/autodiff_reverse

Reverse-mode automatic differentiation via a computation tape.

Forward-mode AD (viva_math/autodiff) is optimal when the input dimension is small relative to the output (e.g. one-variable gradients). Reverse-mode AD (“backprop”) is optimal for the opposite case: scalar output, high-dimensional input — exactly the regime of neural networks and gradient-based MAP inference.

How it works

  1. The forward pass builds a directed acyclic graph (the “tape”) that records every elementary operation and its operands.
  2. The backward pass walks the tape in reverse, applying the chain rule to accumulate ∂output/∂node for every node.

All inputs that share a tape are differentiated in a single backward pass, regardless of how many there are — that’s why reverse-mode AD is O(1) in input dimension for gradient computation.

Limitations of this implementation

Example

import viva_math/autodiff_reverse as ad

// f(x, y, z) = x² + 2·y·z
let tape = ad.empty_tape()
let #(x, tape) = ad.input(tape, 1.0)
let #(y, tape) = ad.input(tape, 2.0)
let #(z, tape) = ad.input(tape, 3.0)
let #(x_sq, tape) = ad.mul(tape, x, x)
let #(yz, tape) = ad.mul(tape, y, z)
let #(yz2, tape) = ad.scale(tape, yz, 2.0)
let #(out, tape) = ad.add(tape, x_sq, yz2)
let grads = ad.backward(tape, out)
// grads[x] = 2x = 2.0
// grads[y] = 2z = 6.0
// grads[z] = 2y = 4.0

Types

One node on the tape. Public for tape introspection.

pub type Node {
  Node(value: Float, op: Op)
}

Constructors

  • Node(value: Float, op: Op)

Reference to a node in the tape — an opaque handle.

pub type NodeId =
  Int

What kind of operation produced this node. Public because it appears in the body of Node, which is reachable from the public Tape.

pub type Op {
  Input
  Add(Int, Int)
  Sub(Int, Int)
  Mul(Int, Int)
  Div(Int, Int)
  Neg(Int)
  Scale(Int, Float)
  Exp(Int)
  Ln(Int)
  Sin(Int)
  Cos(Int)
  Tanh(Int)
  Sigmoid(Int)
  Pow(Int, Float)
}

Constructors

  • Input
  • Add(Int, Int)
  • Sub(Int, Int)
  • Mul(Int, Int)
  • Div(Int, Int)
  • Neg(Int)
  • Scale(Int, Float)
  • Exp(Int)
  • Ln(Int)
  • Sin(Int)
  • Cos(Int)
  • Tanh(Int)
  • Sigmoid(Int)
  • Pow(Int, Float)

Computation tape — append-only graph of forward computations.

pub type Tape {
  Tape(nodes: dict.Dict(Int, Node), next_id: Int)
}

Constructors

Values

pub fn add(tape: Tape, a: Int, b: Int) -> #(Int, Tape)
pub fn backward(tape: Tape, output: Int) -> dict.Dict(Int, Float)

Compute ∂output/∂node for every node, given a scalar output node.

Returns a Dict mapping each NodeId to its accumulated gradient. The output node itself has gradient 1.0.

pub fn cos(tape: Tape, a: Int) -> #(Int, Tape)
pub fn div(tape: Tape, a: Int, b: Int) -> #(Int, Tape)
pub fn empty_tape() -> Tape
pub fn exp(tape: Tape, a: Int) -> #(Int, Tape)
pub fn grad_of(grads: dict.Dict(Int, Float), input: Int) -> Float

Extract gradient ∂output/∂input from the backward result.

pub fn gradients(
  inputs: List(Float),
  build: fn(Tape, List(Int)) -> #(Int, Tape),
) -> List(Float)

Convenience: gradient of a scalar function with respect to many inputs.

Runs the function on a fresh tape, performs the backward pass, and returns the gradient list aligned with the input list.

pub fn input(tape: Tape, value: Float) -> #(Int, Tape)

Register a new input variable on the tape.

pub fn ln(tape: Tape, a: Int) -> #(Int, Tape)
pub fn mul(tape: Tape, a: Int, b: Int) -> #(Int, Tape)
pub fn neg(tape: Tape, a: Int) -> #(Int, Tape)
pub fn pow(tape: Tape, a: Int, n: Float) -> #(Int, Tape)
pub fn scale(tape: Tape, a: Int, s: Float) -> #(Int, Tape)
pub fn sigmoid(tape: Tape, a: Int) -> #(Int, Tape)
pub fn sin(tape: Tape, a: Int) -> #(Int, Tape)
pub fn sub(tape: Tape, a: Int, b: Int) -> #(Int, Tape)
pub fn tanh(tape: Tape, a: Int) -> #(Int, Tape)
pub fn value(tape: Tape, id: Int) -> Float

Read the forward value of a node.

Search Document