The goal
The demo teaches backpropagation by training a small scalar MLP classifier on a two-dimensional toy dataset. It mirrors the official micrograd workflow while keeping all data generation, loss computation, training, and graph inspection in pure Elixir.
Scalar autodiff warmup
The notebook starts with scalar values:
alias MicrogradEx.Value
x = Value.new(-4.0, label: "x")
y = x |> Value.mul(x) |> Value.relu()
gradients = Value.backward(y)
Value.grad(x, gradients)
Each scalar operation creates a new Value and records a small local derivative edge. Value.backward/1 walks the graph in reverse and returns a Gradients table.
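For the warmup above, y = relu(x * x), so at x = -4.0 the value is 16.0 and dy/dx = 2x = -8.0. The following is a minimal standalone sketch of how local derivative edges plus a reverse walk produce that number. The TinyAD module and all of its functions are hypothetical stand-ins written for this illustration; they are not the MicrogradEx.Value implementation, whose internals may differ.

```elixir
defmodule TinyAD do
  # Hypothetical stand-in for illustration only; not the MicrogradEx.Value API.
  # Each node stores its value and a list of {parent, local_derivative} edges.
  defstruct [:id, :data, parents: []]

  def new(data), do: %__MODULE__{id: make_ref(), data: data}

  # d(a*b)/da = b and d(a*b)/db = a
  def mul(a, b) do
    %__MODULE__{id: make_ref(), data: a.data * b.data, parents: [{a, b.data}, {b, a.data}]}
  end

  # d(relu(a))/da = 1 when a > 0, otherwise 0
  def relu(a) do
    local = if a.data > 0, do: 1.0, else: 0.0
    %__MODULE__{id: make_ref(), data: max(a.data, 0.0), parents: [{a, local}]}
  end

  # Returns a map of node id => d(root)/d(node). No node is mutated.
  def backward(root) do
    {order, _seen} = topo(root, [], MapSet.new())

    # `order` starts at the root and lists every node before its parents,
    # so a node's gradient is complete by the time it is pushed upstream.
    Enum.reduce(order, %{root.id => 1.0}, fn node, grads ->
      upstream = Map.fetch!(grads, node.id)

      Enum.reduce(node.parents, grads, fn {parent, local}, acc ->
        Map.update(acc, parent.id, local * upstream, &(&1 + local * upstream))
      end)
    end)
  end

  def grad(node, grads), do: Map.get(grads, node.id, 0.0)

  # Depth-first post-order; prepending gives the reverse topological order.
  defp topo(node, order, seen) do
    if MapSet.member?(seen, node.id) do
      {order, seen}
    else
      seen = MapSet.put(seen, node.id)

      {order, seen} =
        Enum.reduce(node.parents, {order, seen}, fn {parent, _local}, {o, s} ->
          topo(parent, o, s)
        end)

      {[node | order], seen}
    end
  end
end

x = TinyAD.new(-4.0)
y = x |> TinyAD.mul(x) |> TinyAD.relu()
grads = TinyAD.backward(y)
IO.inspect({y.data, TinyAD.grad(x, grads)})
```

Because x feeds the multiplication twice, it receives two contributions of -4.0 each, which the gradient table sums to -8.0.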
The two-moons dataset
The official Python notebook uses sklearn's make_moons. MicrogradEx uses MicrogradEx.Datasets.moons/2, a deterministic pure-Elixir generator with the same educational role.
Labels are -1.0 and 1.0 because the max-margin loss uses the product y_i * score_i, which is positive when the score has the same sign as the label.
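To make the shape of the data concrete, here is a rough sketch of a deterministic two-moons generator. It assumes the same geometry as sklearn's make_moons (an upper half circle and a shifted lower half circle) and Gaussian jitter from a seeded :rand state. The MoonsSketch module, its option names, and its return shape are hypothetical; MicrogradEx.Datasets.moons/2 may be implemented and parameterised differently.

```elixir
defmodule MoonsSketch do
  # Hypothetical helper, not MicrogradEx.Datasets.moons/2.
  # Returns a list of {{x1, x2}, label}, with label -1.0 for the upper moon
  # and 1.0 for the lower moon.
  def generate(n_per_moon, opts \\ []) when n_per_moon > 0 do
    noise = Keyword.get(opts, :noise, 0.1)
    seed = Keyword.get(opts, :seed, {1337, 1337, 1337})
    # An explicit seeded state keeps the output identical across runs.
    state = :rand.seed_s(:exsss, seed)

    {points, _state} =
      Enum.flat_map_reduce(0..(n_per_moon - 1), state, fn i, st ->
        t = :math.pi() * i / max(n_per_moon - 1, 1)
        {upper, st} = jitter({:math.cos(t), :math.sin(t)}, noise, st)
        {lower, st} = jitter({1.0 - :math.cos(t), 0.5 - :math.sin(t)}, noise, st)
        {[{upper, -1.0}, {lower, 1.0}], st}
      end)

    points
  end

  # Adds independent Gaussian noise to both coordinates, threading the state.
  defp jitter({x, y}, noise, st) do
    {dx, st} = :rand.normal_s(st)
    {dy, st} = :rand.normal_s(st)
    {{x + noise * dx, y + noise * dy}, st}
  end
end

data = MoonsSketch.generate(50, noise: 0.1)
IO.inspect(length(data))
```

Threading the random state through the reduction, rather than seeding the process dictionary, is what keeps the generator pure.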
The MLP
The main model is:
alias MicrogradEx.NN.MLP
model = MLP.new(2, [16, 16, 1], seed: {1337, 1337, 1337})
This means:
- 2 input values;
- first hidden layer with 16 neurons;
- second hidden layer with 16 neurons;
- output layer with 1 neuron.
Hidden layers use ReLU. The final layer is linear.
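As an illustration of "ReLU on hidden layers, linear on the last", the sketch below runs a forward pass on plain floats. The ForwardSketch module and its layer representation (a list of {weights, bias} neurons) are assumptions made for this example; MicrogradEx.NN.MLP works on Value graphs and may store parameters differently.

```elixir
defmodule ForwardSketch do
  # Hypothetical plain-float forward pass, for illustration only.
  # A layer is a list of neurons; a neuron is {weights, bias}.
  def forward(layers, inputs) do
    last = length(layers) - 1

    layers
    |> Enum.with_index()
    |> Enum.reduce(inputs, fn {layer, index}, activations ->
      Enum.map(layer, fn {weights, bias} ->
        pre = (Enum.zip_with(weights, activations, &(&1 * &2)) |> Enum.sum()) + bias
        # ReLU on hidden layers, identity on the final layer
        if index == last, do: pre, else: max(pre, 0.0)
      end)
    end)
  end
end

layers = [
  # hidden layer: two neurons, two inputs each
  [{[1.0, -1.0], 0.0}, {[-1.0, 1.0], 0.0}],
  # output layer: one linear neuron
  [{[1.0, 1.0], -0.5}]
]

IO.inspect(ForwardSketch.forward(layers, [2.0, 0.5]))
```

With the input [2.0, 0.5], the hidden pre-activations are 1.5 and -1.5, ReLU turns them into 1.5 and 0.0, and the linear output neuron returns 1.5 + 0.0 - 0.5 = 1.0.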
Parameter count
The official demo shape has 337 parameters:
First layer: 16 * (2 + 1) = 48
Second layer: 16 * (16 + 1) = 272
Output layer: 1 * (16 + 1) = 17
Total: 337
The + 1 in each layer is the bias parameter per neuron.
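The same arithmetic can be checked for any layer shape with a few lines of Elixir. ParamCount is a hypothetical helper written for this guide, not part of MicrogradEx.

```elixir
defmodule ParamCount do
  # Counts weights plus one bias per neuron for a fully connected network.
  def count(n_inputs, layer_sizes) do
    [n_inputs | layer_sizes]
    # Pair each layer's fan-in with its neuron count: [[2, 16], [16, 16], [16, 1]]
    |> Enum.chunk_every(2, 1, :discard)
    |> Enum.map(fn [fan_in, n_out] -> n_out * (fan_in + 1) end)
    |> Enum.sum()
  end
end

IO.inspect(ParamCount.count(2, [16, 16, 1]))
# prints 337
```

By the same formula, a smaller [8, 8, 1] network on two inputs has 24 + 72 + 9 = 105 parameters.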
The max-margin loss
The classification score is the scalar model output. A positive score predicts class 1; a non-positive score predicts class -1.
The loss is:
loss_i = relu(1 - y_i * score_i)
data_loss = mean(loss_i)
reg_loss = alpha * sum(p * p)
total_loss = data_loss + reg_loss
In code this is MicrogradEx.Losses.max_margin/4.
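The formulas above can be evaluated on plain floats to see how each term behaves. This is only a numeric sketch: MarginLossSketch and its argument order are hypothetical, and MicrogradEx.Losses.max_margin/4 builds a differentiable Value graph rather than returning floats.

```elixir
defmodule MarginLossSketch do
  # Plain-float illustration of the max-margin loss with an L2 penalty.
  def total(scores, labels, params, alpha \\ 1.0e-4) do
    # relu(1 - y_i * score_i): zero once a sample is correct with margin >= 1
    losses = Enum.zip_with(scores, labels, fn s, y -> max(0.0, 1.0 - y * s) end)
    data_loss = Enum.sum(losses) / length(losses)
    reg_loss = alpha * Enum.sum(Enum.map(params, &(&1 * &1)))
    %{data_loss: data_loss, reg_loss: reg_loss, total_loss: data_loss + reg_loss}
  end

  # A positive score predicts 1.0; a non-positive score predicts -1.0.
  def accuracy(scores, labels) do
    hits =
      scores
      |> Enum.zip_with(labels, fn s, y -> (s > 0) == (y > 0) end)
      |> Enum.count(& &1)

    hits / length(scores)
  end
end

scores = [2.0, -0.5, 0.3]
labels = [1.0, -1.0, -1.0]
IO.inspect(MarginLossSketch.total(scores, labels, [1.0, -2.0]))
IO.inspect(MarginLossSketch.accuracy(scores, labels))
```

In this example the first sample contributes no loss, the second is classified correctly but sits inside the margin (loss 0.5), and the third is misclassified (loss 1.3).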
L2 regularization
The regularization term penalizes large parameters:
alpha * sum(p * p for p <- NN.parameters(model))
The default alpha is 1.0e-4, matching the official demo.
The training loop
Training is immutable:
gradients = Value.backward(total_loss)
next_model = NN.apply_gradients(model, gradients, learning_rate)
MicrogradEx.Trainer.train/3 runs this loop for 100 steps by default and records loss, data loss, regularization loss, accuracy, and learning rate.
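The immutable pattern can be seen in isolation on a one-parameter toy problem. The sketch below assumes a constant learning rate and an analytically known gradient; TrainSketch, its options, and its history format are hypothetical and are not how MicrogradEx.Trainer.train/3 is necessarily written.

```elixir
defmodule TrainSketch do
  # Hypothetical loop for illustration only.
  # Minimises loss(w) = (w - 3)^2, whose gradient is 2 * (w - 3).
  def run(w0, opts \\ []) do
    steps = Keyword.get(opts, :steps, 100)
    learning_rate = Keyword.get(opts, :learning_rate, 0.1)

    {final_w, history} =
      Enum.reduce(1..steps, {w0, []}, fn step, {w, history} ->
        loss = (w - 3.0) * (w - 3.0)
        gradient = 2.0 * (w - 3.0)
        # A new value is returned each step; the previous `w` is never changed.
        next_w = w - learning_rate * gradient
        {next_w, [%{step: step, loss: loss, learning_rate: learning_rate} | history]}
      end)

    %{final_w: final_w, history: Enum.reverse(history)}
  end
end

run = TrainSketch.run(0.0)
IO.inspect(run.final_w)
IO.inspect(hd(run.history))
```

The accumulator of the reduce plays the role of the model: every step receives the old state and hands back a new one together with a history row, so any intermediate state could be kept and inspected without copying.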
Plotting loss and accuracy
MicrogradEx.PlotData converts training runs into plain rows:
PlotData.loss_history(run)
PlotData.accuracy_history(run)
The notebook renders those rows with Vega-Lite.
Decision boundary
The decision boundary is built by evaluating the trained model over a padded grid:
PlotData.decision_boundary(run.final_model, dataset, h: 0.25)
Every grid point is classified by score sign, then plotted behind the training data.
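A rough sketch of the grid construction follows. It assumes the padding is a fixed margin around the data's bounding box and that h is the grid step in both directions. GridSketch, the :pad option, and the row format are hypothetical; PlotData.decision_boundary/3 may pad and step differently.

```elixir
defmodule GridSketch do
  # Hypothetical helper for illustration only.
  # points: list of {x, y}; score_fun: fn {x, y} -> float end
  def classify(points, score_fun, opts \\ []) do
    h = Keyword.get(opts, :h, 0.25)
    pad = Keyword.get(opts, :pad, 1.0)
    {xs, ys} = Enum.unzip(points)

    for x <- axis(Enum.min(xs) - pad, Enum.max(xs) + pad, h),
        y <- axis(Enum.min(ys) - pad, Enum.max(ys) + pad, h) do
      # Same rule as for training data: positive score -> 1.0, otherwise -1.0
      label = if score_fun.({x, y}) > 0, do: 1.0, else: -1.0
      %{x: x, y: y, label: label}
    end
  end

  # Evenly spaced values from lo up to at most hi, computed from integer
  # indices so floating-point error does not accumulate.
  defp axis(lo, hi, h) do
    n = floor((hi - lo) / h)
    for i <- 0..n, do: lo + i * h
  end
end

points = [{0.0, 0.0}, {1.0, 1.0}]
rows = GridSketch.classify(points, fn {x, y} -> x - y end, h: 0.5)
IO.inspect(length(rows))
```

The number of cells grows with 1 / h squared, and the model is evaluated once per cell, which is why a coarser step makes the plot noticeably cheaper to build.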
Graph inspection
The scalar graph is built during forward operations. MicrogradEx exposes it without mutation:
- Graph.nodes/2 shows scalar node data and gradients;
- Graph.edges/1 shows parent-to-child dependencies and local gradients;
- Graph.to_dot/2 exports DOT text for optional Graphviz rendering.
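To show what DOT text for such a graph can look like, here is a small formatter over plain node and edge lists. The DotSketch module and the node and edge shapes it accepts are assumptions made for this example; the actual output of Graph.to_dot/2 may be laid out differently.

```elixir
defmodule DotSketch do
  # Hypothetical formatter for illustration only.
  # nodes: [%{id: String.t(), label: String.t(), data: float, grad: float}]
  # edges: [{parent_id, child_id}]
  def to_dot(nodes, edges) do
    node_lines =
      Enum.map(nodes, fn n ->
        ~s(  "#{n.id}" [shape=record, label="#{n.label} | data #{n.data} | grad #{n.grad}"];)
      end)

    edge_lines = Enum.map(edges, fn {from, to} -> ~s(  "#{from}" -> "#{to}";) end)

    # rankdir=LR draws inputs on the left and the output on the right
    Enum.join(["digraph G {", "  rankdir=LR;"] ++ node_lines ++ edge_lines ++ ["}"], "\n")
  end
end

nodes = [
  %{id: "x", label: "x", data: -4.0, grad: -8.0},
  %{id: "y", label: "y", data: 16.0, grad: 1.0}
]

IO.puts(DotSketch.to_dot(nodes, [{"x", "y"}]))
```

The resulting text can be pasted into any Graphviz renderer, for example by saving it to a file and running the dot command-line tool on it.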
What to try next
Change one variable at a time:
- noise: 0.2;
- MLP.new(2, [8, 8, 1]);
- steps: 50;
- alpha: 0.0;
- h: 0.35 for a faster decision-boundary grid.
For a broader set of experiments, open notebooks/micrograd_extras.livemd. It compares datasets, model sizes, regularization, learning-rate schedules, decision-boundary resolution, and a spiral dataset challenge.