MicrogradEx.Datasets (MicrogradEx v0.1.0)

Small deterministic two-dimensional datasets for MicrogradEx demos.

These helpers intentionally avoid Python, scikit-learn, NumPy, Nx, and other external ML dependencies. They return plain Elixir data structures suitable for scalar micrograd-style training and Livebook visualization.

Summary

Functions

blobs(n_samples, opts \\ [])

Builds a deterministic two-class blob dataset.

moons(n_samples, opts \\ [])

Builds a deterministic two-moons classification dataset.

spiral(n_samples, opts \\ [])

Builds a deterministic two-class spiral classification dataset.

Types

opts()

@type opts() :: [noise: number(), seed: seed(), shuffle: boolean()]

The function-specific options :centers (blobs/2) and :turns (spiral/2) are not listed in this type; they are documented under each function.

seed()

@type seed() :: {integer(), integer(), integer()} | nil

Functions

blobs(n_samples, opts \\ [])

Builds a deterministic two-class blob dataset.

This simple baseline is useful for sanity-checking classification and visualization code before moving to moons or spirals.

Options

  • :noise - non-negative Gaussian noise scale, default 0.1
  • :seed - deterministic random seed tuple, default {1337, 1337, 1337}
  • :shuffle - whether to deterministically shuffle rows, default true
  • :centers - two {x, y} class centers, default [{-1.0, 0.0}, {1.0, 0.0}]

Example

iex> dataset = MicrogradEx.Datasets.blobs(4, noise: 0.0, shuffle: false)
iex> length(dataset.points)
4
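
To make the roles of :centers, :noise, and :seed concrete, here is a minimal sketch of how seeded Gaussian jitter around two centers could be produced with Erlang's :rand module. The module name BlobSketch, the {x, y, label} tuple shape, the alternating class assignment, and the -1.0/1.0 labels are assumptions made for illustration; this is not the MicrogradEx implementation and does not reflect its return structure.

```elixir
defmodule BlobSketch do
  # Hypothetical helper, not part of MicrogradEx. It threads an explicit
  # :rand state so the same seed tuple always yields the same points.
  def points(n_samples, centers, noise, seed) do
    state = :rand.seed_s(:exsss, seed)

    {points, _final_state} =
      Enum.map_reduce(0..(n_samples - 1)//1, state, fn i, st ->
        # Alternate between the two centers so both classes stay balanced.
        {cx, cy} = Enum.at(centers, rem(i, 2))

        # Draw two standard-normal samples and scale them by the noise level.
        {dx, st} = :rand.normal_s(st)
        {dy, st} = :rand.normal_s(st)

        # Assumed label convention: first center is -1.0, second is 1.0.
        label = if rem(i, 2) == 0, do: -1.0, else: 1.0
        {{cx + noise * dx, cy + noise * dy, label}, st}
      end)

    points
  end
end
```

Because the random state is passed explicitly rather than read from the process dictionary, calling BlobSketch.points/4 twice with the same arguments returns identical lists, and a noise of 0.0 places every point exactly on its center.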

moons(n_samples, opts \\ [])

Builds a deterministic two-moons classification dataset.

This is a pure-Elixir equivalent of the two-moons dataset that the official micrograd demo generates with scikit-learn's make_moons. Labels are returned as -1.0 and 1.0, matching the max-margin classification loss used by the training notebook.

Options

  • :noise - non-negative Gaussian noise scale, default 0.1
  • :seed - deterministic random seed tuple, default {1337, 1337, 1337}
  • :shuffle - whether to deterministically shuffle rows, default true

Example

iex> dataset = MicrogradEx.Datasets.moons(4, noise: 0.0, shuffle: false)
iex> length(dataset.xs)
4
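
For readers who want to see the underlying geometry, the following sketch builds noise-free two-moons points using the same construction as scikit-learn's make_moons: an upper half circle for one class and a shifted, flipped half circle for the other. The module name MoonsSketch, the private linspace helper, and the {x, y, label} tuple shape are assumptions for illustration; the actual MicrogradEx code and return structure may differ.

```elixir
defmodule MoonsSketch do
  # Hypothetical helper, not the MicrogradEx implementation.
  # Noise-free two-moons points following the make_moons construction.
  def points(n_samples) do
    n_outer = div(n_samples, 2)
    n_inner = n_samples - n_outer

    # Outer moon: upper half of the unit circle, labelled -1.0.
    outer =
      for t <- linspace(0.0, :math.pi(), n_outer) do
        {:math.cos(t), :math.sin(t), -1.0}
      end

    # Inner moon: half circle shifted right by 1 and flipped, labelled 1.0.
    inner =
      for t <- linspace(0.0, :math.pi(), n_inner) do
        {1.0 - :math.cos(t), 0.5 - :math.sin(t), 1.0}
      end

    outer ++ inner
  end

  # Evenly spaced values from first to last, inclusive.
  defp linspace(_first, _last, 0), do: []
  defp linspace(first, _last, 1), do: [first]

  defp linspace(first, last, n) do
    step = (last - first) / (n - 1)
    for i <- 0..(n - 1), do: first + i * step
  end
end
```

The -1.0/1.0 labels correspond to remapping make_moons' 0/1 classes with y * 2 - 1, as the micrograd demo notebook does before computing its max-margin loss.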

spiral(n_samples, opts \\ [])

Builds a deterministic two-class spiral classification dataset.

The spiral dataset is useful for showing non-linear decision boundaries. Both classes follow the same increasing radius, and the second arm is offset from the first by a phase shift of π.

Options

  • :noise - non-negative Gaussian noise scale, default 0.1
  • :seed - deterministic random seed tuple, default {1337, 1337, 1337}
  • :shuffle - whether to deterministically shuffle rows, default true
  • :turns - positive number of turns, default 1.5

Example

iex> dataset = MicrogradEx.Datasets.spiral(6, noise: 0.0, shuffle: false)
iex> Enum.uniq(dataset.ys)
[-1.0, 1.0]
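
The geometry described above (a shared increasing radius, with the second class offset by a phase of pi) can be sketched without noise as follows. The module name SpiralSketch, the linear radius schedule, and the {x, y, label} tuple shape are assumptions chosen for illustration, not the MicrogradEx implementation.

```elixir
defmodule SpiralSketch do
  # Hypothetical helper, not the MicrogradEx implementation.
  # Noise-free two-arm spiral; the second arm is the first rotated by pi.
  def points(n_samples, turns \\ 1.5) do
    per_class = div(n_samples, 2)

    for {label, phase} <- [{-1.0, 0.0}, {1.0, :math.pi()}],
        i <- 0..(per_class - 1)//1 do
      # t runs over (0, 1]; both the radius and the angle grow with t.
      t = (i + 1) / per_class
      angle = turns * 2.0 * :math.pi() * t + phase
      {t * :math.cos(angle), t * :math.sin(angle), label}
    end
  end
end
```

Because a phase shift of pi negates both the cosine and the sine, each point in the second arm is the mirror image through the origin of the matching point in the first arm, which is why no straight line can separate the two classes.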