ExBurn.Dataset (ex_burn v0.3.1)


Dataset utilities for ExBurn.

Provides common data loading, splitting, and preprocessing helpers for machine learning workflows.

Usage

# Split data into train/validation sets
{train, val} = ExBurn.Dataset.split({x, y}, val_split: 0.2)

# Create a batched data loader
loader = ExBurn.Dataset.loader({x, y}, batch_size: 32, shuffle: true)

# Normalize features
{normalized, stats} = ExBurn.Dataset.normalize(x, method: :standard)

Summary

Functions

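loader(arg, opts \\ [])

Creates a batched data loader from a dataset.

normalize(tensor, opts \\ [])

Normalizes a tensor using the specified method.

normalize_with_stats(tensor, map)

Normalizes a tensor using pre-computed statistics.

one_hot(labels, opts \\ [])

Applies one-hot encoding to integer class labels.

split(arg, opts \\ [])

Splits a dataset into training and validation sets.

stats(arg)

Returns basic statistics about a dataset.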

Types

dataset()

@type dataset() :: {Nx.Tensor.t(), Nx.Tensor.t()}

A dataset is an {inputs, targets} tuple of tensors.

Functions

loader(arg, opts \\ [])

@spec loader(
  dataset(),
  keyword()
) :: Enumerable.t()

Creates a batched data loader from a dataset.

Returns a Stream of {batch_inputs, batch_targets} tuples.

Options

  • :batch_size — Batch size (default: 32)
  • :shuffle — Shuffle data each epoch (default: true)
  • :drop_last — Drop the last incomplete batch (default: false)
  • :seed — Random seed for shuffling

Example

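ExBurn.Dataset.loader({x, y}, batch_size: 64)
|> Enum.each(fn {batch_x, batch_y} ->
  # Process one batch here, e.g. run a training step on it
  IO.inspect({Nx.shape(batch_x), Nx.shape(batch_y)}, label: "batch shapes")
end)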
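To make the option semantics concrete, here is a rough sketch of how batching, shuffling, and :drop_last can fit together. It is not ExBurn's implementation: the module name LoaderSketch is hypothetical, and it works on plain Elixir lists instead of Nx tensors so that it runs without dependencies. It shuffles once per call, so calling it again for each epoch gives a fresh order unless :seed is set.

```elixir
defmodule LoaderSketch do
  @moduledoc """
  Hypothetical list-based loader sketch (not ExBurn's code).
  Inputs and targets are plain lists of equal length instead of Nx tensors.
  """

  # Returns a lazy Stream of {batch_inputs, batch_targets} tuples.
  def loader({inputs, targets}, opts \\ []) do
    batch_size = Keyword.get(opts, :batch_size, 32)
    shuffle? = Keyword.get(opts, :shuffle, true)
    drop_last? = Keyword.get(opts, :drop_last, false)
    seed = Keyword.get(opts, :seed)

    # Pair each input with its target so shuffling keeps them aligned.
    pairs = Enum.zip(inputs, targets)
    pairs = if shuffle?, do: shuffle(pairs, seed), else: pairs

    # :discard drops a final chunk shorter than batch_size; [] keeps it.
    leftover = if drop_last?, do: :discard, else: []

    pairs
    |> Stream.chunk_every(batch_size, batch_size, leftover)
    |> Stream.map(&Enum.unzip/1)
  end

  # Without a seed, use whatever :rand state the process already has.
  defp shuffle(pairs, nil), do: Enum.shuffle(pairs)

  # Seeding the process-local :rand state makes the order reproducible.
  defp shuffle(pairs, seed) do
    :rand.seed(:exsss, {seed, seed, seed})
    Enum.shuffle(pairs)
  end
end
```

With five samples and batch_size: 2, this yields batches of 2, 2, and 1 samples; adding drop_last: true discards the final single-sample batch.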

normalize(tensor, opts \\ [])

@spec normalize(
  Nx.Tensor.t(),
  keyword()
) :: {Nx.Tensor.t(), map()}

Normalizes a tensor using the specified method.

Options

  • :method — Normalization method: :standard (z-score), :minmax, or :l2 (default: :standard)
  • :axes — Axes to compute statistics over (default: [0])

Returns

{normalized_tensor, stats_map}, where stats_map can be passed to normalize_with_stats/2 to apply the same normalization to new data.

Example

{train_norm, stats} = ExBurn.Dataset.normalize(train_x, method: :standard)
test_norm = ExBurn.Dataset.normalize_with_stats(test_x, stats)
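The three :method values presumably correspond to the usual textbook formulas: z-score (subtract the mean, divide by the standard deviation), min-max rescaling to [0, 1], and division by the Euclidean norm. The sketch below assumes ExBurn follows these definitions; NormalizeSketch is a hypothetical, dependency-free module that works on a single feature column given as a list, does not model :axes, and returns stats keys invented for illustration rather than ExBurn's actual stats map.

```elixir
defmodule NormalizeSketch do
  @moduledoc """
  Hypothetical sketch of the :standard, :minmax and :l2 formulas applied to one
  feature column given as a plain list. The stats keys are invented for this
  example, and :axes is not modeled.
  """

  # Returns {normalized_values, stats}. There is no guard against a zero
  # std, range, or norm, so constant or all-zero input raises ArithmeticError.
  def normalize(values, opts \\ []) do
    case Keyword.get(opts, :method, :standard) do
      :standard ->
        n = length(values)
        mean = Enum.sum(values) / n
        # Population standard deviation (divides by n, not n - 1).
        var = Enum.sum(Enum.map(values, fn v -> (v - mean) * (v - mean) end)) / n
        std = :math.sqrt(var)
        {Enum.map(values, fn v -> (v - mean) / std end), %{method: :standard, mean: mean, std: std}}

      :minmax ->
        # Rescales so the smallest value maps to 0.0 and the largest to 1.0.
        {lo, hi} = Enum.min_max(values)
        {Enum.map(values, fn v -> (v - lo) / (hi - lo) end), %{method: :minmax, min: lo, max: hi}}

      :l2 ->
        # Divides by the Euclidean norm, giving a unit-length vector.
        norm = :math.sqrt(Enum.sum(Enum.map(values, fn v -> v * v end)))
        {Enum.map(values, fn v -> v / norm end), %{method: :l2, norm: norm}}
    end
  end
end

NormalizeSketch.normalize([3.0, 4.0], method: :l2)
# => {[0.6, 0.8], %{method: :l2, norm: 5.0}}
```

A production implementation would typically add a small epsilon to the divisor to avoid failing on constant features; whether ExBurn does so is not stated here.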

normalize_with_stats(tensor, map)

@spec normalize_with_stats(Nx.Tensor.t(), map()) :: Nx.Tensor.t()

Normalizes a tensor using pre-computed statistics.

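Use this to normalize validation or test data with the statistics computed on the training data, so that every split is transformed identically and no information from held-out data leaks into preprocessing.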

Example

{train_norm, stats} = ExBurn.Dataset.normalize(train_x)
test_norm = ExBurn.Dataset.normalize_with_stats(test_x, stats)
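The fit-then-apply pattern is easiest to see with numbers. The following is a minimal z-score sketch on plain lists; the module StatsSketch and its fit/1 and transform/2 functions are hypothetical and only illustrate the idea, not ExBurn's API.

```elixir
defmodule StatsSketch do
  @moduledoc """
  Hypothetical fit/transform sketch of reusing training statistics (z-score
  only, plain lists). Not ExBurn's API.
  """

  # Computes mean and population standard deviation from training values only.
  def fit(train) do
    n = length(train)
    mean = Enum.sum(train) / n
    var = Enum.sum(Enum.map(train, fn v -> (v - mean) * (v - mean) end)) / n
    %{mean: mean, std: :math.sqrt(var)}
  end

  # Applies already-fitted statistics to any split: train, validation, or test.
  def transform(values, %{mean: mean, std: std}) do
    Enum.map(values, fn v -> (v - mean) / std end)
  end
end

stats = StatsSketch.fit([1.0, 3.0])
StatsSketch.transform([2.0, 5.0], stats)
# => [0.0, 3.0]
```

The transformed test values are not centered at zero, which is expected: they are expressed in units of the training distribution rather than their own.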

one_hot(labels, opts \\ [])

@spec one_hot(
  Nx.Tensor.t(),
  keyword()
) :: Nx.Tensor.t()

Applies one-hot encoding to integer class labels.

Parameters

  • labels — 1D tensor of integer class indices

Options

  • :num_classes — Total number of classes

Returns

2D tensor of one-hot encoded labels, with one row per label and one column per class.

Example

one_hot = ExBurn.Dataset.one_hot(Nx.tensor([0, 2, 1]), num_classes: 3)
# [[1.0, 0.0, 0.0], [0.0, 0.0, 1.0], [0.0, 1.0, 0.0]]
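For intuition, here is a hypothetical list-based encoder that reproduces the example above. OneHotSketch is not part of ExBurn, and treating :num_classes as a required option is an assumption about how ExBurn handles it.

```elixir
defmodule OneHotSketch do
  @moduledoc """
  Hypothetical list-based one-hot encoder (not ExBurn's implementation).
  Assumes :num_classes is required and at least 1.
  """

  # Each label becomes a row of num_classes floats: 1.0 at the label's
  # index and 0.0 everywhere else.
  def one_hot(labels, opts) do
    num_classes = Keyword.fetch!(opts, :num_classes)

    for label <- labels do
      for class <- 0..(num_classes - 1), do: if(class == label, do: 1.0, else: 0.0)
    end
  end
end

OneHotSketch.one_hot([0, 2, 1], num_classes: 3)
# => [[1.0, 0.0, 0.0], [0.0, 0.0, 1.0], [0.0, 1.0, 0.0]]
```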

split(arg, opts \\ [])

@spec split(
  dataset(),
  keyword()
) :: {dataset(), dataset()}

Splits a dataset into training and validation sets.

Options

  • :val_split — Fraction of data for validation (default: 0.2)
  • :shuffle — Shuffle before splitting (default: true)
  • :seed — Random seed for reproducibility (default: nil)

Returns

{train_data, val_data} where each is {inputs, targets}.

Example

{train, val} = ExBurn.Dataset.split({x, y}, val_split: 0.2, seed: 42)
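A rough sketch of shuffle-then-slice splitting on plain lists follows, using a hypothetical SplitSketch module. How ExBurn rounds the validation size and which end of the (possibly shuffled) data it takes are not documented here, so using round/1 and taking the tail are assumptions.

```elixir
defmodule SplitSketch do
  @moduledoc """
  Hypothetical list-based train/validation split (not ExBurn's code).
  Rounding the validation size and taking it from the end are assumptions.
  """

  def split({inputs, targets}, opts \\ []) do
    val_split = Keyword.get(opts, :val_split, 0.2)
    shuffle? = Keyword.get(opts, :shuffle, true)
    seed = Keyword.get(opts, :seed)

    # Shuffle (input, target) pairs together so they stay aligned.
    pairs = Enum.zip(inputs, targets)

    pairs =
      cond do
        not shuffle? ->
          pairs

        is_nil(seed) ->
          Enum.shuffle(pairs)

        true ->
          # Seeding the process-local :rand state makes the split reproducible.
          :rand.seed(:exsss, {seed, seed, seed})
          Enum.shuffle(pairs)
      end

    # The last round(n * val_split) samples become the validation set.
    n_val = round(length(pairs) * val_split)
    {train_pairs, val_pairs} = Enum.split(pairs, length(pairs) - n_val)

    {Enum.unzip(train_pairs), Enum.unzip(val_pairs)}
  end
end
```

With ten samples, val_split: 0.2 and shuffle: false, the first eight samples form the training set and the last two the validation set; passing the same :seed twice yields the same split.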

stats(arg)

@spec stats(dataset()) :: map()

Returns basic statistics about a dataset.

Returns

A map with :num_samples, :input_shape, :target_shape, :input_type, :target_type.

Example

s = ExBurn.Dataset.stats({x, y})
IO.puts("Samples: #{s.num_samples}")