Dataset utilities for ExBurn.
Provides common data loading, splitting, and preprocessing helpers for machine learning workflows.
Usage
# Split data into train/validation sets
{train, val} = ExBurn.Dataset.split({x, y}, val_split: 0.2)
# Create a batched data loader
loader = ExBurn.Dataset.loader({x, y}, batch_size: 32, shuffle: true)
# Normalize features
{normalized, stats} = ExBurn.Dataset.normalize(x, method: :standard)
Summary
Functions
loader/2: Creates a batched data loader from a dataset.
normalize/2: Normalizes a tensor using the specified method.
normalize_with_stats/2: Normalizes a tensor using pre-computed statistics.
one_hot/2: Applies one-hot encoding to integer class labels.
split/2: Splits a dataset into training and validation sets.
stats/1: Returns basic statistics about a dataset.
Types
@type dataset() :: {Nx.Tensor.t(), Nx.Tensor.t()}
Functions
@spec loader(dataset(), keyword()) :: Enumerable.t()
Creates a batched data loader from a dataset.
Returns a Stream of {batch_inputs, batch_targets} tuples.
Options
:batch_size - Batch size (default: 32)
:shuffle - Shuffle data each epoch (default: true)
:drop_last - Drop the last incomplete batch (default: false)
:seed - Random seed for shuffling
Example
ExBurn.Dataset.loader({x, y}, batch_size: 64)
|> Enum.each(fn {batch_x, batch_y} ->
  # process batch
end)
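The effect of :batch_size and :drop_last can be illustrated on a plain list. The following is a minimal sketch, not ExBurn's implementation: BatchSketch is a hypothetical module, and it assumes the loader chunks samples the way Enum.chunk_every/4 does.

```elixir
# Plain-list sketch of batching semantics; BatchSketch is a hypothetical
# module and is not part of ExBurn.
defmodule BatchSketch do
  # Splits `samples` into consecutive batches of `batch_size`.
  # With drop_last: true, a trailing incomplete batch is discarded.
  def batches(samples, batch_size, opts \\ []) do
    if Keyword.get(opts, :drop_last, false) do
      Enum.chunk_every(samples, batch_size, batch_size, :discard)
    else
      Enum.chunk_every(samples, batch_size)
    end
  end
end

BatchSketch.batches([1, 2, 3, 4, 5], 2)
# => [[1, 2], [3, 4], [5]]
BatchSketch.batches([1, 2, 3, 4, 5], 2, drop_last: true)
# => [[1, 2], [3, 4]]
```

Dropping the last batch keeps every batch the same shape, which can matter when downstream code assumes a fixed batch dimension.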
@spec normalize(Nx.Tensor.t(), keyword()) :: {Nx.Tensor.t(), map()}
Normalizes a tensor using the specified method.
Options
:method - Normalization method: :standard (z-score), :minmax, or :l2 (default: :standard)
:axes - Axes to compute statistics over (default: [0])
Returns
{normalized_tensor, stats_map}, where stats_map can be used to normalize
new data with normalize_with_stats/2.
Example
{train_norm, stats} = ExBurn.Dataset.normalize(train_x, method: :standard)
test_norm = ExBurn.Dataset.normalize_with_stats(test_x, stats)
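For readers unfamiliar with the :minmax and :l2 methods, the conventional formulas can be sketched on plain lists. This is an illustration only: ScalingSketch is a hypothetical module, and whether ExBurn adds an epsilon guard or how it applies :axes is not stated in these docs.

```elixir
# Plain-list sketch of the usual min-max and L2 formulas.
# ScalingSketch is a hypothetical module and is not part of ExBurn.
defmodule ScalingSketch do
  # Rescales values linearly so the minimum maps to 0.0 and the maximum to 1.0.
  # Assumes max > min; a constant list would raise on division by zero.
  def minmax(values) do
    {lo, hi} = Enum.min_max(values)
    Enum.map(values, fn v -> (v - lo) / (hi - lo) end)
  end

  # Divides each value by the Euclidean norm so the result has unit length.
  # Assumes a non-zero vector.
  def l2(values) do
    norm = :math.sqrt(Enum.sum(Enum.map(values, fn v -> v * v end)))
    Enum.map(values, fn v -> v / norm end)
  end
end

ScalingSketch.minmax([2.0, 4.0, 6.0])
# => [0.0, 0.5, 1.0]
ScalingSketch.l2([3.0, 4.0])
# => [0.6, 0.8]
```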
@spec normalize_with_stats(Nx.Tensor.t(), map()) :: Nx.Tensor.t()
Normalizes a tensor using pre-computed statistics.
Useful for applying the same normalization to test/validation data.
Example
{train_norm, stats} = ExBurn.Dataset.normalize(train_x)
test_norm = ExBurn.Dataset.normalize_with_stats(test_x, stats)
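The reason for reusing training statistics can be seen in a small z-score sketch on plain lists. Everything here is an assumption for illustration: ZScoreSketch is a hypothetical module, the :mean and :std keys are not necessarily the keys of ExBurn's stats map, and the population standard deviation is used without an epsilon guard.

```elixir
# Plain-list z-score sketch; ZScoreSketch and the :mean/:std keys are
# hypothetical and not taken from ExBurn.
defmodule ZScoreSketch do
  # Computes the mean and population standard deviation of a list of floats.
  def fit(values) do
    n = length(values)
    mean = Enum.sum(values) / n
    variance = Enum.sum(Enum.map(values, fn v -> (v - mean) * (v - mean) end)) / n
    %{mean: mean, std: :math.sqrt(variance)}
  end

  # Applies previously computed statistics to any list of values.
  # Assumes std is non-zero.
  def apply_stats(values, %{mean: mean, std: std}) do
    Enum.map(values, fn v -> (v - mean) / std end)
  end
end

train = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
stats = ZScoreSketch.fit(train)
# => %{mean: 5.0, std: 2.0}

# Test data is scaled with the training mean and std, not its own.
ZScoreSketch.apply_stats([3.0, 11.0], stats)
# => [-1.0, 3.0]
```

Scaling test data with its own statistics would map it onto a different scale than the one the model saw during training, so the training statistics are fitted once and reused.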
@spec one_hot(Nx.Tensor.t(), keyword()) :: Nx.Tensor.t()
Applies one-hot encoding to integer class labels.
Parameters
labels - 1D tensor of integer class indices
num_classes - Total number of classes
Returns
2D tensor of one-hot encoded labels.
Example
one_hot = ExBurn.Dataset.one_hot(Nx.tensor([0, 2, 1]), num_classes: 3)
# [[1.0, 0.0, 0.0], [0.0, 0.0, 1.0], [0.0, 1.0, 0.0]]
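The encoding itself is simple enough to sketch on plain lists: each label becomes a row with 1.0 at the label's index and 0.0 elsewhere. OneHotSketch below is a hypothetical module for illustration and says nothing about how ExBurn builds the tensor.

```elixir
# Plain-list one-hot sketch; OneHotSketch is a hypothetical module and is
# not part of ExBurn.
defmodule OneHotSketch do
  # Returns one row per label, each of length `num_classes`.
  # Assumes every label is an integer in 0..num_classes-1.
  def encode(labels, num_classes) do
    for label <- labels do
      for class <- 0..(num_classes - 1) do
        if class == label, do: 1.0, else: 0.0
      end
    end
  end
end

OneHotSketch.encode([0, 2, 1], 3)
# => [[1.0, 0.0, 0.0], [0.0, 0.0, 1.0], [0.0, 1.0, 0.0]]
```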
Splits a dataset into training and validation sets.
Options
:val_split - Fraction of data for validation (default: 0.2)
:shuffle - Shuffle before splitting (default: true)
:seed - Random seed for reproducibility (default: nil)
Returns
{train_data, val_data} where each is {inputs, targets}.
Example
{train, val} = ExBurn.Dataset.split({x, y}, val_split: 0.2, seed: 42)
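A seeded, shuffled split can be sketched on plain lists as follows. This is not ExBurn's implementation: SplitSketch is a hypothetical module, and it assumes the validation size is round(n * val_split) and that seeding Erlang's :rand is what makes the shuffle reproducible.

```elixir
# Plain-list sketch of a shuffled train/validation split.
# SplitSketch is a hypothetical module and is not part of ExBurn.
defmodule SplitSketch do
  # Returns {{train_inputs, train_targets}, {val_inputs, val_targets}}.
  def split(inputs, targets, opts \\ []) do
    val_split = Keyword.get(opts, :val_split, 0.2)
    seed = Keyword.get(opts, :seed)

    # Enum.shuffle/1 draws from :rand, so seeding it makes the order repeatable.
    if seed, do: :rand.seed(:exsss, seed)

    # Zip first so each input stays paired with its target through the shuffle.
    pairs = inputs |> Enum.zip(targets) |> Enum.shuffle()
    n_val = round(length(pairs) * val_split)
    {val, train} = Enum.split(pairs, n_val)
    {Enum.unzip(train), Enum.unzip(val)}
  end
end

inputs = Enum.to_list(1..10)
targets = Enum.map(inputs, fn i -> i * 10 end)

{{train_x, _train_y}, {val_x, _val_y}} =
  SplitSketch.split(inputs, targets, val_split: 0.2, seed: 42)

{length(train_x), length(val_x)}
# => {8, 2}
```

Calling the sketch twice with the same seed yields the same partition, which is the property the :seed option is meant to provide.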
Returns basic statistics about a dataset.
Returns
A map with :num_samples, :input_shape, :target_shape,
:input_type, :target_type.
Example
s = ExBurn.Dataset.stats({x, y})
IO.puts("Samples: #{s.num_samples}")