Dala.ML.Preprocess (dala v0.1.1)


Preprocessing pipelines for ML model inputs.

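Provides standardized preprocessing for common input types: images, text, and audio. The loaders return {:ok, ...} | {:error, reason} tuples, the transform functions return Nx tensors ready for model consumption, and to_f32_binary/1 returns a binary for ONNX input.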

Image Preprocessing

```elixir
# Standard ImageNet preprocessing (load_image/1 returns {:ok, tensor})
{:ok, image} = Dala.ML.Preprocess.load_image(image_path)

tensor =
  image
  |> Dala.ML.Preprocess.resize({224, 224})
  |> Dala.ML.Preprocess.normalize(:imagenet)
  |> Dala.ML.Preprocess.to_batch()
```

Audio Preprocessing

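```elixir
# load_audio/1 returns {:ok, {samples, sample_rate}}
{:ok, {samples, sample_rate}} = Dala.ML.Preprocess.load_audio(audio_path)
spectrogram = Dala.ML.Preprocess.mel_spectrogram(samples, sample_rate: sample_rate)
```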

Summary

Functions

  • load_audio(path): Loads audio from a file path.
  • load_image(path): Loads an image from a file path.
  • mel_spectrogram(samples, opts \\ []): Computes a mel spectrogram from audio samples.
  • normalize(tensor, arg2): Normalizes a tensor with standard normalization schemes.
  • resize(tensor, arg): Resizes an image tensor to the target size.
  • to_batch(tensor): Adds a batch dimension to a tensor (shape {...} → {1, ...}).
  • to_f32_binary(tensor): Converts an Nx tensor to a binary of f32 values for ONNX input.

Functions

load_audio(path)

@spec load_audio(String.t()) ::
  {:ok, {Nx.Tensor.t(), pos_integer()}} | {:error, term()}

Loads audio from a file path.

Returns {:ok, {samples_tensor, sample_rate}} on success, or {:error, reason} on failure.

load_image(path)

@spec load_image(String.t()) :: {:ok, Nx.Tensor.t()} | {:error, term()}

Loads an image from a file path. Returns {:ok, tensor} on success, where tensor is an Nx tensor of shape {height, width, 3} with values in 0..255, or {:error, reason} on failure.

mel_spectrogram(samples, opts \\ [])

@spec mel_spectrogram(
  Nx.Tensor.t(),
  keyword()
) :: Nx.Tensor.t()

Computes a mel spectrogram from audio samples.

Options

  • :sample_rate — Audio sample rate (default: 16000)
  • :n_fft — FFT size (default: 400)
  • :n_mels — Number of mel bands (default: 80)
  • :hop_length — Hop length (default: 160)
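This page does not say which mel scale or framing mel_spectrogram/2 uses. As a sketch, assuming the HTK mel formula and an STFT without centre padding, the pure-Elixir helpers below show how the documented defaults fit together: at 16000 Hz, n_fft: 400 and hop_length: 160 correspond to 25 ms windows with a 10 ms hop, and n_mels: 80 needs 82 band edges between 0 Hz and the 8000 Hz Nyquist frequency. MelSketch and its functions are hypothetical and not part of Dala.

```elixir
defmodule MelSketch do
  # HTK-style mel scale; whether Dala uses HTK or Slaney mels is an assumption.
  def hz_to_mel(hz), do: 2595.0 * :math.log10(1.0 + hz / 700.0)
  def mel_to_hz(mel), do: 700.0 * (:math.pow(10.0, mel / 2595.0) - 1.0)

  # STFT frame count without centre padding:
  # 1 + floor((n_samples - n_fft) / hop_length).
  def frame_count(n_samples, n_fft, hop_length) when n_samples >= n_fft do
    1 + div(n_samples - n_fft, hop_length)
  end

  # n_mels + 2 edge frequencies, evenly spaced on the mel scale from 0 Hz to
  # Nyquist. Each consecutive (left, centre, right) triple defines one
  # triangular mel filter.
  def mel_edges(sample_rate, n_mels) do
    max_mel = hz_to_mel(sample_rate / 2)
    step = max_mel / (n_mels + 1)
    for i <- 0..(n_mels + 1), do: mel_to_hz(i * step)
  end
end
```

With centre padding (librosa's default), the frame count would instead be 1 + div(n_samples, hop_length), so a one-second 16 kHz clip would give 101 frames rather than 98.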

normalize(tensor, arg2)

@spec normalize(Nx.Tensor.t(), atom() | {list(), list()}) :: Nx.Tensor.t()

Normalizes a tensor with standard normalization schemes.

Schemes

  • :imagenet — ImageNet mean/std normalization
  • :minmax — Scale to [0, 1]
  • :standard — Zero mean, unit variance
  • {mean, std} — Custom normalization
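The schemes above can be sketched on plain Elixir lists. This assumes :standard uses the population standard deviation and that :imagenet uses the widely published torchvision RGB statistics (mean [0.485, 0.456, 0.406], std [0.229, 0.224, 0.225]), which are defined for pixel values in [0, 1]; whether normalize/2 rescales 0..255 input before applying them is not stated on this page. NormalizeSketch is a hypothetical module.

```elixir
defmodule NormalizeSketch do
  # Widely published torchvision ImageNet RGB statistics, defined for values in [0, 1].
  @imagenet_mean [0.485, 0.456, 0.406]
  @imagenet_std [0.229, 0.224, 0.225]

  # :minmax - map the smallest value to 0.0 and the largest to 1.0.
  def minmax(values) do
    {lo, hi} = Enum.min_max(values)
    range = hi - lo

    # A constant input has no range; return zeros instead of dividing by zero.
    if range == 0,
      do: Enum.map(values, fn _ -> 0.0 end),
      else: Enum.map(values, fn v -> (v - lo) / range end)
  end

  # :standard - subtract the mean, divide by the population standard deviation.
  def standard(values) do
    n = length(values)
    mean = Enum.sum(values) / n
    variance = Enum.reduce(values, 0.0, fn v, acc -> acc + (v - mean) * (v - mean) end) / n
    std = :math.sqrt(variance)

    if std == 0,
      do: Enum.map(values, fn _ -> 0.0 end),
      else: Enum.map(values, fn v -> (v - mean) / std end)
  end

  # {mean, std} - per-channel (x - mean) / std for one pixel given as a channel list.
  def custom(pixel, mean, std) do
    [pixel, mean, std]
    |> Enum.zip()
    |> Enum.map(fn {x, m, s} -> (x - m) / s end)
  end

  # :imagenet is the custom scheme with the statistics above.
  def imagenet(pixel), do: custom(pixel, @imagenet_mean, @imagenet_std)
end
```

A real implementation would apply the per-channel form along the last axis of a {height, width, 3} tensor rather than pixel by pixel.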

resize(tensor, arg)

@spec resize(
  Nx.Tensor.t(),
  {pos_integer(), pos_integer()}
) :: Nx.Tensor.t()

Resizes an image tensor to the target size.

The target size is given as a tuple {height, width}.
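This page does not say which interpolation resize/2 uses. Nearest-neighbour resizing, one simple possibility and not necessarily what the library does, maps each output pixel (y, x) to the source pixel (div(y * in_h, out_h), div(x * in_w, out_w)). The sketch below works on a single-channel image given as a list of rows; ResizeSketch is hypothetical.

```elixir
defmodule ResizeSketch do
  # Nearest-neighbour resize of a single-channel image (list of rows)
  # to {out_h, out_w}. Both target dimensions are assumed positive.
  def nearest(rows, {out_h, out_w}) do
    in_h = length(rows)
    in_w = rows |> hd() |> length()

    # Tuples give constant-time indexed access via elem/2.
    grid = rows |> Enum.map(&List.to_tuple/1) |> List.to_tuple()

    for y <- 0..(out_h - 1) do
      row = elem(grid, div(y * in_h, out_h))
      for x <- 0..(out_w - 1), do: elem(row, div(x * in_w, out_w))
    end
  end
end
```

Bilinear interpolation would instead blend the four nearest source pixels, which avoids the blocky look of nearest-neighbour upscaling.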

to_batch(tensor)

@spec to_batch(Nx.Tensor.t()) :: Nx.Tensor.t()

Adds a batch dimension to a tensor (shape {...} → {1, ...}).
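In terms of shapes, to_batch/1 only prepends an axis of size 1, so a {224, 224, 3} image becomes {1, 224, 224, 3}. With Nx this could be done with Nx.new_axis(tensor, 0); whether to_batch/1 is implemented that way is an assumption. The hypothetical helper below shows only the shape bookkeeping.

```elixir
defmodule BatchSketch do
  # Prepend a batch axis of size 1 to a shape tuple.
  def batched_shape(shape) when is_tuple(shape), do: Tuple.insert_at(shape, 0, 1)
end
```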

to_f32_binary(tensor)

@spec to_f32_binary(Nx.Tensor.t()) :: binary()

Converts an Nx tensor to a binary of f32 values for ONNX input.
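A binary of f32 values holds 4 bytes per element, laid out consecutively. Assuming native byte order and row-major element order, the packing can be sketched with a binary comprehension; for an Nx tensor, Nx.as_type(tensor, {:f, 32}) |> Nx.to_binary() yields comparable bytes. Whether to_f32_binary/1 works this way is an assumption, and F32Sketch is hypothetical.

```elixir
defmodule F32Sketch do
  # Pack a list of floats as consecutive 32-bit IEEE 754 values in native byte order.
  def to_f32_binary(values), do: for(x <- values, into: <<>>, do: <<x::float-32-native>>)

  # Inverse, for checking round trips.
  def from_f32_binary(bin), do: for(<<x::float-32-native <- bin>>, do: x)
end
```

Values such as 0.1 are not exactly representable in f32, so a round trip returns the nearest f32 value rather than the original 64-bit float.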