BinClass.Trainer (BinClass v0.2.0)

Functions

train(data_stream, opts \\ [])

Trains a binary classifier on the given data stream.

Options

  • :epochs - Number of training epochs. Defaults to 10.
  • :batch_size - Batch size for training. Defaults to 32.
  • :learning_rate - Initial learning rate. Defaults to 1.0e-3.
  • :decay - Learning rate decay. Defaults to 1.0e-2.
  • :labels - Mapping of labels. Can be a list or a map. Defaults to [0, 1].
  • :validation_split - Fraction of data to use for validation. Defaults to 0.1.
  • :patience - Number of epochs to wait for improvement before early stopping. Defaults to 5.
  • :compiler - The Nx compiler to use. Defaults to EXLA.
  • :model_version - The architecture version to use. Defaults to conservative_cnn.
  • :tune - If true, performs automatic hyperparameter tuning for learning rate and dropout. Defaults to false.
  • :dropout_rate - Dropout rate for the model (ignored if :tune is true). Defaults to 0.2.
  • :false_positive_penalty - Penalty multiplier applied to the validation false-positive rate when selecting checkpoints and tuned hyperparameters. Defaults to 0.5 for v7 and 0.0 for older model versions.
  • :calibrate_threshold - If true, calibrates the positive-class decision threshold on the validation split and persists it. Defaults to true for v7 and false for older model versions.
  • :threshold_candidates - Thresholds to evaluate during calibration. Defaults to 0.5..0.9 in 0.05 increments.
  • :vector_length - Fixed sequence length for tokenization. Defaults to 512.
  • :tokenizer_data - Custom data stream used to train the tokenizer. Defaults to the :text field of each item in data_stream.
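The validation-driven options above (:threshold_candidates, :false_positive_penalty, :calibrate_threshold, :patience) can be illustrated with a small, self-contained sketch. It is not BinClass code: the module ValidationSketch and all of its functions are hypothetical. The scoring rule (validation accuracy minus the penalty times the false-positive rate) is an assumption, because these docs state neither which metric BinClass optimizes nor whether the penalty also enters threshold calibration. Patience is modeled as stopping after that many consecutive epochs without a strictly better validation score.

```elixir
# Hypothetical sketch, not the BinClass implementation. The scoring rule
# (accuracy - penalty * false-positive rate) is an assumption.
defmodule ValidationSketch do
  # Default grid 0.5, 0.55, ..., 0.9 (nine values), rounded to avoid float drift.
  def default_candidates do
    Enum.map(0..8, fn i -> Float.round(0.5 + i * 0.05, 2) end)
  end

  # Predict the positive class (1) when the probability reaches the threshold.
  defp predict(probs, threshold) do
    Enum.map(probs, fn p -> if p >= threshold, do: 1, else: 0 end)
  end

  # Fraction of predictions that match the labels.
  def accuracy(probs, labels, threshold) do
    correct =
      predict(probs, threshold)
      |> Enum.zip(labels)
      |> Enum.count(fn {pred, label} -> pred == label end)

    correct / length(labels)
  end

  # Share of actual negatives (label 0) that are predicted as positive.
  def false_positive_rate(probs, labels, threshold) do
    pairs = Enum.zip(predict(probs, threshold), labels)
    negatives = Enum.count(pairs, fn {_pred, label} -> label == 0 end)
    false_pos = Enum.count(pairs, fn {pred, label} -> pred == 1 and label == 0 end)
    if negatives == 0, do: 0.0, else: false_pos / negatives
  end

  # Assumed penalized score: higher is better.
  def score(probs, labels, threshold, penalty) do
    accuracy(probs, labels, threshold) -
      penalty * false_positive_rate(probs, labels, threshold)
  end

  # Pick the best-scoring candidate; Enum.max_by/2 keeps the first
  # maximal element, so ties go to the earliest candidate in the list.
  def calibrate(probs, labels, candidates \\ default_candidates(), penalty \\ 0.0) do
    Enum.max_by(candidates, &score(probs, labels, &1, penalty))
  end

  # Patience-based early stopping over per-epoch validation scores.
  # Returns {best_epoch, last_epoch_run}, with epochs counted from 1.
  def early_stop(epoch_scores, patience) do
    {_best, best_epoch, last_epoch} =
      epoch_scores
      |> Enum.with_index(1)
      |> Enum.reduce_while({nil, 0, 0}, fn {score, epoch}, {best, best_ep, _last} ->
        cond do
          best == nil or score > best -> {:cont, {score, epoch, epoch}}
          epoch - best_ep >= patience -> {:halt, {best, best_ep, epoch}}
          true -> {:cont, {best, best_ep, epoch}}
        end
      end)

    {best_epoch, last_epoch}
  end
end
```

For example, with probabilities [0.95, 0.85, 0.6, 0.7, 0.3, 0.55] and labels [1, 1, 1, 0, 0, 0], calibrate/4 picks 0.6 with a penalty of 0.0 but 0.75 with a penalty of 0.5: both thresholds reach the same accuracy, and the penalty prefers trading the false positive at 0.6 for a missed positive.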