HuggingFace Training Stack — configuration helpers for fine-tuning and training.
Provides structured configuration builders for:
- 3.1 AutoTrain — no-code training via the AutoTrain API
- 3.2 Trainer API — Transformers Trainer configuration (TrainingArguments)
- 3.3 Fine-tuning techniques — LoRA, PEFT, full fine-tuning, DPO, ORPO, SFT
These helpers generate configuration maps that can be:
- Passed to `HuggingfaceClient.autotrain_create/1` to launch training on HF infra
- Passed to `HuggingfaceClient.run_job/1` to run a custom training script
- Serialized to JSON/YAML for use in local training scripts (see the sketch after the example below)
Example
# Launch LoRA fine-tuning via AutoTrain
config = HuggingfaceClient.Training.lora_config(
base_model: "meta-llama/Llama-3.1-8B",
dataset: "my-org/my-dataset",
rank: 16, alpha: 32, epochs: 3
)
{:ok, project} = HuggingfaceClient.autotrain_create(
Map.merge(config, %{project_name: "my-lora-model", access_token: token})
)
# Or run a training job on GPU infra
{:ok, job} = HuggingfaceClient.run_job(
image: "huggingface/transformers-pytorch-gpu:latest",
command: ["python", "train.py"] ++ HuggingfaceClient.Training.to_args(config),
flavor: "a10g-small",
access_token: token
)
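Configs are plain Elixir maps, so they can also be written to disk for a local training script. A minimal sketch, assuming the `Jason` library is available for JSON encoding:
# Serialize a config for use outside of HF infra (assumes Jason is a dependency)
config = HuggingfaceClient.Training.lora_config(
  base_model: "meta-llama/Llama-3.1-8B",
  dataset: "my-org/my-dataset"
)
File.write!("train_config.json", Jason.encode!(config, pretty: true))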
Summary
Functions
Builds an Accelerate launch configuration for distributed training.
Builds a DPO (Direct Preference Optimization) configuration.
Builds a full fine-tuning configuration (no PEFT, all parameters trained).
Builds a LoRA (Low-Rank Adaptation) configuration map.
Builds an ORPO (Odds Ratio Preference Optimization) configuration.
Returns a pre-configured recipe for common fine-tuning scenarios.
Builds a Reward Model training configuration.
Converts an accelerate config to accelerate launch CLI arguments.
Converts a training config map into CLI argument list form.
Builds a TrainingArguments-compatible configuration map.
Functions
Builds an Accelerate launch configuration for distributed training.
This mirrors `accelerate config` / `accelerate launch` parameters.
Options
- `:num_processes` — total number of processes (GPUs) (default: 1)
- `:num_machines` — number of machines/nodes (default: 1)
- `:machine_rank` — this machine's rank (default: 0)
- `:mixed_precision` — `"no"`, `"fp16"`, `"bf16"`, `"fp8"` (default: `"no"`)
- `:distributed_type` — `"NO"`, `"MULTI_GPU"`, `"DEEPSPEED"`, `"FSDP"`, `"TPU"` (default: `"NO"`)
- `:deepspeed_config` — path to DeepSpeed config JSON
- `:fsdp_config` — FSDP config map
- `:gradient_accumulation_steps` — steps between optimizer updates (default: 1)
Example
config = HuggingfaceClient.Training.accelerate_config(
num_processes: 4,
mixed_precision: "bf16",
distributed_type: "MULTI_GPU"
)
# Run as a job on 4× A10G GPUs
{:ok, job} = HuggingfaceClient.run_job(
image: "huggingface/transformers-pytorch-gpu:latest",
command: ["accelerate", "launch"] ++
HuggingfaceClient.Training.to_accelerate_args(config) ++
["train.py"],
flavor: "a10g-largex4",
access_token: token
)
Builds a DPO (Direct Preference Optimization) configuration.
DPO aligns LLMs with human preferences directly on preference data, without training a separate reward model.
Options
- `:base_model` — SFT-trained model to align (required)
- `:dataset` — preference dataset with chosen/rejected pairs (required)
- `:beta` — KL divergence coefficient (default: 0.1)
- `:max_length` — max total sequence length (default: 1024)
- `:max_prompt_length` — max prompt length (default: 512)
- `:epochs` — training epochs (default: 1)
- `:batch_size` — per-device batch size (default: 2)
- `:learning_rate` — learning rate (default: 5.0e-7)
- `:use_peft` — use LoRA for DPO (default: `true`)
- `:lora_r` — LoRA rank if using PEFT (default: 16)
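Example
A minimal sketch of calling this builder. The function name `dpo_config/1` is assumed by analogy with `lora_config/1` and is not confirmed on this page:
# dpo_config/1 is an assumed name, following the lora_config/1 pattern
config = HuggingfaceClient.Training.dpo_config(
  base_model: "my-org/llama-3.1-8b-sft",
  dataset: "my-org/preference-pairs",
  beta: 0.1,
  use_peft: true
)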
Builds a full fine-tuning configuration (no PEFT, all parameters trained).
Use this when you have sufficient GPU memory and want maximum model capacity.
Options
- `:base_model` — HF model to fine-tune (required)
- `:dataset` — training dataset (required)
- `:task` — task type (default: `"llm-sft"`)
- `:epochs` — training epochs (default: 1)
- `:batch_size` — per-device batch size (default: 1)
- `:learning_rate` — learning rate (default: 1.0e-5)
- `:max_seq_length` — max token length (default: 2048)
- `:fp16` / `:bf16` — mixed precision training
- `:gradient_checkpointing` — save memory at compute cost (default: `true`)
- `:gradient_accumulation_steps` — accumulate gradients (default: 4)
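Example
A minimal sketch; `full_finetune_config/1` is an assumed name for this builder, not confirmed on this page:
# full_finetune_config/1 is an assumed name for this builder
config = HuggingfaceClient.Training.full_finetune_config(
  base_model: "mistralai/Mistral-7B-v0.3",
  dataset: "my-org/my-dataset",
  bf16: true,
  gradient_accumulation_steps: 4
)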
Builds a LoRA (Low-Rank Adaptation) configuration map.
LoRA is the most popular parameter-efficient fine-tuning technique. It freezes the base model and adds small trainable rank-decomposition matrices.
Options
- `:base_model` — HF model ID to fine-tune (required for AutoTrain)
- `:dataset` — HF dataset ID (required for AutoTrain)
- `:rank` / `:r` — LoRA rank (default: 16). Higher = more params, more capacity.
- `:alpha` / `:lora_alpha` — LoRA scaling factor (default: 32). Usually 2× rank.
- `:dropout` / `:lora_dropout` — dropout probability (default: 0.05)
- `:target_modules` — list of module names to apply LoRA to (default: `["q_proj", "v_proj"]` for most LLMs)
- `:bias` — `"none"`, `"all"`, `"lora_only"` (default: `"none"`)
- `:task_type` — `"CAUSAL_LM"`, `"SEQ_CLS"`, `"SEQ_2_SEQ_LM"` (default: `"CAUSAL_LM"`)
- `:use_rslora` — use rank-stabilized LoRA (default: `false`)
- `:use_dora` — use weight-decomposed LoRA (default: `false`)
- `:epochs` — training epochs (default: 3)
- `:batch_size` — per-device batch size (default: 2)
- `:learning_rate` — learning rate (default: 2.0e-4)
- `:max_seq_length` — max token length (default: 2048)
- `:use_4bit` — quantize the frozen base model to 4-bit (default: `false`)
- `:use_8bit` — quantize the frozen base model to 8-bit (default: `false`)
Example
config = HuggingfaceClient.Training.lora_config(
base_model: "meta-llama/Llama-3.1-8B-Instruct",
rank: 16, alpha: 32,
target_modules: ["q_proj", "k_proj", "v_proj", "o_proj"],
epochs: 3, batch_size: 4, learning_rate: 2.0e-4,
max_seq_length: 2048, use_4bit: true
)
# With AutoTrain
{:ok, project} = HuggingfaceClient.autotrain_create(
Map.merge(config, %{
project_name: "llama-lora-ft",
task: "llm-sft",
dataset: "my-org/my-dataset",
access_token: token
})
)
Builds an ORPO (Odds Ratio Preference Optimization) configuration.
ORPO combines SFT and preference alignment in a single training pass, which is more efficient than running SFT followed by DPO.
Options
- `:base_model` — base (non-SFT) model to train from directly (required)
- `:dataset` — preference dataset (required)
- `:orpo_alpha` — ORPO alpha parameter (default: 0.1)
- `:max_length` — max sequence length (default: 1024)
- `:epochs` — training epochs (default: 3)
- `:batch_size` — batch size (default: 2)
- `:learning_rate` — learning rate (default: 8.0e-6)
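Example
A minimal sketch; `orpo_config/1` is an assumed name for this builder, not confirmed on this page:
# orpo_config/1 is an assumed name for this builder
config = HuggingfaceClient.Training.orpo_config(
  base_model: "meta-llama/Llama-3.1-8B",
  dataset: "my-org/preference-pairs",
  orpo_alpha: 0.1
)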
Returns a pre-configured recipe for common fine-tuning scenarios.
Recipes
- `:llama3_lora` — Llama 3 LoRA SFT (A10G-class GPU, 4-bit base)
- `:mistral_lora` — Mistral LoRA SFT (A10G-class GPU, 4-bit base)
- `:bert_text_classification` — BERT-style text classification fine-tuning
- `:t5_summarization` — T5/FLAN summarization fine-tuning
- `:vit_image_classification` — ViT image classification fine-tuning
- `:whisper_asr` — Whisper ASR fine-tuning
Example
config = HuggingfaceClient.Training.recipe(:llama3_lora,
base_model: "meta-llama/Llama-3.1-8B-Instruct",
dataset: "my-org/my-chat-data"
)
Builds a Reward Model training configuration.
Used in RLHF pipelines to train a reward model from preference data.
Options
- `:base_model` — backbone model (required)
- `:dataset` — preference dataset with chosen/rejected pairs (required)
- `:max_length` — max sequence length (default: 512)
- `:epochs` — training epochs (default: 1)
- `:batch_size` — batch size (default: 4)
- `:learning_rate` — learning rate (default: 1.0e-5)
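Example
A minimal sketch; `reward_model_config/1` is an assumed name for this builder, not confirmed on this page:
# reward_model_config/1 is an assumed name for this builder
config = HuggingfaceClient.Training.reward_model_config(
  base_model: "my-org/llama-3.1-8b-sft",
  dataset: "my-org/preference-pairs",
  max_length: 512
)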
Converts an accelerate config to accelerate launch CLI arguments.
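Example
The output mirrors `to_args/1` in shape; the exact flags shown in the comment are illustrative:
config = HuggingfaceClient.Training.accelerate_config(num_processes: 4, mixed_precision: "bf16")
args = HuggingfaceClient.Training.to_accelerate_args(config)
# e.g. ["--num_processes", "4", "--mixed_precision", "bf16", ...]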
Converts a training config map into CLI argument list form.
Example
args = HuggingfaceClient.Training.to_args(config)
# ["--learning_rate", "2e-4", "--num_train_epochs", "3", ...]
{:ok, job} = HuggingfaceClient.run_job(
image: "pytorch/pytorch:latest",
command: ["python", "train.py"] ++ args,
flavor: "a10g-small"
)
Builds a TrainingArguments-compatible configuration map.
This mirrors the HuggingFace `transformers.TrainingArguments` parameters.
Options
- `:output_dir` — where to save the model (required)
- `:num_train_epochs` — number of training epochs (default: 3)
- `:per_device_train_batch_size` — batch size per device (default: 8)
- `:per_device_eval_batch_size` — eval batch size (default: 8)
- `:learning_rate` — initial learning rate (default: 5.0e-5)
- `:weight_decay` — weight decay coefficient (default: 0.0)
- `:warmup_steps` — number of warmup steps (default: 0)
- `:warmup_ratio` — fraction of steps for warmup
- `:lr_scheduler_type` — `"linear"`, `"cosine"`, `"cosine_with_restarts"`, `"polynomial"`, `"constant"`, `"constant_with_warmup"` (default: `"linear"`)
- `:evaluation_strategy` — `"no"`, `"steps"`, `"epoch"` (default: `"epoch"`)
- `:save_strategy` — `"no"`, `"steps"`, `"epoch"` (default: `"epoch"`)
- `:save_total_limit` — max checkpoints to keep
- `:load_best_model_at_end` — load best checkpoint after training (default: `false`)
- `:metric_for_best_model` — metric name for best model selection
- `:fp16` — use FP16 mixed precision (default: `false`)
- `:bf16` — use BF16 mixed precision (default: `false`)
- `:gradient_accumulation_steps` — steps before optimizer update (default: 1)
- `:gradient_checkpointing` — trade compute for memory (default: `false`)
- `:dataloader_num_workers` — number of data loader workers (default: 0)
- `:seed` — random seed (default: 42)
- `:hub_model_id` — push checkpoints to this HF model ID
- `:push_to_hub` — auto-push checkpoints to the Hub (default: `false`)
- `:report_to` — `"tensorboard"`, `"wandb"`, `"none"` (default: `"none"`)
- `:logging_steps` — log every N steps (default: 500)
- `:eval_steps` — evaluate every N steps (if strategy is `"steps"`)
- `:save_steps` — save every N steps (if strategy is `"steps"`)
- `:max_steps` — override epochs with a max step count (-1 = use epochs)
- `:optim` — optimizer: `"adamw_torch"`, `"adamw_hf"`, `"sgd"`, `"adafactor"` (default: `"adamw_torch"`)
Example
config = HuggingfaceClient.Training.training_args(
output_dir: "./my-model",
num_train_epochs: 5,
per_device_train_batch_size: 16,
learning_rate: 2.0e-5,
fp16: true,
hub_model_id: "my-org/my-finetuned-model",
push_to_hub: true
)