HfHub

Elixir client for HuggingFace Hub — dataset/model metadata, file downloads, caching, and authentication. An Elixir port of Python's huggingface_hub.

hf_hub_ex provides a production-ready interface to the HuggingFace Hub API so that Elixir applications can access models, datasets, and Spaces. It is designed as the foundational layer for porting Python HuggingFace libraries (datasets, evaluate, transformers) to the BEAM ecosystem.

Features

  • Hub metadata APIs for models, datasets, and Spaces
  • Downloads, snapshots, local cache helpers, and offline mode
  • Repository management: create, delete, move, settings, existence checks
  • Commit API: upload files/folders, regular payloads, Git LFS, multipart LFS
  • Git refs: branches, tags, commits, and super-squash
  • Bumblebee-style repository helpers for Elixir ML workflows
  • Structured {:ok, result} / {:error, reason} return values
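
Because every call returns a tagged tuple, multi-step workflows compose with `with`. The sketch below is a minimal illustration that uses stand-in functions instead of real Hub calls; the module name `ReleaseSketch` and its `publish/3` function are hypothetical and not part of hf_hub_ex.

```elixir
defmodule ReleaseSketch do
  # Each step is a zero-arity function returning {:ok, value} or {:error, reason}.
  # The first {:error, reason} stops the chain and is returned unchanged.
  def publish(create_repo, upload, create_tag) do
    with {:ok, repo} <- create_repo.(),
         {:ok, commit} <- upload.(),
         {:ok, tag} <- create_tag.() do
      {:ok, %{repo: repo, commit: commit, tag: tag}}
    end
  end
end

# Stand-ins for the real calls, for illustration only.
ok = fn value -> fn -> {:ok, value} end end

ReleaseSketch.publish(ok.(:repo), ok.(:commit), ok.(:tag))
|> IO.inspect()
```

In an application, each stand-in would wrap one library call, so a failed upload skips the tagging step and surfaces its own error reason.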

Installation

def deps do
  [
    {:hf_hub, "~> 0.3.0"}
  ]
end

Then run:

mix deps.get

Guides

Start here for production-oriented usage:

Quick start

Runtime configuration

Host applications should read OS environment variables at their boundary (for example, config/runtime.exs) and pass values into :hf_hub config:

import Config

if token = System.get_env("HF_TOKEN") do
  config :hf_hub, token: token
end

if cache_dir = System.get_env("HF_HUB_CACHE") || System.get_env("HF_HOME") do
  config :hf_hub, cache_dir: cache_dir
end

if System.get_env("HF_HUB_OFFLINE") in ["1", "true", "TRUE", "yes", "YES"] do
  config :hf_hub, offline: true
end
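
The offline check above lists each accepted spelling explicitly. If case-insensitive matching is preferred, a small helper can normalize the value first. This is a sketch only: `EnvFlagSketch` is a hypothetical module, and it accepts more spellings than the list above (for example "True" or " yes ").

```elixir
defmodule EnvFlagSketch do
  # Truthy spellings taken from the snippet above, compared after downcasing.
  @truthy ~w(1 true yes)

  # An unset variable (nil) counts as false.
  def truthy?(nil), do: false

  def truthy?(value) when is_binary(value) do
    (value |> String.trim() |> String.downcase()) in @truthy
  end
end

IO.inspect(EnvFlagSketch.truthy?(System.get_env("HF_HUB_OFFLINE")))
```

In config/runtime.exs the condition would then read `if EnvFlagSketch.truthy?(System.get_env("HF_HUB_OFFLINE")) do ... end`, provided the helper is defined before it is used.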

Library calls also accept an explicit token: option, which the examples below pass through:

token = System.fetch_env!("HF_TOKEN")

Create a dataset repo

{:ok, repo} =
  HfHub.Repo.create(
    "my-org/my-artifact-bundle",
    repo_type: :dataset,
    private: false,
    token: token
  )

Upload a folder with LFS support

{:ok, info} =
  HfHub.Commit.upload_folder(
    "/path/to/exported_bundle",
    "my-org/my-artifact-bundle",
    repo_type: :dataset,
    token: token,
    commit_message: "v1.0.0: initial artifact bundle",
    ignore_patterns: ["*.log.jsonl", "*.tmp", ".DS_Store"]
  )
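
To preview which files a set of ignore_patterns would drop before uploading, a small glob matcher can be sketched in plain Elixir. This is an illustration, not the library's matches_pattern?/2: the module `IgnoreSketch` is hypothetical, and it assumes fnmatch-style semantics in which `*` also crosses `/` and a pattern must match the whole relative path.

```elixir
defmodule IgnoreSketch do
  # Translate a glob into an anchored regex: "*" matches any run of
  # characters, "?" matches exactly one, everything else is literal.
  def glob_to_regex(glob) do
    source =
      glob
      |> Regex.escape()
      |> String.replace("\\*", ".*")
      |> String.replace("\\?", ".")

    Regex.compile!("\\A" <> source <> "\\z")
  end

  # True when the relative path matches at least one pattern.
  def ignored?(path, patterns) do
    Enum.any?(patterns, fn glob -> Regex.match?(glob_to_regex(glob), path) end)
  end
end

patterns = ["*.log.jsonl", "*.tmp", ".DS_Store"]

["model.safetensors", "logs/train.log.jsonl", "scratch.tmp", ".DS_Store"]
|> Enum.reject(&IgnoreSketch.ignored?(&1, patterns))
|> IO.inspect()
```

Under these assumed semantics, a nested `sub/.DS_Store` would not match the bare `.DS_Store` pattern; whether the library behaves the same way should be checked against its documentation.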

For large safetensors/model bundles, prefer conservative LFS settings (a single worker and hour-scale timeouts):

{:ok, info} =
  HfHub.Commit.upload_folder(
    "/path/to/exported_bundle",
    "my-org/my-artifact-bundle",
    repo_type: :dataset,
    token: token,
    commit_message: "v1.0.0: initial artifact bundle",
    ignore_patterns: ["*.log.jsonl", "*.tmp", ".DS_Store"],
    max_workers: 1,
    lfs_upload_timeout: 60 * 60 * 1000,
    lfs_task_timeout: 65 * 60 * 1000
  )

See Uploads and LFS for the multipart protocol notes and operational rationale.
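
The timeout values above are in milliseconds: 60 * 60 * 1000 is one hour and 65 * 60 * 1000 is 65 minutes. As a readability sketch, the same numbers can be produced with Erlang's `:timer.minutes/1`. The module name `LfsTimeoutSketch` is hypothetical; the option keys are the ones used above.

```elixir
defmodule LfsTimeoutSketch do
  # Returns the conservative LFS options from the example above,
  # with durations spelled out in minutes instead of raw arithmetic.
  def opts do
    [
      max_workers: 1,
      lfs_upload_timeout: :timer.minutes(60),
      lfs_task_timeout: :timer.minutes(65)
    ]
  end
end

IO.inspect(LfsTimeoutSketch.opts())
```

These options could be merged into the upload call with `Keyword.merge/2`. Note that the task timeout is five minutes longer than the upload timeout in the source example; the linked guide is the place to confirm the reasoning behind that gap.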

Tag a release

{:ok, tag} =
  HfHub.Git.create_tag(
    "my-org/my-artifact-bundle",
    "v1.0.0",
    repo_type: :dataset,
    message: "Initial public release",
    token: token
  )

This uses the Python-client-compatible endpoint shape:

POST /api/datasets/my-org/my-artifact-bundle/tag/main
{"tag":"v1.0.0","message":"Initial public release"}
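
To make the request shape concrete, the path and payload above can be assembled by hand. This is a sketch of the shape only, not the library's internal code: `TagRequestSketch` is a hypothetical module, the `revision:` option with a default of "main" is an assumption, and the sketch handles dataset repositories only.

```elixir
defmodule TagRequestSketch do
  # Percent-encode a single path segment; "/" inside it becomes %2F.
  defp encode_segment(segment), do: URI.encode(segment, &URI.char_unreserved?/1)

  # Builds {method, path, body} for tagging a dataset repository.
  def build(repo_id, tag, opts \\ []) do
    revision = Keyword.get(opts, :revision, "main")
    # The repository ID keeps its literal "/" between owner and name.
    path = "/api/datasets/" <> repo_id <> "/tag/" <> encode_segment(revision)

    body =
      case Keyword.get(opts, :message) do
        nil -> %{"tag" => tag}
        message -> %{"tag" => tag, "message" => message}
      end

    {:post, path, body}
  end
end

TagRequestSketch.build("my-org/my-artifact-bundle", "v1.0.0", message: "Initial public release")
|> IO.inspect()
```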

Download a file

{:ok, path} =
  HfHub.Download.hf_hub_download(
    repo_id: "bert-base-uncased",
    filename: "config.json",
    repo_type: :model
  )

config = File.read!(path)

Offline/cache helpers

if HfHub.offline_mode?() do
  IO.puts("Only cached files will be used")
end

case HfHub.try_to_load_from_cache("bert-base-uncased", "config.json") do
  {:ok, path} -> File.read!(path)
  {:error, :not_cached} -> :download_or_fail
end
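
The `:download_or_fail` placeholder above can be expanded into a cache-first lookup that only downloads when offline mode is off. The sketch below takes the cache lookup and the download as functions so that it runs without network access; `CacheFirstSketch` and the `:offline_not_cached` error atom are hypothetical names, not part of hf_hub_ex.

```elixir
defmodule CacheFirstSketch do
  # cache_lookup and download are zero-arity functions returning
  # {:ok, path} or {:error, reason}; offline? is a boolean.
  def fetch(cache_lookup, download, offline?) when is_boolean(offline?) do
    case cache_lookup.() do
      {:ok, path} -> {:ok, path}
      # Offline and not cached: fail without touching the network.
      {:error, :not_cached} when offline? -> {:error, :offline_not_cached}
      {:error, :not_cached} -> download.()
    end
  end
end

# Stand-ins for the real cache lookup and download calls.
miss = fn -> {:error, :not_cached} end
download = fn -> {:ok, "/tmp/example/config.json"} end

IO.inspect(CacheFirstSketch.fetch(miss, download, true))
IO.inspect(CacheFirstSketch.fetch(miss, download, false))
```

In an application the two functions would wrap HfHub.try_to_load_from_cache/2 and HfHub.Download.hf_hub_download/1, with HfHub.offline_mode?/0 supplying the flag.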

API overview

HfHub.Repo

Repository lifecycle helpers:

  • create/2
  • delete/2
  • update_settings/2
  • move/3
  • exists?/2
  • file_exists?/3
  • revision_exists?/3

HfHub.Commit

Commit and upload helpers:

  • create/3
  • upload_file/4
  • upload_folder/3
  • upload_large_folder/3
  • delete_file/3
  • delete_folder/3
  • matches_pattern?/2
  • needs_lfs?/1
  • lfs_threshold/0

HfHub.Git

Git refs and release helpers:

  • create_branch/3
  • delete_branch/3
  • create_tag/3
  • delete_tag/3
  • list_refs/2
  • list_commits/2
  • super_squash/2

HfHub.Download

Download and snapshot helpers:

  • hf_hub_download/1
  • snapshot_download/1
  • download_stream/1
  • resume_download/1

HfHub.Api

Hub metadata APIs:

  • model_info/2
  • dataset_info/2
  • space_info/2
  • list_models/1
  • list_datasets/1
  • list_repo_tree/2
  • list_files/2
  • dataset_configs/2
  • dataset_splits/2

Other modules

Python-client alignment

hf_hub_ex intentionally follows Python huggingface_hub route and payload shapes for the artifact-publishing surface:

  • repository IDs keep the literal / separator between owner and name in API paths;
  • branch, tag, revision, and file path segments are URL-encoded individually;
  • multipart LFS uses chunk_size, digit-only part URL keys, ETag collection, and completion POST payload %{"oid" => oid, "parts" => ...};
  • create_tag/3 posts to /tag/{revision} with payload %{"tag" => tag};
  • list_refs/2 uses include_prs=1 for pull-request refs.
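
The multipart LFS bullet above can be illustrated with plain data handling. The following is a rough sketch under stated assumptions, not the library's implementation: `MultipartSketch` is a hypothetical module, the header map and URLs are made up, and the "partNumber"/"etag" keys inside each part are assumed from the Python client's convention (the text above only confirms "oid" and "parts").

```elixir
defmodule MultipartSketch do
  # Keep only keys made purely of digits and order them numerically,
  # so that "10" sorts after "2". Other keys such as "chunk_size" are ignored.
  def part_urls(header) do
    header
    |> Enum.filter(fn {key, _url} -> key =~ ~r/\A\d+\z/ end)
    |> Enum.sort_by(fn {key, _url} -> String.to_integer(key) end)
    |> Enum.map(fn {_key, url} -> url end)
  end

  # Number of parts needed for `size` bytes (ceiling division).
  def part_count(size, chunk_size), do: div(size + chunk_size - 1, chunk_size)

  # Pair each collected ETag with its 1-based part number.
  def completion_payload(oid, etags) do
    parts =
      etags
      |> Enum.with_index(1)
      |> Enum.map(fn {etag, number} -> %{"partNumber" => number, "etag" => etag} end)

    %{"oid" => oid, "parts" => parts}
  end
end

header = %{
  "chunk_size" => "5242880",
  "2" => "https://example.invalid/part/2",
  "1" => "https://example.invalid/part/1"
}

IO.inspect(MultipartSketch.part_urls(header))
IO.inspect(MultipartSketch.completion_payload("abc123", ["etag-1", "etag-2"]))
```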

Examples

The examples/ directory contains runnable scripts:

./examples/run_all.sh
mix run examples/list_datasets.exs
mix run examples/list_models.exs
mix run examples/download_file.exs
mix run examples/snapshot_download.exs
mix run examples/auth_demo.exs

See examples/README.md for details.

Testing

mix format --check-formatted
mix compile --warnings-as-errors
mix test
mix credo --strict
mix dialyzer
mix docs --warnings-as-errors

Roadmap

  • [x] Core API client (models, datasets, spaces)
  • [x] File download with caching
  • [x] Authentication support
  • [x] Repository management
  • [x] File uploads and Git LFS support
  • [x] Folder uploads with pattern filtering and batching
  • [x] Git refs/tags for artifact release workflows
  • [ ] Full endpoint-by-endpoint parity audit for every Python huggingface_hub surface
  • [ ] Inference API client
  • [ ] Integration with crucible_datasets for dataset loading

Contributing

  1. Fork the repository
  2. Create a feature branch
  3. Write tests for your changes
  4. Run the quality gates above
  5. Open a pull request

License

MIT License - See LICENSE for details.

Acknowledgments


Built with ❤️ by the North-Shore-AI team