HuggingfaceClient.Hub.Search (huggingface_client v0.1.0)


Enhanced search across models, datasets, and spaces on the HuggingFace Hub.

Provides full filter support matching the Hub UI search parameters.

Example

# Find popular text-generation models in English
HuggingfaceClient.search_models(
  task: "text-generation",
  language: "en",
  sort: "downloads",
  direction: -1,
  limit: 20
)
|> Enum.each(fn m -> IO.puts("#{m["id"]}: #{m["downloads"]}") end)

# Find datasets for a specific task
HuggingfaceClient.search_datasets(
  task_categories: ["text-classification"],
  language: "en",
  size_categories: ["10K<n<100K"]
)
|> Enum.take(10)

Summary

Functions

Returns all available dataset tags.

Searches for datasets on the Hub.

Returns all available model tags (tasks, libraries, languages, etc.).

Searches for models on the Hub with full filter support.

Searches for Spaces on the Hub.

Returns trending models on the Hub.

Returns trending Spaces on the Hub.

Full-text search across all Hub content (models, datasets, spaces, papers).

Functions

dataset_tags(opts \\ [])

@spec dataset_tags(keyword()) :: {:ok, map()} | {:error, Exception.t()}

Returns all available dataset tags.

datasets(opts \\ [])

@spec datasets(keyword()) :: Enumerable.t()

Searches for datasets on the Hub.

Options

  • :search — full-text search
  • :author — filter by author
  • :task_categories — list of task categories
  • :language — language code or list
  • :multilinguality — "multilingual" or "monolingual"
  • :size_categories — list: ["n<1K", "1K<n<10K", "10K<n<100K", "100K<n<1M", "1M<n<10M", "n>10M"]
  • :format — data format: "parquet", "csv", "json", etc.
  • :sort — sort field: "downloads", "likes", "lastModified"
  • :direction — sort direction
  • :limit — max results
  • :access_token

Example

# Large English datasets for text classification
HuggingfaceClient.search_datasets(
  task_categories: ["text-classification"],
  language: "en",
  size_categories: ["100K<n<1M", "1M<n<10M"]
)
|> Enum.take(10)

model_tags(opts \\ [])

@spec model_tags(keyword()) :: {:ok, map()} | {:error, Exception.t()}

Returns all available model tags (tasks, libraries, languages, etc.).

Example

{:ok, tags} = HuggingfaceClient.Hub.Search.model_tags()
IO.inspect(tags["tasks"])
IO.inspect(tags["libraries"])
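Because model_tags/1 and dataset_tags/1 return `{:ok, map()} | {:error, Exception.t()}`, a bare `{:ok, tags} = ...` match raises when the request fails. The sketch below shows one way to handle both cases. `TagsSketch` is a hypothetical helper, the results are stand-ins rather than live Hub responses, and the assumption that `"tasks"` holds a list is not confirmed by this page.

```elixir
defmodule TagsSketch do
  # Hypothetical helper (not part of huggingface_client).
  # Turns an {:ok, map} | {:error, exception} result into the value stored
  # under "tasks", falling back to an empty list on failure or a missing key.
  def tasks({:ok, tags}) when is_map(tags), do: Map.get(tags, "tasks", [])

  def tasks({:error, exception}) do
    IO.puts("could not load tags: " <> Exception.message(exception))
    []
  end
end

# Stand-in results; in real use these would come from model_tags/1.
# The list-of-strings shape under "tasks" is assumed for illustration.
ok_result = {:ok, %{"tasks" => ["text-generation"], "libraries" => ["transformers"]}}
error_result = {:error, %RuntimeError{message: "request failed"}}

IO.inspect(TagsSketch.tasks(ok_result))
IO.inspect(TagsSketch.tasks(error_result))
```

Returning an empty list on error keeps downstream `Enum` calls working; code that needs to distinguish failure from "no tags" would propagate the error tuple instead.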

models(opts \\ [])

@spec models(keyword()) :: Enumerable.t()

Searches for models on the Hub with full filter support.

Returns a lazy stream of model info maps.
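To illustrate what a lazy stream means for callers, the sketch below uses a hypothetical local stand-in, `LazySearchSketch.fake_models/1`, in place of a real Hub request. It assumes results arrive page by page, which this page does not confirm; the point is only that nothing runs until the stream is consumed, and that `Enum.take/2` stops it early.

```elixir
defmodule LazySearchSketch do
  # Hypothetical stand-in for a paginated search (not the library's code).
  # Each "page" is generated locally and logged when it is produced.
  def fake_models(page_size \\ 5) do
    Stream.resource(
      fn -> 0 end,
      fn page ->
        IO.puts("fetching page #{page}")
        first = page * page_size

        models =
          for i <- first..(first + page_size - 1) do
            %{"id" => "org/model-#{i}", "downloads" => 1000 - i}
          end

        {models, page + 1}
      end,
      fn _page -> :ok end
    )
  end
end

# Building the pipeline fetches nothing yet.
stream =
  LazySearchSketch.fake_models()
  |> Stream.filter(fn m -> m["downloads"] > 0 end)

# Consuming the stream pulls only as many pages as the seven results need.
first_seven = Enum.take(stream, 7)
IO.inspect(Enum.map(first_seven, fn m -> m["id"] end))
```

With a real search, the same pattern means a small `Enum.take/2` avoids requesting results that are never used, while `Enum.to_list/1` on an unbounded query would walk every result.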

Options

  • :search — full-text search query
  • :author — filter by author/organization
  • :task / :pipeline_tag — filter by ML task (e.g. "text-generation", "image-classification")
  • :language — filter by language code (e.g. "en", "fr")
  • :library — filter by library (e.g. "transformers", "diffusers", "pytorch")
  • :tags — filter by tags (list or single string)
  • :dataset — filter by training dataset
  • :sort — sort field: "downloads", "likes", "lastModified", "created_at"
  • :direction — sort direction: -1 (descending) or 1 (ascending)
  • :limit — max results
  • :full — if true, return full model info (slower)
  • :cardData — if true, include model card metadata
  • :inference — filter by inference status: "warm", "cold", "frozen"
  • :gated — if true, only return gated models
  • :num_parameters — filter by parameter count range, e.g. "min:1B,max:10B"
  • :access_token

Example

# Top LLMs by downloads
HuggingfaceClient.search_models(
  task: "text-generation",
  sort: "downloads",
  direction: -1,
  limit: 50
)
|> Enum.each(fn m -> IO.puts(m["id"]) end)

# PyTorch image classifiers with >1B params
HuggingfaceClient.search_models(
  task: "image-classification",
  library: "pytorch",
  num_parameters: "min:1B"
)
|> Enum.take(20)
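The :num_parameters option takes a range string such as "min:1B,max:10B". A minimal sketch of building that string from keyword options follows. `ParamRangeSketch` is a hypothetical helper, not part of huggingface_client, and the max-only form it can produce is an assumption, since this page only shows "min:1B,max:10B" and "min:1B".

```elixir
defmodule ParamRangeSketch do
  # Hypothetical helper: builds the "min:...,max:..." string described
  # for the :num_parameters option, skipping bounds that are not given.
  def build(opts) do
    [min: Keyword.get(opts, :min), max: Keyword.get(opts, :max)]
    |> Enum.reject(fn {_key, value} -> is_nil(value) end)
    |> Enum.map_join(",", fn {key, value} -> "#{key}:#{value}" end)
  end
end

IO.puts(ParamRangeSketch.build(min: "1B", max: "10B"))
IO.puts(ParamRangeSketch.build(min: "1B"))
```

The result can be passed as `num_parameters: ParamRangeSketch.build(min: "1B")` in the options shown above.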

spaces(opts \\ [])

@spec spaces(keyword()) :: Enumerable.t()

Searches for Spaces on the Hub.

Options

  • :search — full-text search
  • :author — filter by author
  • :sdk — filter by SDK: "gradio", "streamlit", "docker", "static"
  • :tags — filter by tags
  • :sort — sort field: "likes", "createdAt", "lastModified"
  • :direction — sort direction
  • :limit — max results
  • :access_token

Example

# Top Gradio demos by likes
HuggingfaceClient.search_spaces(
  sdk: "gradio",
  sort: "likes",
  direction: -1,
  limit: 20
)
|> Enum.each(fn s -> IO.puts("#{s["id"]}: #{s["likes"]} likes") end)

unified(query, opts \\ [])

@spec unified(
  String.t(),
  keyword()
) :: {:ok, map()} | {:error, Exception.t()}

Full-text search across all Hub content (models, datasets, spaces, papers).

Options

  • query — search query (required; passed as the first positional argument, not as a keyword option)
  • :type — filter by type: "model", "dataset", "space", "paper" (default: all)
  • :limit — max results per type
  • :access_token

Example

{:ok, results} = HuggingfaceClient.Hub.Search.unified("BERT fine-tuning")
IO.puts("Models: #{length(results["models"])}")
IO.puts("Datasets: #{length(results["datasets"])}")
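Calling `length/1` on a missing key raises, because `results["spaces"]` would be `nil`. The sketch below counts hits per type defensively. `UnifiedSketch` is a hypothetical helper, the `"spaces"` and `"papers"` keys are assumed by analogy with the `"models"` and `"datasets"` keys used in the example above, and the input map is a stand-in rather than a real response.

```elixir
defmodule UnifiedSketch do
  # Key names beyond "models" and "datasets" are assumptions.
  @types ["models", "datasets", "spaces", "papers"]

  # Counts hits per content type, treating a missing or nil entry as zero.
  def counts(results) when is_map(results) do
    Map.new(@types, fn type ->
      {type, results |> Map.get(type) |> List.wrap() |> length()}
    end)
  end
end

# Stand-in for the map inside {:ok, results}.
results = %{
  "models" => [%{"id" => "a/b"}, %{"id" => "c/d"}],
  "datasets" => [%{"id" => "e/f"}]
}

IO.inspect(UnifiedSketch.counts(results))
```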