Jido.Browser.WebFetch (Jido Browser v2.1.0)


Stateless HTTP-first web retrieval with optional domain policy, caching, focused filtering, citation-ready passage metadata, and Extractous-backed document extraction.

This module is intended for document retrieval workloads where starting a full browser session would be unnecessary or too expensive.

Types

result()

@type result() :: %{
  :url => String.t(),
  :final_url => String.t(),
  :content => String.t(),
  :format => atom(),
  :content_type => String.t(),
  :document_type => atom(),
  :retrieved_at => String.t(),
  :estimated_tokens => non_neg_integer(),
  :original_estimated_tokens => non_neg_integer(),
  :truncated => boolean(),
  :filtered => boolean(),
  :focus_matches => non_neg_integer(),
  :cached => boolean(),
  :citations => %{enabled: boolean()},
  :passages => [map()],
  optional(:title) => String.t() | nil,
  optional(:metadata) => map()
}
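
As a rough illustration of how the keys in `result()` might be consumed, the sketch below builds a one-line description from a result map. `ResultSummary` and `describe/1` are hypothetical names introduced here, not part of Jido Browser; the helper relies only on keys declared in the type above and treats `:title` as optional.

```elixir
defmodule ResultSummary do
  # Hypothetical helper for illustration; not part of Jido Browser.
  # Builds a one-line description from keys declared in result().
  def describe(%{final_url: final_url, format: format, estimated_tokens: tokens} = result) do
    # Collect the names of the boolean flags that are set to true.
    flags =
      for {name, true} <- [
            truncated: result.truncated,
            filtered: result.filtered,
            cached: result.cached
          ],
          do: Atom.to_string(name)

    # :title is an optional key and may also be nil, so read it with Map.get/2.
    title = Map.get(result, :title) || "(untitled)"

    suffix = if flags == [], do: "", else: " [" <> Enum.join(flags, ", ") <> "]"

    "#{title} <#{final_url}> #{format}, ~#{tokens} tokens#{suffix}"
  end
end
```

Because `:title` and `:metadata` are declared with `optional/1`, code that reads them should not assume the key is present; the required keys can be matched directly.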

Functions

fetch(url, opts \\ [])

@spec fetch(
  String.t(),
  keyword()
) :: {:ok, result()} | {:error, Exception.t()}

Fetches a URL over HTTP(S) and returns normalized document content.
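
A minimal sketch of calling `fetch/2` with default options and handling both return shapes from the spec. `FetchExample`, `content_or_reason/1`, and `unwrap/1` are hypothetical names used only for illustration; the one call taken from this page is `Jido.Browser.WebFetch.fetch/2`, and the sketch assumes the Jido Browser dependency is available at runtime.

```elixir
defmodule FetchExample do
  # Hypothetical wrapper for illustration; only fetch/2 comes from this page.
  def content_or_reason(url) do
    url
    |> Jido.Browser.WebFetch.fetch()
    |> unwrap()
  end

  # Pure helper, so the result handling can be exercised without network access.
  # On success, keep only the normalized content.
  def unwrap({:ok, %{content: content}}), do: {:ok, content}

  # The spec says errors carry an exception, so Exception.message/1
  # turns it into a readable string.
  def unwrap({:error, exception}), do: {:error, Exception.message(exception)}
end
```

Matching on the tagged tuple keeps the error path explicit; callers that prefer to raise could instead pass the returned exception to `raise/1`.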

Supported options:

  • :format - :markdown, :text, or :html
  • :selector - CSS selector for HTML pages
  • :allowed_domains / :blocked_domains - mutually exclusive host/path rules
  • :max_content_tokens - approximate token cap
  • :citations - boolean; when true, includes passage spans
  • :focus_terms - list of terms used for focused filtering
  • :focus_window - paragraph window around focus matches
  • :timeout - receive timeout in milliseconds
  • :cache - enables the ETS cache; defaults to true
  • :cache_ttl_ms - cache TTL in milliseconds
  • :require_known_url / :known_urls - optional URL provenance guard
  • :extractous - optional ExtractousEx keyword options merged with config
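
The options above can be combined in one keyword list. The following is a sketch under stated assumptions: `FocusedFetch`, `opts/1`, and `run/2` are hypothetical names, every value is illustrative, and the shape of `:allowed_domains` (a list of host strings) is an assumption, since this page only describes it as host/path rules.

```elixir
defmodule FocusedFetch do
  # Hypothetical helper; option names come from the list above,
  # the values are illustrative assumptions.
  def opts(terms) when is_list(terms) do
    [
      format: :markdown,
      # Assumed shape: a list of host strings. :blocked_domains is left out
      # because the two options are documented as mutually exclusive.
      allowed_domains: ["hexdocs.pm"],
      max_content_tokens: 4_000,
      focus_terms: terms,
      focus_window: 1,
      citations: true,
      timeout: 10_000,
      cache: true,
      cache_ttl_ms: 60_000
    ]
  end

  # Requires the Jido Browser dependency at runtime.
  def run(url, terms), do: Jido.Browser.WebFetch.fetch(url, opts(terms))
end
```

Keeping the option list in its own function makes it easy to inspect or override individual keys, for example with `Keyword.put/3`, before passing it to `fetch/2`.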