ReqLLM.Providers.FireworksAI (ReqLLM v1.13.0)

Fireworks AI provider – OpenAI-compatible Chat Completions API.

Implementation

Uses built-in OpenAI-style encoding/decoding defaults. Fireworks exposes an OpenAI-compatible endpoint, so request/response handling reuses the standard OpenAI wire format. Fireworks-specific extensions are added via build_body/1.

Model identifiers follow the canonical accounts/fireworks/models/<slug> form. In a model spec they are prefixed with the provider id, e.g. fireworks_ai:accounts/fireworks/models/kimi-k2-instruct.
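
To make the identifier layout concrete, here is a minimal sketch that composes and splits such a spec string. The module and function names are hypothetical helpers written for illustration and are not part of ReqLLM; only the fireworks_ai provider prefix and the accounts/fireworks/models/ path come from the text above.

```elixir
defmodule FireworksModelSpecSketch do
  # Hypothetical helper, not part of ReqLLM.
  @provider "fireworks_ai"
  @model_prefix "accounts/fireworks/models/"

  # Builds "fireworks_ai:accounts/fireworks/models/<slug>" from a bare slug.
  def from_slug(slug) when is_binary(slug) do
    @provider <> ":" <> @model_prefix <> slug
  end

  # Splits a spec on the first colon into provider id and model identifier.
  def split(spec) when is_binary(spec) do
    case String.split(spec, ":", parts: 2) do
      [provider, model_id] -> {:ok, provider, model_id}
      _ -> {:error, :missing_provider_prefix}
    end
  end
end

spec = FireworksModelSpecSketch.from_slug("kimi-k2-instruct")
# spec is "fireworks_ai:accounts/fireworks/models/kimi-k2-instruct"
{:ok, "fireworks_ai", _model_id} = FireworksModelSpecSketch.split(spec)
```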

Fireworks-Specific Extensions

Beyond standard OpenAI parameters, Fireworks supports:

  • reasoning_effort - Reasoning level (none, low, medium, high, xhigh, max)
  • prompt_cache_key / prompt_cache_isolation_key - Session affinity for KV cache
  • prompt_truncate_len - Truncate prompts to a specified token length
  • safe_tokenization - Prevent prompt injection via special tokens
  • min_p, repetition_penalty, typical_p - Extended sampling controls
  • mirostat_target, mirostat_lr - Mirostat sampling parameters
  • perf_metrics_in_response - Include performance metrics in the response body
  • raw_output - Return low-level model interaction details
  • metadata - Arbitrary metadata for tracing/distillation
  • speculation / prediction - Speculative decoding hints
  • parallel_tool_calls - Control concurrent tool invocations
  • max_completion_tokens - Reasoning-aware token budget
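
As a rough illustration of how a few of these extensions might be supplied, the sketch below assembles an option list. The values are made up, and nesting the Fireworks-specific keys under :provider_options is an assumption about how ReqLLM routes provider options; check provider_schema/0 for the authoritative option names and placement. The network call is left as a comment because it needs the :req_llm dependency and a valid FIREWORKS_API_KEY.

```elixir
# Illustrative values only; none of these numbers are recommendations.
fireworks_opts = [
  min_p: 0.05,
  repetition_penalty: 1.1,
  prompt_cache_key: "session-1234",
  perf_metrics_in_response: true
]

# Assumption: provider-specific keys are nested under :provider_options.
opts = [temperature: 0.7, max_tokens: 1024, provider_options: fireworks_opts]

# With :req_llm installed and FIREWORKS_API_KEY set, a call would look roughly like:
#
#   ReqLLM.generate_text(
#     "fireworks_ai:accounts/fireworks/models/kimi-k2-instruct",
#     "Summarise the Fireworks extensions in one sentence.",
#     opts
#   )
```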

Structured Output

Two strategies for generating structured objects (native response_format: json_schema and a tool-calling workaround), selected via the three values of fireworks_structured_output_mode:

  • :auto (default) - Use native response_format: json_schema with strict enforcement
  • :json_schema - Force native response_format: json_schema
  • :tool - Use the tool-calling workaround (compatible with older models)

Strict JSON schema enforcement (adds additionalProperties: false and marks every property required) can be disabled with fireworks_json_schema_strict: false.
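
The strict transformation described above can be sketched as a small recursive function over a string-keyed JSON schema map. This is an illustration of the idea under stated assumptions, not the provider's actual implementation: the module name is hypothetical, and recursing into nested objects and array items is an assumption about how far the enforcement reaches.

```elixir
defmodule StrictSchemaSketch do
  # Hypothetical helper, not part of ReqLLM.

  # Object schemas: mark every property required and forbid extra properties.
  def enforce(%{"type" => "object", "properties" => props} = schema) when is_map(props) do
    strict_props = Map.new(props, fn {name, sub} -> {name, enforce(sub)} end)

    schema
    |> Map.put("properties", strict_props)
    |> Map.put("required", strict_props |> Map.keys() |> Enum.sort())
    |> Map.put("additionalProperties", false)
  end

  # Array schemas: apply the same rule to the item schema (assumed behaviour).
  def enforce(%{"type" => "array", "items" => items} = schema) when is_map(items) do
    Map.put(schema, "items", enforce(items))
  end

  # Scalars and anything else pass through unchanged.
  def enforce(schema), do: schema
end

schema = %{
  "type" => "object",
  "properties" => %{
    "name" => %{"type" => "string"},
    "age" => %{"type" => "integer"}
  }
}

strict = StrictSchemaSketch.enforce(schema)
# strict["required"] is ["age", "name"] and strict["additionalProperties"] is false
```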

Constraints

Fireworks rejects non-streaming chat requests with max_tokens > 4096. translate_options/3 therefore caps max_tokens at 4096 (and emits a warning) when stream: false.

See provider_schema/0 for the complete Fireworks-specific schema and ReqLLM.Provider.Options for inherited OpenAI parameters.

Configuration

# Add to .env file (automatically loaded)
FIREWORKS_API_KEY=fw_...

Functions

attach(request, model_input, user_opts)

Default implementation of attach/3.

Sets up Bearer token authentication and standard pipeline steps.
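
For readers unfamiliar with Bearer authentication, the following sketch shows the header that such a setup typically produces from the FIREWORKS_API_KEY environment variable. The module is a hypothetical stand-alone helper; how attach/3 actually resolves the key and installs the header is not shown here and may differ.

```elixir
defmodule BearerAuthSketch do
  # Hypothetical helper, not part of ReqLLM.

  # Builds the standard HTTP authorization header for a Bearer token.
  def header(api_key) when is_binary(api_key) and api_key != "" do
    {"authorization", "Bearer " <> api_key}
  end

  # Reads the key from the environment; the variable name defaults to the one
  # documented in the Configuration section.
  def from_env(env_var \\ "FIREWORKS_API_KEY") do
    case System.fetch_env(env_var) do
      {:ok, key} when key != "" -> {:ok, header(key)}
      _ -> {:error, {:missing_env, env_var}}
    end
  end
end
```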

attach_stream(model, context, opts, finch_name)

Default implementation of attach_stream/4.

Builds complete streaming requests using OpenAI-compatible format.

base_url()

build_body(request)

Custom body building that adds Fireworks-specific extensions to the default OpenAI-compatible format.

On top of the default body, build_body/1:

  • Normalises tool_choice to OpenAI's function shape.
  • Strips message-level fields that Fireworks rejects on assistant turns (metadata, reasoning_details, reasoning_content).
  • Renders reasoning_effort atoms as the strings Fireworks expects.
  • Forwards the full Fireworks-specific parameter surface (sampling, prompt cache keys, speculation, response_format, etc.).

stream_options.include_usage is added by default_build_body/1 for streaming requests.
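
The adjustments described above can be sketched on a plain string-keyed body map. Everything here is illustrative: the module name is hypothetical, the input tool_choice shape (%{"type" => "tool", "name" => ...}) is an assumption, and the real build_body/1 works on a request struct and forwards many more parameters.

```elixir
defmodule BuildBodySketch do
  # Hypothetical helper, not part of ReqLLM.
  @stripped_assistant_fields ["metadata", "reasoning_details", "reasoning_content"]

  def adjust(body) when is_map(body) do
    body
    |> update_if_present("tool_choice", &normalise_tool_choice/1)
    |> update_if_present("messages", fn msgs -> Enum.map(msgs, &strip_assistant_fields/1) end)
    |> update_if_present("reasoning_effort", &render_effort/1)
  end

  # Assumed input shape; the output is OpenAI's function-style tool_choice.
  defp normalise_tool_choice(%{"type" => "tool", "name" => name}) do
    %{"type" => "function", "function" => %{"name" => name}}
  end

  defp normalise_tool_choice(other), do: other

  # Only assistant turns lose the rejected fields; other roles are untouched.
  defp strip_assistant_fields(%{"role" => "assistant"} = message) do
    Map.drop(message, @stripped_assistant_fields)
  end

  defp strip_assistant_fields(message), do: message

  # Atoms such as :high become "high"; strings and nil pass through unchanged.
  defp render_effort(effort) when is_atom(effort) and not is_nil(effort) do
    Atom.to_string(effort)
  end

  defp render_effort(effort), do: effort

  # Applies fun to the value under key only when the key exists.
  defp update_if_present(map, key, fun) do
    case map do
      %{^key => value} -> Map.put(map, key, fun.(value))
      _ -> map
    end
  end
end
```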

decode_response(request_response)

Default implementation of decode_response/1.

Handles success/error responses with standard ReqLLM.Response creation.

decode_stream_event(event, model)

Default implementation of decode_stream_event/2.

Decodes SSE events using OpenAI-compatible format.
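
As a rough picture of what decoding an OpenAI-style stream event involves, the sketch below pattern-matches on an event whose data has already been JSON-decoded. The event shape (a map with a :data key) and the returned tuples are assumptions made for illustration; the actual callback takes a model argument and returns ReqLLM's own stream chunk types.

```elixir
defmodule StreamEventSketch do
  # Hypothetical helper, not part of ReqLLM.

  # OpenAI-compatible streams end with a literal "[DONE]" sentinel.
  def decode(%{data: "[DONE]"}), do: :done

  # Text deltas arrive under choices[0].delta.content.
  def decode(%{data: %{"choices" => [%{"delta" => %{"content" => text}} | _]}})
      when is_binary(text) do
    {:content, text}
  end

  # A usage-only chunk is sent when stream_options.include_usage is set.
  def decode(%{data: %{"usage" => usage}}) when is_map(usage), do: {:usage, usage}

  # Anything else (role-only deltas, keep-alives) is ignored in this sketch.
  def decode(_event), do: :ignore
end
```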

default_base_url()

default_env_key()

Callback implementation for ReqLLM.Provider.default_env_key/0.

encode_body(request)

Custom encode_body wrapper that delegates body construction to build_body/1 and serialises the result with the default OpenAI-compatible JSON encoder.

extract_usage(body, model)

Default implementation of extract_usage/2.

Extracts usage data from standard usage field in response body.
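
For orientation, an OpenAI-compatible response body carries token counts under a top-level usage key. The sketch below reads the three standard counters; the module name, the atom keys in the result, and the {:ok, ...} / {:error, ...} return shape are hypothetical and may not match what extract_usage/2 returns.

```elixir
defmodule UsageSketch do
  # Hypothetical helper, not part of ReqLLM.

  # Reads the standard OpenAI usage counters, defaulting missing ones to 0.
  def extract(%{"usage" => %{} = usage}) do
    {:ok,
     %{
       input_tokens: Map.get(usage, "prompt_tokens", 0),
       output_tokens: Map.get(usage, "completion_tokens", 0),
       total_tokens: Map.get(usage, "total_tokens", 0)
     }}
  end

  # Bodies without a usage map yield an error tuple in this sketch.
  def extract(_body), do: {:error, :no_usage}
end
```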

pre_validate_options(operation, model, opts)

prepare_request(operation, model_spec, input, opts)

Custom prepare_request for :object operations.

Uses native response_format: json_schema (with strict enforcement) by default and switches to the tool-calling workaround when fireworks_structured_output_mode is :tool. All other operations delegate to the default implementation.
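
The mode selection can be summarised in a few lines. This is a minimal sketch assuming the two options arrive in a flat keyword list; the module name and the returned map are hypothetical, and the real prepare_request/4 goes on to build a full request.

```elixir
defmodule StructuredOutputPlanSketch do
  # Hypothetical helper, not part of ReqLLM.

  # Maps the documented mode values onto the two strategies.
  # An unknown mode raises CaseClauseError in this sketch.
  def plan(opts) do
    mode = Keyword.get(opts, :fireworks_structured_output_mode, :auto)
    strict? = Keyword.get(opts, :fireworks_json_schema_strict, true)

    case mode do
      :tool -> %{strategy: :tool_call}
      mode when mode in [:auto, :json_schema] -> %{strategy: :json_schema, strict: strict?}
    end
  end
end

StructuredOutputPlanSketch.plan([])
# returns %{strategy: :json_schema, strict: true}
```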

provider_extended_generation_schema()

provider_id()

provider_schema()

supported_provider_options()

translate_options(operation, model, opts)

Provider-specific option normalization.

  • Drops :reasoning_token_budget (Fireworks uses reasoning_effort instead).
  • Defaults :receive_timeout to 5 minutes (reasoning completions are slow).
  • Caps :max_tokens at 4096 for non-streaming requests and emits a warning, since Fireworks rejects larger budgets when stream: false.
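
The three normalisation steps listed above can be sketched as a pure function over a keyword list. This is an illustration rather than the provider's code: the module name, the warning text, and the {opts, warnings} return shape are assumptions, and the operation and model arguments of translate_options/3 are omitted.

```elixir
defmodule TranslateOptionsSketch do
  # Hypothetical helper, not part of ReqLLM.
  @max_non_streaming_tokens 4096
  @default_receive_timeout :timer.minutes(5)

  def translate(opts) when is_list(opts) do
    opts =
      opts
      # Fireworks uses reasoning_effort, so the token budget is dropped.
      |> Keyword.delete(:reasoning_token_budget)
      # Only set the timeout when the caller did not supply one.
      |> Keyword.put_new(:receive_timeout, @default_receive_timeout)

    streaming? = Keyword.get(opts, :stream, false) == true
    max_tokens = Keyword.get(opts, :max_tokens)

    if not streaming? and is_integer(max_tokens) and max_tokens > @max_non_streaming_tokens do
      warning =
        "max_tokens capped at #{@max_non_streaming_tokens} because stream is false " <>
          "(requested #{max_tokens})"

      {Keyword.put(opts, :max_tokens, @max_non_streaming_tokens), [warning]}
    else
      {opts, []}
    end
  end
end

{opts, warnings} =
  TranslateOptionsSketch.translate(max_tokens: 8192, reasoning_token_budget: 2048)

# opts[:max_tokens] is 4096, opts[:receive_timeout] is 300_000,
# and warnings holds a single message about the cap.
```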