ReqLLM.Providers.FireworksAI
(ReqLLM v1.13.0)
Fireworks AI provider – OpenAI-compatible Chat Completions API.
Implementation
Uses built-in OpenAI-style encoding/decoding defaults. Fireworks exposes an
OpenAI-compatible endpoint, so request/response handling reuses the standard
OpenAI wire format. Fireworks-specific extensions are added via build_body/1.
Model identifiers follow the canonical accounts/fireworks/models/<slug> form
(e.g. fireworks_ai:accounts/fireworks/models/kimi-k2-instruct).
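As an illustration of the identifier layout described above, the following minimal sketch splits a spec string into its provider prefix and model id, and checks the canonical form. The module `ModelSpecSketch` and its functions are hypothetical names used only for this example; they are not part of ReqLLM, and the library's own parsing may differ.

```elixir
defmodule ModelSpecSketch do
  @moduledoc "Hypothetical helper for illustration; not part of ReqLLM."

  # Split on the first ":" only, so the slashes in the model id are untouched.
  def split(spec) when is_binary(spec) do
    case String.split(spec, ":", parts: 2) do
      [provider, model_id] when provider != "" and model_id != "" ->
        {:ok, provider, model_id}

      _ ->
        {:error, :invalid_model_spec}
    end
  end

  # Check the accounts/fireworks/models/<slug> form named in the text.
  def canonical?(model_id) when is_binary(model_id) do
    String.match?(model_id, ~r{\Aaccounts/fireworks/models/[^/]+\z})
  end
end
```

Splitting with `parts: 2` keeps the full `accounts/fireworks/models/<slug>` path intact as the model id.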
Fireworks-Specific Extensions
Beyond standard OpenAI parameters, Fireworks supports:
- reasoning_effort - Reasoning level (none, low, medium, high, xhigh, max)
- prompt_cache_key / prompt_cache_isolation_key - Session affinity for KV cache
- prompt_truncate_len - Truncate prompts to a specified token length
- safe_tokenization - Prevent prompt injection via special tokens
- min_p, repetition_penalty, typical_p - Extended sampling controls
- mirostat_target, mirostat_lr - Mirostat sampling parameters
- perf_metrics_in_response - Include performance metrics in the response body
- raw_output - Return low-level model interaction details
- metadata - Arbitrary metadata for tracing/distillation
- speculation / prediction - Speculative decoding hints
- parallel_tool_calls - Control concurrent tool invocations
- max_completion_tokens - Reasoning-aware token budget
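To make the idea of extension parameters concrete, here is a minimal sketch of how such options might be merged into an OpenAI-style body map. It is not the provider's build_body/1: the module name, the chosen subset of keys, and the filtering rules are assumptions made for illustration only.

```elixir
defmodule ExtensionBodySketch do
  @moduledoc "Hypothetical illustration; not the provider's build_body/1."

  # A subset of the extension names listed above (assumed grouping).
  @extension_keys [
    :reasoning_effort,
    :prompt_cache_key,
    :prompt_truncate_len,
    :min_p,
    :repetition_penalty,
    :typical_p,
    :max_completion_tokens
  ]

  # Merge extension options that are set (non-nil) into the base body.
  # Options outside @extension_keys are ignored by this sketch.
  def add_extensions(body, opts) when is_map(body) and is_list(opts) do
    opts
    |> Keyword.take(@extension_keys)
    |> Enum.reject(fn {_key, value} -> is_nil(value) end)
    |> Enum.reduce(body, fn {key, value}, acc -> Map.put(acc, key, value) end)
  end
end

base = %{
  model: "accounts/fireworks/models/kimi-k2-instruct",
  messages: [%{role: "user", content: "Hello"}]
}

# min_p and prompt_cache_key are merged; typical_p is nil and so is skipped.
body = ExtensionBodySketch.add_extensions(base, min_p: 0.05, prompt_cache_key: "session-42", typical_p: nil)
```

The extension fields sit alongside the standard `model` and `messages` keys in the same flat map, which mirrors how OpenAI-compatible endpoints typically accept vendor-specific parameters.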
Structured Output
Two strategies for generating structured objects, selected via
fireworks_structured_output_mode:
- :auto (default) - Use native response_format: json_schema with strict enforcement
- :json_schema - Force native response_format: json_schema
- :tool - Use the tool-calling workaround (compatible with older models)
Strict JSON schema enforcement (adds additionalProperties: false and marks
every property required) can be disabled with fireworks_json_schema_strict: false.
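The strict transformation described above can be sketched as a small recursive function. This is an assumption-laden illustration, not the provider's code: the module `StrictSchemaSketch` is hypothetical, schemas are assumed to use string keys as in decoded JSON, and the real implementation may treat nested or composite schemas differently.

```elixir
defmodule StrictSchemaSketch do
  @moduledoc "Hypothetical illustration of strict JSON schema enforcement."

  # Object schemas: forbid extra keys and require every declared property.
  def strict(%{"type" => "object", "properties" => props} = schema) when is_map(props) do
    strict_props = Map.new(props, fn {name, sub} -> {name, strict(sub)} end)

    schema
    |> Map.put("properties", strict_props)
    |> Map.put("additionalProperties", false)
    |> Map.put("required", props |> Map.keys() |> Enum.sort())
  end

  # Array schemas: tighten the item schema.
  def strict(%{"type" => "array", "items" => items} = schema) when is_map(items) do
    Map.put(schema, "items", strict(items))
  end

  # Scalars and anything else pass through unchanged.
  def strict(schema), do: schema
end
```

With strict enforcement off (fireworks_json_schema_strict: false), a transformation like this would simply be skipped and the schema sent as written.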
Constraints
Fireworks rejects non-streaming chat requests with max_tokens > 4096.
translate_options/3 caps max_tokens to 4096 when stream: false.
See provider_schema/0 for the complete Fireworks-specific schema and
ReqLLM.Provider.Options for inherited OpenAI parameters.
Configuration
# Add to .env file (automatically loaded)
FIREWORKS_API_KEY=fw_...
Functions
Default implementation of attach/3.
Sets up Bearer token authentication and standard pipeline steps.
Default implementation of attach_stream/4.
Builds complete streaming requests using OpenAI-compatible format.
Custom body building that adds Fireworks-specific extensions to the default OpenAI-compatible format.
Normalises tool_choice to OpenAI's function shape, strips message-level
fields Fireworks rejects on assistant turns (metadata,
reasoning_details, reasoning_content), renders reasoning_effort
atoms to the strings Fireworks expects, and forwards the full
Fireworks-specific parameter surface (sampling, prompt cache keys,
speculation, response_format, etc.). stream_options.include_usage is
added by default_build_body/1 for streaming requests.
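Two of the steps above, stripping rejected fields from assistant turns and rendering reasoning_effort atoms as strings, can be sketched as follows. The module `BodyCleanupSketch` is a hypothetical name, and the message shape (maps with atom keys and a string role) is an assumption; tool_choice normalisation is left out because its input shape is not given here.

```elixir
defmodule BodyCleanupSketch do
  @moduledoc "Hypothetical illustration of two build_body/1 steps."

  # Message-level fields the text says Fireworks rejects on assistant turns.
  @rejected_assistant_fields [:metadata, :reasoning_details, :reasoning_content]

  # Drop the fields from assistant turns only; other roles are untouched.
  def strip_assistant_fields(messages) when is_list(messages) do
    Enum.map(messages, fn
      %{role: "assistant"} = msg -> Map.drop(msg, @rejected_assistant_fields)
      msg -> msg
    end)
  end

  # Render a reasoning_effort atom as a string. nil is matched first because
  # nil is itself an atom; strings pass through unchanged.
  def render_effort(nil), do: nil
  def render_effort(effort) when is_atom(effort), do: Atom.to_string(effort)
  def render_effort(effort) when is_binary(effort), do: effort
end
```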
Default implementation of decode_response/1.
Handles success/error responses with standard ReqLLM.Response creation.
Default implementation of decode_stream_event/2.
Decodes SSE events using OpenAI-compatible format.
Callback implementation for ReqLLM.Provider.default_env_key/0.
Custom encode_body wrapper that delegates body construction to build_body/1
and serialises the result with the default OpenAI-compatible JSON encoder.
Default implementation of extract_usage/2.
Extracts usage data from standard usage field in response body.
Custom prepare_request for :object operations.
Defaults to native response_format: json_schema (strict-enforced) and falls
back to the tool-calling workaround when fireworks_structured_output_mode
is :tool. All other operations delegate to the default implementation.
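The mode selection described above might look roughly like the sketch below. Only the two option names come from the text; the module `StructuredOutputSketch`, the schema name "output", the tool name "structured_output", and the exact payload layout are assumptions. The payloads follow the general OpenAI response_format and function-tool shapes, and the provider's actual output may differ.

```elixir
defmodule StructuredOutputSketch do
  @moduledoc "Hypothetical illustration of structured output mode selection."

  # Returns the body fields that would carry the schema for each mode.
  def body_fields(schema, opts \\ []) do
    mode = Keyword.get(opts, :fireworks_structured_output_mode, :auto)
    strict? = Keyword.get(opts, :fireworks_json_schema_strict, true)

    case mode do
      # :auto and :json_schema both use native response_format here.
      m when m in [:auto, :json_schema] ->
        %{
          response_format: %{
            type: "json_schema",
            json_schema: %{name: "output", strict: strict?, schema: schema}
          }
        }

      # :tool wraps the schema in a single function tool and forces its use.
      :tool ->
        %{
          tools: [
            %{type: "function", function: %{name: "structured_output", parameters: schema}}
          ],
          tool_choice: %{type: "function", function: %{name: "structured_output"}}
        }
    end
  end
end
```

In the tool mode, the structured object would be read back from the forced tool call's arguments rather than from the message content.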
Provider-specific option normalization, performed by translate_options/3.
- Drops :reasoning_token_budget (Fireworks uses reasoning_effort instead).
- Defaults :receive_timeout to 5 minutes (reasoning completions are slow).
- Caps :max_tokens to 4096 for non-streaming requests with a warning, since Fireworks rejects larger budgets when stream: false.
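The three normalization rules above can be sketched as a pure function over a keyword list. This is an illustration under stated assumptions, not the provider's translate_options/3: the module `OptionNormalizationSketch`, the `{opts, warnings}` return shape, the warning text, and reading the streaming flag from a `:stream` key are all hypothetical.

```elixir
defmodule OptionNormalizationSketch do
  @moduledoc "Hypothetical illustration of the option normalization rules."

  @max_non_streaming_tokens 4096
  # 5 minutes in milliseconds.
  @default_receive_timeout :timer.minutes(5)

  # Returns {normalized_opts, warnings}.
  def normalize(opts) when is_list(opts) do
    opts =
      opts
      |> Keyword.delete(:reasoning_token_budget)
      |> Keyword.put_new(:receive_timeout, @default_receive_timeout)

    # Treat anything other than an explicit `true` as non-streaming.
    streaming? = Keyword.get(opts, :stream, false) == true
    max_tokens = Keyword.get(opts, :max_tokens)

    if not streaming? and is_integer(max_tokens) and max_tokens > @max_non_streaming_tokens do
      warning =
        "max_tokens capped from #{max_tokens} to #{@max_non_streaming_tokens} (stream: false)"

      {Keyword.put(opts, :max_tokens, @max_non_streaming_tokens), [warning]}
    else
      {opts, []}
    end
  end
end
```

Using `Keyword.put_new/3` for the timeout means a caller-supplied :receive_timeout always wins over the 5-minute default, while streaming requests keep whatever :max_tokens they were given.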