Middleware wraps the shared operation pipeline, not a single transport.
That is one of the key runtime decisions in FastestMCP. Middleware is applied to:
- in-process calls
- streamable HTTP requests
- stdio requests
- mounted provider execution
So the same policy and observability rules apply regardless of how the caller reaches the server.
Execution Model
Middleware forms a bidirectional pipeline around the operation:
request -> middleware A -> middleware B -> handler -> middleware B -> middleware A -> response
That means middleware can:
- inspect requests
- reject requests
- rewrite behavior before the handler runs
- observe or transform results on the way back out
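The bidirectional flow above can be modelled in plain Elixir. This is a minimal sketch, not FastestMCP's implementation: `PipelineSketch`, `build/2`, and `tracing/1` are hypothetical names, and the operation is a bare map standing in for the real operation value.

```elixir
defmodule PipelineSketch do
  # Hypothetical helper, not part of FastestMCP: composes two-arity
  # middleware around a handler. Earlier list entries end up outermost.
  def build(middlewares, handler) do
    List.foldr(middlewares, handler, fn middleware, next ->
      fn operation -> middleware.(operation, next) end
    end)
  end

  # Builds a middleware that records when it is entered and when it exits.
  def tracing(name) do
    fn operation, next ->
      entered = Map.update!(operation, :trace, &(&1 ++ ["#{name} in"]))
      result = next.(entered)
      Map.update!(result, :trace, &(&1 ++ ["#{name} out"]))
    end
  end
end

# The handler sits at the centre of the pipeline.
handler = fn operation -> Map.update!(operation, :trace, &(&1 ++ ["handler"])) end

pipeline =
  PipelineSketch.build(
    [PipelineSketch.tracing("A"), PipelineSketch.tracing("B")],
    handler
  )

IO.inspect(pipeline.(%{trace: []}).trace)
# prints ["A in", "B in", "handler", "B out", "A out"]
```

Folding from the right is what makes the first middleware in the list the outermost wrapper, so it sees the request first and the result last.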
Adding Middleware
server =
  FastestMCP.server("middleware")
  |> FastestMCP.add_middleware(FastestMCP.Middleware.logging())
  |> FastestMCP.add_middleware(
    FastestMCP.Middleware.rate_limiting(limit: 10, interval_ms: 1_000)
  )
  |> FastestMCP.add_tool("echo", fn arguments, _ctx -> arguments end)
Order matters. Middleware added earlier wraps middleware added later, so it sees the request first and the result last.
Built-in Middleware
FastestMCP.Middleware includes constructors for:
- logging and structured logging
- timing and detailed timing
- error normalization
- retry
- rate limiting and sliding-window rate limiting
- response caching
- response limiting
- schema dereferencing
- tool injection
- ping and session keepalive support
These constructors return configured middleware objects that can be added directly to the server definition.
Logging and Timing
Use logging and timing middleware when you want request-level observability across all operations:
FastestMCP.Middleware.logging(
  include_payload_length: true,
  structured_logging: true
)
FastestMCP.Middleware.timing()
FastestMCP.Middleware.detailed_timing()
Rate Limiting and Caching
Use middleware for cross-cutting execution policy:
FastestMCP.Middleware.rate_limiting(limit: 20, interval_ms: 1_000)
FastestMCP.Middleware.sliding_window_rate_limiting(limit: 100, interval_ms: 60_000)
FastestMCP.Middleware.response_caching()
FastestMCP.Middleware.response_limiting(max_bytes: 100_000)
FastestMCP.Middleware.retry(max_retries: 3)
The response cache is local to the runtime in v0.1. See Runtime State and Storage for the current storage model.
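To make the idea of execution policy as middleware concrete, here is a minimal sketch of a fixed-window rate limiter written as a two-arity function. It is not how FastestMCP implements `rate_limiting/1`: the `RateLimitSketch` module, the injectable `clock` argument, and the `{:error, :rate_limited}` return value are all assumptions made for illustration.

```elixir
defmodule RateLimitSketch do
  # Hypothetical fixed-window limiter; FastestMCP's own implementation may differ.
  # Returns a two-arity middleware allowing `limit` calls per `interval_ms`.
  def fixed_window(limit, interval_ms, clock \\ fn -> System.monotonic_time(:millisecond) end) do
    # The window state lives in an Agent so the closure can share it across calls.
    {:ok, state} = Agent.start_link(fn -> %{window_start: clock.(), count: 0} end)

    fn operation, next ->
      now = clock.()

      allowed? =
        Agent.get_and_update(state, fn %{window_start: start, count: count} = window ->
          cond do
            # The window has elapsed: start a new one and count this call.
            now - start >= interval_ms -> {true, %{window_start: now, count: 1}}
            # Still within budget for the current window.
            count < limit -> {true, %{window | count: count + 1}}
            # Budget exhausted: reject without touching the state.
            true -> {false, window}
          end
        end)

      # Assumption: rejection is modelled as a tagged tuple in this sketch.
      if allowed?, do: next.(operation), else: {:error, :rate_limited}
    end
  end
end

# A frozen clock keeps every call inside the same window.
limiter = RateLimitSketch.fixed_window(2, 1_000, fn -> 0 end)
handler = fn operation -> {:ok, operation} end

IO.inspect(for n <- 1..3, do: limiter.(%{id: n}, handler))
# prints [ok: %{id: 1}, ok: %{id: 2}, error: :rate_limited]
```

Because the limiter only decides whether to call `next`, the handler never needs to know that a limit exists.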
Error Handling
Error handling middleware accepts either a one-arity logger that receives the message
or a two-arity logger that receives the level and the message:
FastestMCP.Middleware.error_handling(
  logger: fn level, message ->
    Logger.log(level, message)
  end
)
Explicit %FastestMCP.Error{} values can choose their log level:
raise FastestMCP.Error,
  code: :invalid_params,
  message: "missing required input",
  log_level: :warning
Normalized errors are logged without traceback noise. Unexpected exceptions
still include traceback details when include_traceback: true is configured.
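One way a library could support both logger shapes is to dispatch on the function's arity. The sketch below assumes that approach. `LoggerDispatchSketch.emit/3` is a hypothetical helper, not a FastestMCP function, and the loggers return strings here only so the result is easy to inspect.

```elixir
defmodule LoggerDispatchSketch do
  # Hypothetical dispatch on logger arity; illustrative only.
  # A one-arity logger gets just the message.
  def emit(logger, _level, message) when is_function(logger, 1), do: logger.(message)
  # A two-arity logger gets the level and the message.
  def emit(logger, level, message) when is_function(logger, 2), do: logger.(level, message)
end

one_arity = fn message -> "got: " <> message end
two_arity = fn level, message -> "[#{level}] " <> message end

IO.puts(LoggerDispatchSketch.emit(one_arity, :warning, "missing required input"))
IO.puts(LoggerDispatchSketch.emit(two_arity, :warning, "missing required input"))
# prints "got: missing required input" then "[warning] missing required input"
```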
Synthetic Tool Surfaces
Middleware can also inject tools into the catalog.
Generic tool injection
FastestMCP.Middleware.tool_injection([
  {"multiply", fn %{"a" => a, "b" => b}, _ctx -> %{"result" => a * b} end,
   [description: "Multiply two numbers."]}
])
Prompt tools
FastestMCP.Middleware.prompt_tools()
This injects tool equivalents for prompt listing and rendering.
Resource tools
FastestMCP.Middleware.resource_tools()
This injects tool equivalents for listing and reading resources. It is the FastestMCP v0.1 answer to "tool-only clients need resource access."
Custom Middleware
Custom middleware is a two-arity function that receives the operation and a next function that continues the pipeline:
middleware = fn operation, next ->
  if operation.method == "tools/call" and operation.target == "dangerous" do
    raise FastestMCP.Error, code: :permission_denied, message: "blocked by policy"
  else
    next.(operation)
  end
end

server =
  FastestMCP.server("middleware")
  |> FastestMCP.add_middleware(middleware)
Use custom middleware when the behavior is about request execution, not about changing where components come from. If you are shaping component identity or provider-backed names, use Transforms instead.
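The example above rejects a request before the handler runs. Custom middleware can also act on the way back out, as in this minimal sketch that strips a key from the result. The operation is a plain map with the `method` and `target` fields used above, `next` is a stub standing in for the rest of the pipeline, and the result shape (a map with a `"secret"` key) is an assumption, not something FastestMCP defines.

```elixir
defmodule ResultMiddlewareSketch do
  # Hypothetical middleware: post-processes only "tools/call" results and
  # removes a "secret" key from map results on the way back out.
  def redact_secrets do
    fn operation, next ->
      # Run the rest of the pipeline first, then inspect what came back.
      result = next.(operation)

      if operation.method == "tools/call" and is_map(result) do
        Map.delete(result, "secret")
      else
        result
      end
    end
  end
end

# Stand-ins for the real operation and the downstream pipeline.
operation = %{method: "tools/call", target: "lookup"}
next = fn _operation -> %{"value" => 42, "secret" => "token"} end

IO.inspect(ResultMiddlewareSketch.redact_secrets().(operation, next))
# prints %{"value" => 42}
```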
Middleware vs Providers vs Transforms
Use:
- middleware for execution policy and observability
- providers for sourcing components
- transforms for reshaping component identity or filtering the catalog
Keeping those concerns separate is what makes larger composed servers easier to reason about.
Why This Shape
FastestMCP puts middleware around one shared execution path so behavior does not fork by transport.
That keeps retries, logging, rate limiting, caching, and injected tool surfaces aligned for direct calls, HTTP, stdio, and mounted providers.