MCP Streamable HTTP Deployment

Copy Markdown View Source

ptc_runner_mcp can run as an opt-in Streamable HTTP MCP server for private-network deployments. The HTTP mode is intended for a trusted agentic application or load balancer talking to one BEAM node over a small network boundary; stdio remains the default local-client mode.

Quick Start

Generate a strong bearer token:

export PTC_RUNNER_MCP_HTTP_AUTH_TOKEN="$(openssl rand -base64 32)"

Start the release on loopback:

_build/prod/rel/ptc_runner_mcp/bin/ptc_runner_mcp start \
  --http \
  --http-auth-token "$PTC_RUNNER_MCP_HTTP_AUTH_TOKEN"

The MCP endpoint is POST /mcp on 127.0.0.1:7332 by default. In HTTP mode stdio is not attached.

Required Headers

Every /mcp request needs:

Authorization: Bearer <token>

The first request is initialize without an MCP-Session-Id. The response includes:

MCP-Session-Id: <opaque id>

Subsequent POST and DELETE requests send that session id. Clients should also send MCP-Protocol-Version; if it is absent the server falls back to the version stored on the HTTP session.

Health And Readiness

GET /health is liveness: it returns 200 while the process and listener are alive.

GET /ready is load-balancer readiness: it returns 200 when the session registry is accepting work, and 503 when the node is draining or saturated.

Both endpoints are unauthenticated and expose only process status. Keep them on a private network or restrict them at the load balancer.

Security Posture

Use a private subnet, security group, firewall, or equivalent network boundary. The bearer token is the inner boundary, not the only boundary.

TLS should terminate at the load balancer or edge proxy. If traffic from the proxy to the BEAM node is plaintext, anything able to observe that link can see the bearer token.

Binding to a non-loopback address always requires a token. --http-disable-auth is only permitted on loopback binds and cannot be combined with --http-allow-unsafe-network.

The v1 auth model has one static token. All token holders are the same owner, so an MCP-Session-Id is a bearer capability inside that trust boundary. Session ids are random and are never logged raw.

Failed-auth rate limiting

Repeated failed bearer authentications from the same source are rate limited to reduce brute-force risk (OPLANE_REQ-00080322). Both missing and invalid bearer failures count. Once a source exceeds --http-auth-rate-limit-max-failures within --http-auth-rate-limit-window-ms, it is temporarily blocked for --http-auth-rate-limit-block-ms; blocked requests receive 429 with a Retry-After header and no www-authenticate challenge (so a blocked caller is not told whether its token was missing or invalid). A successful authentication immediately resets the source's failure state. A misconfigured-but-legitimate client can briefly be blocked; it self-heals after the failure window elapses (not just the block window — if block_ms < window_ms, the failure count persists across block expiry until the window resets, so the effective recovery time is window_ms). The limiter is on by default and is a no-op when no token is configured or when --http-auth-rate-limit is false; if the limiter process is unavailable it fails open.

The source is keyed by the peer IP (conn.remote_ip). X-Forwarded-For / Forwarded headers are not trusted — behind a reverse proxy every request appears to originate from the proxy, so deploy per-source limiting at the proxy too (or in addition). Host/Origin rejections are not bearer-auth failures and never count toward the limit. Block decisions log a JSON line (http_auth_rate_limited) with the raw source IP, which is operationally useful for acting on abuse; the bearer token itself is never logged.

Token generation

The server requires a minimum of 32 bytes but does not programmatically verify entropy quality. Generate tokens with a cryptographically secure source:

# Recommended — 32 random bytes, base64-encoded (44 characters)
openssl rand -base64 32

# Alternative
python3 -c "import secrets; print(secrets.token_urlsafe(32))"

Do not use low-entropy strings (repeated characters, dictionary words, predictable patterns). Entropy quality is the caller's responsibility.

Redaction

The bearer token is registered with the credential redaction system at startup. If the raw token value accidentally appears in a log line or trace payload, the redactor replaces it with [REDACTED]. This is a defense-in-depth measure — the primary protection is that request logs and traces use hashed owner identifiers, never the raw token.

Observability

HTTP request logs are JSON lines on stderr. Each request log includes an instance label, X-Request-Id, method, path, status, duration, and hashed owner/session ids when known.

The server emits sanitized telemetry under [:ptc_lisp, :http, ...] for request start/stop, session create/close, auth failures, failed-auth rate-limit blocks ([:ptc_lisp, :http, :auth, :rate_limited]), limit rejections, and cancellations.

When --trace-dir is enabled, HTTP tool-call trace events include owner_hash, mcp_session_hash, and transport_request_id metadata. Trace payload policy is still controlled by --trace-payloads.

Prometheus /metrics is not shipped in this PR. The config flags are reserved, but scraping support remains a follow-up.

Rolling Deploy Drain

On application shutdown the registry enters drain mode, so /ready returns 503 and new /mcp POSTs receive a JSON-RPC server draining error. The process waits --http-shutdown-grace-ms, then cancels any remaining in-flight HTTP workers before exit.

DELETE /mcp closes one protocol session immediately. Reaped or deleted sessions return 404 on later use.

Important Flags

FlagDefaultMeaning
--httpfalseEnable the HTTP listener.
--http-host127.0.0.1Bind IP address or localhost.
--http-port7332Bind port.
--http-path/mcpMCP endpoint path.
--http-auth-tokenunsetStatic bearer token; minimum 32 characters.
--http-allowed-originunsetExact allowed browser Origin; repeatable or comma-separated.
--http-max-sessions256Global HTTP protocol-session cap.
--http-max-sessions-per-owner32Per-owner protocol-session cap.
--http-max-in-flight-per-session4Per-session executing request cap.
--http-auth-rate-limittrueRate limit failed bearer auth per source.
--http-auth-rate-limit-window-ms60000Window over which failures accumulate.
--http-auth-rate-limit-max-failures5Failures per window before a source is blocked.
--http-auth-rate-limit-block-ms60000Block duration after the threshold is exceeded (429 + Retry-After).
--http-session-ttl-ms3600000Absolute protocol-session lifetime.
--http-session-idle-timeout-ms900000Idle protocol-session timeout.
--http-instance-labelhostnameLabel stamped into HTTP logs/telemetry/traces.

For loopback binds, the MCP endpoint rejects requests whose Host/authority is not loopback. Missing Origin remains valid for non-browser clients, but invalid browser Origin values are rejected. POST requests with a present Content-Type must be JSON (application/json or a +json media type).