Sagents.Middleware.Summarization (Sagents v0.8.0-rc.3)
Middleware that automatically manages conversation length through intelligent summarization.
This middleware monitors token usage and automatically summarizes older messages when a threshold is exceeded, preserving recent messages for context continuity.
Purpose
Long conversations present several problems:
- Increased API costs
- Slower response times
- Risk of exceeding model context limits
- Potential API errors
This middleware solves these problems by:
- Monitoring total token count
- Summarizing older messages when threshold is exceeded
- Preserving recent messages for continuity
- Protecting AI/Tool message pairs from separation
Configuration
# Default configuration
{Summarization, []}
# Custom configuration
{Summarization, [
  model: custom_model,                  # Model for summarization (defaults to agent model)
  max_tokens_before_summary: 170_000,   # Token threshold (default: 170k)
  messages_to_keep: 6,                  # Recent messages to preserve (default: 6)
  summary_prompt: custom_prompt,        # Custom summarization prompt
  token_counter: &custom_counter/1      # Custom token counting function
]}
Configuration Options
- :model - LLM to use for summarization. Defaults to the agent's model.
- :max_tokens_before_summary - Token threshold that triggers summarization. Default: 170,000
- :messages_to_keep - Number of recent messages to preserve intact. Default: 6
- :summary_prompt - Custom prompt for summarization. Uses intelligent default.
- :token_counter - Function to count tokens. Defaults to approximate counting.
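To illustrate the :token_counter option, here is a minimal sketch of an approximate counter. It is not the library's default implementation. It assumes the counter receives the list of messages, that each message is a map with a binary :content field, and that roughly four characters make one token. The TokenEstimate module is hypothetical.

```elixir
defmodule TokenEstimate do
  @moduledoc """
  Hypothetical approximate token counter (not part of Sagents).
  Assumes messages are maps with a binary :content field and uses the
  rough heuristic of about 4 characters per token.
  """

  @chars_per_token 4

  # Sums the estimated token count over a list of messages.
  def count(messages) when is_list(messages) do
    messages
    |> Enum.map(&message_tokens/1)
    |> Enum.sum()
  end

  # Binary content: integer ceiling of length / 4.
  defp message_tokens(%{content: content}) when is_binary(content) do
    div(String.length(content) + @chars_per_token - 1, @chars_per_token)
  end

  # Anything else (nil content, tool-call-only messages) counts as zero here.
  defp message_tokens(_other), do: 0
end
```

Under these assumptions it could be supplied as token_counter: &TokenEstimate.count/1. A real counter would also need to account for tool calls and structured content, which this sketch ignores.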
Position in Middleware Stack
This middleware should run relatively early in the before_model phase, after message generation but before any processing that expects specific message structures:
- TodoListMiddleware
- FilesystemMiddleware
- SubAgentMiddleware
- SummarizationMiddleware ← Position
- AnthropicPromptCachingMiddleware
- PatchToolCallsMiddleware
- HumanInTheLoopMiddleware
How It Works
1. Token Monitoring
Before each model call, counts total tokens in message history.
2. Threshold Check
If tokens exceed threshold, triggers summarization.
3. Safe Cutoff Detection
Finds safe points to cut the conversation that don't separate:
- Assistant messages with tool_calls from their corresponding tool results
- Related message pairs
4. Message Partitioning
- To summarize: Older messages before cutoff point
- To preserve: Recent messages after cutoff point
5. Summary Generation
Uses the configured LLM (the :model option, or the agent's model by default) to generate a concise summary of the older messages.
6. State Update
Replaces older messages with summary messages, preserving recent messages.
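The steps above can be sketched roughly as follows. This is an illustration under stated assumptions, not the middleware's actual code: SummarizeFlow, the :summarize option, the plain-map message shape, and the :system role of the summary message are all hypothetical. Safe cutoff detection (step 3) is left out, so the split point here is the naive target.

```elixir
defmodule SummarizeFlow do
  # Hypothetical sketch; names and message shape are assumptions, not Sagents APIs.

  def run(messages, opts) do
    max_tokens = Keyword.get(opts, :max_tokens_before_summary, 170_000)
    keep = Keyword.get(opts, :messages_to_keep, 6)
    count = Keyword.fetch!(opts, :token_counter)
    summarize = Keyword.fetch!(opts, :summarize)

    # Steps 1-2: count tokens and compare against the threshold.
    if count.(messages) > max_tokens do
      # Step 4: partition at a naive cutoff (safe cutoff detection omitted).
      cutoff = max(length(messages) - keep, 0)
      {older, recent} = Enum.split(messages, cutoff)

      # Steps 5-6: replace the older messages with one summary message.
      case older do
        [] -> messages
        _ -> [%{role: :system, content: summarize.(older)} | recent]
      end
    else
      # Under the threshold: the history is returned unchanged.
      messages
    end
  end
end
```

Passing the counter and the summarizer in as functions keeps the sketch free of any LLM dependency, so the control flow can be exercised with stubs.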
Example
# Create agent with summarization
{:ok, agent} = Agent.new(
  model: model,
  middleware: [
    {Summarization, [
      max_tokens_before_summary: 150_000,
      messages_to_keep: 8
    ]}
  ]
)
# Summarization happens automatically during execution
{:ok, state} = Agent.execute(agent, state)
Safe Cutoff Algorithm
The middleware protects AI/Tool message pairs from separation:
- Calculate target cutoff: message_count - messages_to_keep
- Search backwards from target to find safe cutoff point
- A point is safe if:
  - It's not an assistant message with tool_calls
  - The next message isn't a tool result for this assistant
- If no safe point found, summarize nothing (keeps all messages)
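One possible reading of this algorithm is sketched below. It assumes messages are plain maps with a :role and an optional :tool_calls list, and it treats a cut at index i as safe when the message just before the cut is not an assistant message with tool calls and the message just after it is not a tool result. The SafeCutoff module and this exact safety rule are assumptions made for illustration. The library's own check may differ.

```elixir
defmodule SafeCutoff do
  # Hypothetical sketch; the message shape (%{role:, tool_calls:}) is assumed.

  # Returns the split index, or 0 when nothing should be summarized.
  def find(messages, messages_to_keep) do
    target = length(messages) - messages_to_keep

    if target <= 0 do
      0
    else
      # Search backwards from the target; default to 0 (summarize nothing).
      Enum.find(target..1//-1, 0, &safe?(messages, &1))
    end
  end

  # A cut at index i puts messages 0..i-1 in the summary and keeps the rest.
  defp safe?(messages, i) do
    prev = Enum.at(messages, i - 1)
    following = Enum.at(messages, i)
    not assistant_with_tool_calls?(prev) and not tool_result?(following)
  end

  defp assistant_with_tool_calls?(%{role: :assistant, tool_calls: [_ | _]}), do: true
  defp assistant_with_tool_calls?(_), do: false

  defp tool_result?(%{role: :tool}), do: true
  defp tool_result?(_), do: false
end
```

The returned index can be passed to Enum.split/2 to obtain the messages to summarize and the messages to preserve. An index of 0 leaves the first list empty, which matches the "summarize nothing" fallback.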
Error Handling
- Falls back to keeping all messages if summarization fails
- Logs errors but doesn't halt agent execution
- Graceful degradation ensures agent continues working
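The fallback behavior described above might look like the following sketch. SummaryFallback and the summarize_fun contract (returning {:ok, messages} or {:error, reason}) are hypothetical and chosen only to show the pattern. The middleware's internal error handling is not documented here.

```elixir
defmodule SummaryFallback do
  require Logger

  # Hypothetical helper: summarize_fun takes the message list and returns
  # {:ok, new_messages} or {:error, reason}. It may also raise.
  def with_fallback(messages, summarize_fun) do
    case summarize_fun.(messages) do
      {:ok, summarized} ->
        summarized

      {:error, reason} ->
        # Log and keep the full history so the agent can continue.
        Logger.error("Summarization failed: #{inspect(reason)}")
        messages
    end
  rescue
    error ->
      # Exceptions are also logged and swallowed; execution is not halted.
      Logger.error("Summarization raised: #{Exception.message(error)}")
      messages
  end
end
```

In both failure paths the original messages are returned unchanged, so the next model call proceeds with the unsummarized history.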
Performance Considerations
- Token counting is approximate by default (fast estimation)
- Summarization only runs when threshold exceeded
- Summary generation is async-compatible
- Minimal overhead when under threshold