Deep Supervision Module for TRM (Tiny-Recursive-Model) strategy.
This module provides structured prompt construction and feedback parsing for the supervision and improvement phases of the TRM recursive improvement cycle. It handles:
- Building supervision prompts for critical answer evaluation
- Parsing LLM feedback to extract issues, suggestions, and quality scores
- Building improvement prompts that incorporate feedback
- Supporting iterative refinement with previous feedback context
Overview
The TRM supervision phase takes a question and the current answer, then generates a critical evaluation that identifies:
- Issues with accuracy, completeness, clarity, and relevance
- Specific suggestions for improvement
- An overall quality score (0.0-1.0)
The improvement phase then applies this feedback to generate an improved answer.
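The alternation between the two phases can be pictured as a small loop. The following is only a sketch under stated assumptions: `TRMLoopSketch`, the `:max_steps` and `:threshold` options, and the two callback functions are hypothetical and not part of this module. In a real cycle the callbacks would wrap LLM calls built from the prompts described below.

```elixir
defmodule TRMLoopSketch do
  # Hypothetical driver, not part of the documented module.
  # `supervise` takes (question, answer) and returns a feedback map
  # that has at least a :quality_score key.
  # `improve` takes (question, answer, feedback) and returns a new answer.
  def run(question, answer, supervise, improve, opts \\ []) do
    max_steps = Keyword.get(opts, :max_steps, 3)
    threshold = Keyword.get(opts, :threshold, 0.9)
    loop(question, answer, supervise, improve, 1, max_steps, threshold)
  end

  # Stop once the step budget is used up.
  defp loop(_q, answer, _s, _i, step, max_steps, _t) when step > max_steps, do: answer

  defp loop(q, answer, supervise, improve, step, max_steps, threshold) do
    feedback = supervise.(q, answer)

    if feedback.quality_score >= threshold do
      # Good enough: return the answer unchanged.
      answer
    else
      improved = improve.(q, answer, feedback)
      loop(q, improved, supervise, improve, step + 1, max_steps, threshold)
    end
  end
end
```

Passing the phases in as functions keeps the sketch independent of any particular LLM client.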
Usage
    # Build supervision prompt
    context = %{
      question: "What is machine learning?",
      answer: "ML is a type of AI",
      step: 1,
      previous_feedback: nil
    }
    {system, user} = Supervision.build_supervision_prompt(context)

    # Parse supervision response
    feedback = Supervision.parse_supervision_result(llm_response)
    # %{issues: [...], suggestions: [...], quality_score: 0.65}

    # Build improvement prompt
    {system, user} = Supervision.build_improvement_prompt(
      context.question,
      context.answer,
      feedback
    )
Types
@type supervision_context() :: %{ question: String.t(), answer: String.t(), step: pos_integer(), previous_feedback: feedback() | nil }
Functions
Builds the improvement prompt for applying feedback.
Returns a tuple of {system_prompt, user_prompt} that can be used to create
an LLM directive for the improvement phase.
Parameters
- question - The original question
- answer - The current answer to improve
- feedback - The feedback from supervision (issues, suggestions, score)
Returns
A tuple {system_prompt, user_prompt} for generating an improved answer.
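To make the shape of the returned tuple concrete, here is a minimal sketch of how such a prompt pair might be assembled from a feedback map. The module name `ImprovementPromptSketch`, the system prompt text, and the layout of the user prompt are all assumptions for illustration. The actual wording produced by this module is not shown in these docs.

```elixir
defmodule ImprovementPromptSketch do
  # Hypothetical stand-in for the documented function; the prompt text is invented.
  def build(question, answer, feedback) do
    system = "You improve answers by applying reviewer feedback."

    user = """
    Question: #{question}

    Current answer: #{answer}

    Issues:
    #{bullets(feedback.issues)}

    Suggestions:
    #{bullets(feedback.suggestions)}

    Write a complete, improved answer.
    """

    {system, String.trim(user)}
  end

  # Render a list of strings as a dash-prefixed list, one item per line.
  defp bullets([]), do: "- (none)"
  defp bullets(items), do: Enum.map_join(items, "\n", &("- " <> &1))
end
```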
@spec build_supervision_prompt(supervision_context()) :: {String.t(), String.t()}
Builds the supervision prompt for critical answer evaluation.
Returns a tuple of {system_prompt, user_prompt} that can be used to create
an LLM directive for the supervision phase.
Parameters
- context - A map containing:
  - :question - The original question being answered
  - :answer - The current answer to evaluate
  - :step - The current supervision step number
  - :previous_feedback - Optional feedback from previous supervision (for iterative improvement)
Returns
A tuple {system_prompt, user_prompt} for LLM evaluation.
Examples
iex> context = %{question: "What is AI?", answer: "AI is...", step: 1, previous_feedback: nil}
iex> {system, user} = Supervision.build_supervision_prompt(context)
iex> is_binary(system) and is_binary(user)
true
Calculates the quality score from a supervision response.
First tries to extract an explicit SCORE marker. If not found, calculates a heuristic score based on the ratio of strengths to issues.
Parameters
- response - The raw LLM response text
Returns
A float between 0.0 and 1.0 representing the quality score.
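The two-stage lookup described above (explicit marker first, heuristic second) could look roughly like the sketch below. `ScoreSketch` is hypothetical, and the fallback formula `strengths / (strengths + issues)` with a neutral 0.5 for empty input is an assumption. The docs only say the heuristic is based on the ratio of strengths to issues. Unlike the documented function, this sketch takes the already-extracted lists as arguments to stay self-contained.

```elixir
defmodule ScoreSketch do
  # Hypothetical sketch; the real function takes only the response text.
  def score(response, strengths, issues) do
    # Look for a line such as "SCORE: 0.6" (case-insensitive, any line).
    case Regex.run(~r/^\s*SCORE:\s*([0-9]+(?:\.[0-9]+)?)/mi, response) do
      [_, raw] ->
        {value, _rest} = Float.parse(raw)
        clamp(value)

      nil ->
        heuristic(length(strengths), length(issues))
    end
  end

  # Assumed fallback: share of strengths among all findings.
  defp heuristic(0, 0), do: 0.5
  defp heuristic(s, i), do: s / (s + i)

  # Keep the result inside the documented 0.0-1.0 range.
  defp clamp(value), do: value |> max(0.0) |> min(1.0)
end
```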
@spec default_improvement_system_prompt() :: String.t()
Returns the default system prompt for applying feedback to improve answers.
The prompt instructs the LLM to:
- Address all identified issues
- Implement the suggested improvements
- Preserve what was already correct
- Produce a complete, improved answer
@spec default_supervision_system_prompt() :: String.t()
Returns the default system prompt for critical answer supervision.
The prompt instructs the LLM to:
- Evaluate the answer across multiple quality dimensions
- Identify specific issues and weaknesses
- Provide actionable suggestions for improvement
- Assign a quality score from 0.0 to 1.0
Extracts issues from a supervision response.
Looks for lines starting with issue markers (ISSUE:, PROBLEM:, etc.) and returns them as a list of strings.
Extracts strengths from a supervision response.
Looks for lines starting with strength markers (STRENGTH:, CORRECT:, etc.) and returns them as a list of strings.
Extracts improvement suggestions from a supervision response.
Looks for lines starting with suggestion markers (SUGGESTION:, RECOMMEND:, etc.) and returns them as a list of strings.
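The three extraction functions share one pattern: keep the lines that begin with a known marker and strip the marker. A minimal sketch of that pattern follows. `MarkerSketch.extract/2` is a hypothetical helper, and the marker lists are passed in by the caller because the full set behind "etc." is not listed here.

```elixir
defmodule MarkerSketch do
  # Hypothetical helper: returns the text following any of `markers`
  # for every line that starts with one of them.
  def extract(response, markers) do
    response
    |> String.split("\n")
    |> Enum.map(&String.trim/1)
    |> Enum.flat_map(fn line ->
      case Enum.find(markers, &String.starts_with?(line, &1)) do
        nil -> []
        marker -> [line |> String.replace_prefix(marker, "") |> String.trim()]
      end
    end)
    # Drop markers that had no text after them.
    |> Enum.reject(&(&1 == ""))
  end
end
```

For instance, calling it with `["ISSUE:", "PROBLEM:"]` collects both kinds of line into one list, in the order they appear.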
@spec format_quality_criteria() :: String.t()
Formats the quality criteria for inclusion in prompts.
Lists the evaluation dimensions with brief descriptions.
Includes previous feedback context for iterative improvement.
Formats the previous feedback for inclusion in the supervision prompt, allowing the evaluator to see what was already addressed.
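As an illustration only, previous feedback might be rendered into a prompt fragment along these lines. `PreviousFeedbackSketch` and the exact text layout are assumptions. The sketch only relies on the :issues, :suggestions, and :quality_score keys of the feedback map.

```elixir
defmodule PreviousFeedbackSketch do
  # No earlier feedback on the first step: contribute nothing to the prompt.
  def format(nil), do: ""

  def format(%{issues: issues, suggestions: suggestions, quality_score: score}) do
    [
      "Previous feedback (score: #{score})",
      section("Issues raised", issues),
      section("Suggestions made", suggestions)
    ]
    |> List.flatten()
    |> Enum.join("\n")
  end

  # Empty sections are omitted entirely.
  defp section(_title, []), do: []
  defp section(title, items), do: [title <> ":" | Enum.map(items, &("- " <> &1))]
end
```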
Parses a supervision LLM response to extract structured feedback.
Looks for formatted markers in the response:
- STRENGTH: - Things done well
- ISSUE: - Problems identified
- SUGGESTION: - Improvement recommendations
- SCORE: - Overall quality score (0.0-1.0)
Parameters
- response - The raw LLM response text
Returns
A feedback map with:
- :issues - List of issues identified
- :suggestions - List of improvement suggestions
- :strengths - List of things done well
- :quality_score - Overall score (0.0-1.0)
- :raw_text - The original response
Examples
iex> response = "ISSUE: Missing explanation\nSUGGESTION: Add details\nSCORE: 0.6"
iex> feedback = Supervision.parse_supervision_result(response)
iex> length(feedback.issues)
1
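For readers who want to see how the four markers could map onto the feedback map, here is a single-pass sketch. `ParseSketch` is hypothetical and handles only the four markers listed above. Unlike the documented function, it leaves :quality_score as nil when no SCORE: line is present instead of falling back to a heuristic.

```elixir
defmodule ParseSketch do
  # Hypothetical single-pass parser; not the module's actual implementation.
  def parse(response) do
    empty = %{strengths: [], issues: [], suggestions: [], quality_score: nil, raw_text: response}

    response
    |> String.split("\n")
    |> Enum.reduce(empty, &parse_line/2)
    # Items were prepended while reducing, so restore document order.
    |> Map.update!(:strengths, &Enum.reverse/1)
    |> Map.update!(:issues, &Enum.reverse/1)
    |> Map.update!(:suggestions, &Enum.reverse/1)
  end

  defp parse_line(line, acc) do
    case String.split(String.trim(line), ":", parts: 2) do
      ["STRENGTH", text] -> prepend(acc, :strengths, text)
      ["ISSUE", text] -> prepend(acc, :issues, text)
      ["SUGGESTION", text] -> prepend(acc, :suggestions, text)
      ["SCORE", text] -> put_score(acc, text)
      _other -> acc
    end
  end

  defp prepend(acc, key, text) do
    Map.update!(acc, key, fn items -> [String.trim(text) | items] end)
  end

  # Ignore a SCORE line whose value is not a number.
  defp put_score(acc, text) do
    case Float.parse(String.trim(text)) do
      {value, _rest} -> %{acc | quality_score: value}
      :error -> acc
    end
  end
end
```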
@spec prioritize_suggestions([String.t()]) :: [prioritized_suggestion()]
Prioritizes suggestions by estimated impact.
Analyzes each suggestion to estimate its impact on answer quality, then returns them sorted from highest to lowest impact.
Parameters
- suggestions - List of suggestion strings
Returns
A list of prioritized suggestion maps with :content, :impact, and :category.
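How impact is estimated is not specified here. One plausible approach, shown purely as a sketch, is a keyword table that assigns each suggestion a category and a numeric impact. `PrioritizeSketch`, the keyword lists, the category atoms, and the integer impact scale are all assumptions. The real `prioritized_suggestion()` type may use different values.

```elixir
defmodule PrioritizeSketch do
  # Hypothetical rules: {keyword fragments, category, impact}. First match wins.
  defp rules do
    [
      {["incorrect", "wrong", "error", "fix"], :accuracy, 3},
      {["missing", "add", "include", "expand"], :completeness, 2},
      {["clarify", "rephrase", "simplify"], :clarity, 1}
    ]
  end

  # Classify every suggestion, then sort from highest to lowest impact.
  def prioritize(suggestions) do
    suggestions
    |> Enum.map(&classify/1)
    |> Enum.sort_by(& &1.impact, :desc)
  end

  defp classify(content) do
    text = String.downcase(content)

    {category, impact} =
      Enum.find_value(rules(), {:general, 0}, fn {keywords, category, impact} ->
        if Enum.any?(keywords, &String.contains?(text, &1)), do: {category, impact}
      end)

    %{content: content, impact: impact, category: category}
  end
end
```

Substring matching of this kind is crude (for example, "add" also matches "address"), so it should be read as a starting point, not a recommendation.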