Jido.AI.Reasoning.TRM.Supervision (Jido AI v2.2.0)

Deep supervision module for the TRM (Tiny-Recursive-Model) strategy.

This module provides structured prompt construction and feedback parsing for the supervision and improvement phases of the TRM recursive improvement cycle. It handles:

  • Building supervision prompts for critical answer evaluation
  • Parsing LLM feedback to extract issues, suggestions, and quality scores
  • Building improvement prompts that incorporate feedback
  • Supporting iterative refinement with previous feedback context

Overview

The TRM supervision phase takes a question and the current answer, then generates a critical evaluation that identifies:

  • Issues with accuracy, completeness, clarity, and relevance
  • Specific suggestions for improvement
  • An overall quality score (0.0-1.0)

The improvement phase then applies this feedback to generate an improved answer.
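
The two phases described above can be chained into a loop. The following is a minimal sketch of such a loop, not the library's implementation: `RefineLoopSketch`, the `supervise` and `improve` callbacks, and the `:max_steps` and `:threshold` options are hypothetical names introduced for illustration. In practice the callbacks would wrap an LLM call around the prompts this module builds.

```elixir
defmodule RefineLoopSketch do
  @moduledoc false

  # Hypothetical driver: `supervise` takes a supervision context and returns a
  # feedback map; `improve` takes (question, answer, feedback) and returns a new
  # answer. Neither callback is part of Jido.AI.
  def run(question, answer, supervise, improve, opts \\ []) do
    max_steps = Keyword.get(opts, :max_steps, 3)
    threshold = Keyword.get(opts, :threshold, 0.9)
    loop(question, answer, supervise, improve, 1, max_steps, threshold, nil)
  end

  # Stop once the step budget is exhausted.
  defp loop(_q, answer, _sup, _imp, step, max_steps, _thr, _prev) when step > max_steps,
    do: answer

  defp loop(q, answer, sup, imp, step, max_steps, thr, prev) do
    context = %{question: q, answer: answer, step: step, previous_feedback: prev}
    feedback = sup.(context)

    if feedback.quality_score >= thr do
      # Good enough: return the answer that was just evaluated.
      answer
    else
      improved = imp.(q, answer, feedback)
      loop(q, improved, sup, imp, step + 1, max_steps, thr, feedback)
    end
  end
end
```

Passing the previous feedback into each new context mirrors the `:previous_feedback` field of `supervision_context()`, so later evaluations can take earlier ones into account.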

Usage

alias Jido.AI.Reasoning.TRM.Supervision

# Build supervision prompt
context = %{
  question: "What is machine learning?",
  answer: "ML is a type of AI",
  step: 1,
  previous_feedback: nil
}

{system, user} = Supervision.build_supervision_prompt(context)

# Parse supervision response (llm_response is the raw text returned by the LLM)
feedback = Supervision.parse_supervision_result(llm_response)
# %{issues: [...], suggestions: [...], strengths: [...], quality_score: 0.65, raw_text: "..."}

# Build improvement prompt
{system, user} = Supervision.build_improvement_prompt(
  context.question,
  context.answer,
  feedback
)

Summary

Functions

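  • build_improvement_prompt(question, answer, feedback) - Builds the improvement prompt for applying feedback.
  • build_supervision_prompt(context) - Builds the supervision prompt for critical answer evaluation.
  • calculate_quality_score(response) - Calculates the quality score from a supervision response.
  • default_improvement_system_prompt() - Returns the default system prompt for applying feedback to improve answers.
  • default_supervision_system_prompt() - Returns the default system prompt for critical answer supervision.
  • extract_issues(response) - Extracts issues from a supervision response.
  • extract_strengths(response) - Extracts strengths from a supervision response.
  • extract_suggestions(response) - Extracts improvement suggestions from a supervision response.
  • format_quality_criteria() - Formats the quality criteria for inclusion in prompts.
  • include_previous_feedback(base_prompt, previous_feedback) - Includes previous feedback context for iterative improvement.
  • parse_supervision_result(response) - Parses a supervision LLM response to extract structured feedback.
  • prioritize_suggestions(suggestions) - Prioritizes suggestions by estimated impact.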

Types

feedback()

@type feedback() :: %{
  issues: [String.t()],
  suggestions: [String.t()],
  quality_score: float(),
  strengths: [String.t()],
  raw_text: String.t()
}

prioritized_suggestion()

@type prioritized_suggestion() :: %{
  content: String.t(),
  impact: :high | :medium | :low,
  category: atom()
}

supervision_context()

@type supervision_context() :: %{
  question: String.t(),
  answer: String.t(),
  step: pos_integer(),
  previous_feedback: feedback() | nil
}

Functions

build_improvement_prompt(question, answer, feedback)

@spec build_improvement_prompt(String.t(), String.t(), feedback()) ::
  {String.t(), String.t()}

Builds the improvement prompt for applying feedback.

Returns a tuple of {system_prompt, user_prompt} that can be used to create an LLM directive for the improvement phase.

Parameters

  • question - The original question
  • answer - The current answer to improve
  • feedback - The feedback from supervision (issues, suggestions, score)

Returns

A tuple {system_prompt, user_prompt} for generating an improved answer.
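
To make the shape of the returned tuple concrete, here is a rough sketch of how feedback could be folded into an improvement prompt. The module name `ImprovementPromptSketch`, the prompt wording, and the layout are all assumptions made for illustration; the actual text produced by `build_improvement_prompt/3` is not documented here and may differ.

```elixir
defmodule ImprovementPromptSketch do
  @moduledoc false

  # Hypothetical prompt builder; the wording is illustrative only.
  def build(question, answer, %{issues: issues, suggestions: suggestions}) do
    system = "Improve the answer by addressing every issue and applying every suggestion."

    user = """
    Question: #{question}

    Current answer: #{answer}

    Issues:
    #{bullets(issues)}

    Suggestions:
    #{bullets(suggestions)}
    """

    {system, String.trim_trailing(user)}
  end

  # Render a list as "- item" lines, or a placeholder when it is empty.
  defp bullets([]), do: "(none)"
  defp bullets(items), do: Enum.map_join(items, "\n", &("- " <> &1))
end
```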

build_supervision_prompt(context)

@spec build_supervision_prompt(supervision_context()) :: {String.t(), String.t()}

Builds the supervision prompt for critical answer evaluation.

Returns a tuple of {system_prompt, user_prompt} that can be used to create an LLM directive for the supervision phase.

Parameters

  • context - A map containing:
    • :question - The original question being answered
    • :answer - The current answer to evaluate
    • :step - The current supervision step number
    • :previous_feedback - Optional feedback from previous supervision (for iterative improvement)

Returns

A tuple {system_prompt, user_prompt} for LLM evaluation.

Examples

iex> context = %{question: "What is AI?", answer: "AI is...", step: 1, previous_feedback: nil}
iex> {system, user} = Supervision.build_supervision_prompt(context)
iex> is_binary(system) and is_binary(user)
true

calculate_quality_score(response)

@spec calculate_quality_score(String.t()) :: float()

Calculates the quality score from a supervision response.

First tries to extract an explicit SCORE: marker. If none is found, falls back to a heuristic score based on the ratio of strengths to issues.

Parameters

  • response - The raw LLM response text

Returns

A float between 0.0 and 1.0 representing the quality score.
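
The marker-then-heuristic strategy described above might look roughly like the sketch below. This is not the library's code: `ScoreSketch` is a hypothetical module, and both the exact fallback formula (strengths divided by strengths plus issues) and the neutral default of 0.5 when no markers are present are assumptions.

```elixir
defmodule ScoreSketch do
  @moduledoc false

  # Prefer an explicit "SCORE: <number>" line; otherwise fall back to a heuristic.
  def calculate(response) when is_binary(response) do
    case Regex.run(~r/^\s*SCORE:\s*(\d+(?:\.\d+)?)/m, response) do
      [_, raw] ->
        {value, _rest} = Float.parse(raw)
        clamp(value)

      nil ->
        heuristic(response)
    end
  end

  # Assumed heuristic: share of STRENGTH: lines among STRENGTH: and ISSUE: lines.
  defp heuristic(response) do
    strengths = count_lines(response, "STRENGTH:")
    issues = count_lines(response, "ISSUE:")

    case strengths + issues do
      0 -> 0.5
      total -> strengths / total
    end
  end

  defp count_lines(response, marker) do
    response
    |> String.split("\n")
    |> Enum.count(&String.starts_with?(String.trim_leading(&1), marker))
  end

  # Keep the result inside the documented 0.0..1.0 range.
  defp clamp(value), do: value |> max(0.0) |> min(1.0)
end
```

Clamping guards against a model emitting an out-of-range value such as `SCORE: 7`.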

default_improvement_system_prompt()

@spec default_improvement_system_prompt() :: String.t()

Returns the default system prompt for applying feedback to improve answers.

The prompt instructs the LLM to:

  • Address all identified issues
  • Implement the suggested improvements
  • Preserve what was already correct
  • Produce a complete, improved answer

default_supervision_system_prompt()

@spec default_supervision_system_prompt() :: String.t()

Returns the default system prompt for critical answer supervision.

The prompt instructs the LLM to:

  • Evaluate the answer across multiple quality dimensions
  • Identify specific issues and weaknesses
  • Provide actionable suggestions for improvement
  • Assign a quality score from 0.0 to 1.0

extract_issues(response)

@spec extract_issues(String.t()) :: [String.t()]

Extracts issues from a supervision response.

Looks for lines starting with issue markers (ISSUE:, PROBLEM:, etc.) and returns them as a list of strings.

extract_strengths(response)

@spec extract_strengths(String.t()) :: [String.t()]

Extracts strengths from a supervision response.

Looks for lines starting with strength markers (STRENGTH:, CORRECT:, etc.) and returns them as a list of strings.

extract_suggestions(response)

@spec extract_suggestions(String.t()) :: [String.t()]

Extracts improvement suggestions from a supervision response.

Looks for lines starting with suggestion markers (SUGGESTION:, RECOMMEND:, etc.) and returns them as a list of strings.
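
The three `extract_*` functions share the same line-marker idea. A minimal sketch of that idea follows, under stated assumptions: `MarkerSketch.extract/2` is a hypothetical helper, the marker lists are passed in by the caller, and stripping the marker from the returned string is a guess about the output format rather than documented behavior.

```elixir
defmodule MarkerSketch do
  @moduledoc false

  # For each line that starts with one of `markers`, return the text after it.
  def extract(response, markers) when is_binary(response) and is_list(markers) do
    response
    |> String.split("\n")
    |> Enum.map(&String.trim/1)
    |> Enum.flat_map(fn line ->
      case Enum.find(markers, &String.starts_with?(line, &1)) do
        nil -> []
        marker -> [line |> String.replace_prefix(marker, "") |> String.trim()]
      end
    end)
    # Drop markers that carried no text.
    |> Enum.reject(&(&1 == ""))
  end
end
```

Calling the helper with `["ISSUE:", "PROBLEM:"]`, `["STRENGTH:", "CORRECT:"]`, or `["SUGGESTION:", "RECOMMEND:"]` would cover the marker families named in the descriptions above.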

format_quality_criteria()

@spec format_quality_criteria() :: String.t()

Formats the quality criteria for inclusion in prompts.

Lists the evaluation dimensions with brief descriptions.

include_previous_feedback(base_prompt, previous_feedback)

@spec include_previous_feedback(String.t(), feedback() | nil) :: String.t()

Includes previous feedback context for iterative improvement.

Formats the previous feedback for inclusion in the supervision prompt, allowing the evaluator to see what was already addressed.
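
One plausible way to append earlier feedback to a prompt is sketched below. `PreviousFeedbackSketch` and the section layout are illustrative assumptions; only the `nil`-or-feedback input and the string output come from the spec above.

```elixir
defmodule PreviousFeedbackSketch do
  @moduledoc false

  # With no earlier feedback, the base prompt is returned unchanged.
  def include(base_prompt, nil), do: base_prompt

  def include(base_prompt, %{issues: issues, suggestions: suggestions, quality_score: score}) do
    section =
      [
        "Previous feedback (score: #{Float.round(score, 2)}):",
        bullets("Issues", issues),
        bullets("Suggestions", suggestions)
      ]
      |> Enum.reject(&(&1 == ""))
      |> Enum.join("\n")

    base_prompt <> "\n\n" <> section
  end

  # Empty lists produce no section at all.
  defp bullets(_title, []), do: ""

  defp bullets(title, items) do
    title <> ":\n" <> Enum.map_join(items, "\n", &("- " <> &1))
  end
end
```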

parse_supervision_result(response)

@spec parse_supervision_result(String.t()) :: feedback()

Parses a supervision LLM response to extract structured feedback.

Looks for formatted markers in the response:

  • STRENGTH: - Things done well
  • ISSUE: - Problems identified
  • SUGGESTION: - Improvement recommendations
  • SCORE: - Overall quality score (0.0-1.0)

Parameters

  • response - The raw LLM response text

Returns

A feedback map with:

  • :issues - List of issues identified
  • :suggestions - List of improvement suggestions
  • :strengths - List of things done well
  • :quality_score - Overall score (0.0-1.0)
  • :raw_text - The original response

Examples

iex> response = "ISSUE: Missing explanation\nSUGGESTION: Add details\nSCORE: 0.6"
iex> feedback = Supervision.parse_supervision_result(response)
iex> length(feedback.issues)
1

prioritize_suggestions(suggestions)

@spec prioritize_suggestions([String.t()]) :: [prioritized_suggestion()]

Prioritizes suggestions by estimated impact.

Analyzes each suggestion to estimate its impact on answer quality, then returns them sorted from highest to lowest impact.

Parameters

  • suggestions - List of suggestion strings

Returns

A list of prioritized suggestion maps with :content, :impact, and :category.
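
How impact is estimated is not specified above. As one possible approach, the sketch below classifies suggestions by keyword and sorts them from high to low impact. Everything in it is an assumption made for illustration: the `PrioritySketch` module, the keyword lists, and the `:accuracy`, `:clarity`, and `:general` categories.

```elixir
defmodule PrioritySketch do
  @moduledoc false

  # Illustrative keyword lists; not taken from the library.
  @high ~w(incorrect wrong error missing inaccurate)
  @medium ~w(clarify explain expand example detail)

  def prioritize(suggestions) when is_list(suggestions) do
    suggestions
    |> Enum.map(fn text ->
      {impact, category} = classify(text)
      %{content: text, impact: impact, category: category}
    end)
    # Enum.sort_by/2 is stable, so equal-impact items keep their input order.
    |> Enum.sort_by(&rank(&1.impact))
  end

  defp classify(text) do
    lowered = String.downcase(text)

    cond do
      String.contains?(lowered, @high) -> {:high, :accuracy}
      String.contains?(lowered, @medium) -> {:medium, :clarity}
      true -> {:low, :general}
    end
  end

  defp rank(:high), do: 0
  defp rank(:medium), do: 1
  defp rank(:low), do: 2
end
```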