Lotus.AI.SchemaOptimizer (Lotus v0.16.5)

Copy Markdown View Source

Optimizes schema loading for large databases using two-stage approach.

For databases with 50+ tables, loading the full schema for all tables in a single prompt can exceed token limits and reduce accuracy. This module implements a two-stage strategy:

  1. Stage 1: Identify relevant tables from the user's query
  2. Stage 2: Load detailed schema only for identified tables

This dramatically reduces token usage while improving accuracy.

Usage

# Check if optimization is needed
if SchemaOptimizer.should_optimize?(table_count) do
  # Stage 1: Identify relevant tables
  {:ok, relevant_tables} = SchemaOptimizer.identify_relevant_tables(
    prompt: "Show sales by region",
    tables: all_table_names,
    config: ai_config
  )

  # Stage 2: Load schema only for relevant tables
  # ... proceed with normal query generation using only relevant_tables
end

Benefits

  • Reduced tokens: Only load schemas for relevant tables (5-10 instead of 50+)
  • Improved accuracy: LLM focuses on relevant schema without noise
  • Faster generation: Less context to process means faster responses
  • Cost savings: Fewer tokens = lower API costs

Summary

Functions

Identify relevant tables for a given query (Stage 1).

Determine if schema optimization should be used.

Functions

identify_relevant_tables(opts)

@spec identify_relevant_tables(keyword()) :: {:ok, [String.t()]} | {:error, term()}

Identify relevant tables for a given query (Stage 1).

Uses a lightweight LLM call with just table names to identify which tables are likely needed for the query.

Parameters

  • opts - Keyword list with:
    • :prompt (required) - User's natural language query
    • :tables (required) - List of all available table names
    • :config (required) - AI provider configuration
    • :max_tables (optional) - Maximum tables to identify (default: 10)

Returns

  • {:ok, [table_names]} - List of relevant table names
  • {:error, term} - Error during identification

Examples

{:ok, tables} = SchemaOptimizer.identify_relevant_tables(
  prompt: "Show total sales by region for last month",
  tables: ["users", "orders", "products", "regions", "sales", ...],
  config: %{provider: "openai", api_key: "..."}
)
# => {:ok, ["sales", "regions", "orders"]}

should_optimize?(table_count)

@spec should_optimize?(non_neg_integer()) :: boolean()

Determine if schema optimization should be used.

Returns true if the database has more than 50 tables.

Examples

iex> SchemaOptimizer.should_optimize?(25)
false

iex> SchemaOptimizer.should_optimize?(75)
true