Optimizes schema loading for large databases using two-stage approach.
For databases with 50+ tables, loading the full schema for all tables in a single prompt can exceed token limits and reduce accuracy. This module implements a two-stage strategy:
- Stage 1: Identify relevant tables from the user's query
- Stage 2: Load detailed schema only for identified tables
This dramatically reduces token usage while improving accuracy.
Usage
# Check if optimization is needed
if SchemaOptimizer.should_optimize?(table_count) do
# Stage 1: Identify relevant tables
{:ok, relevant_tables} = SchemaOptimizer.identify_relevant_tables(
prompt: "Show sales by region",
tables: all_table_names,
config: ai_config
)
# Stage 2: Load schema only for relevant tables
# ... proceed with normal query generation using only relevant_tables
endBenefits
- Reduced tokens: Only load schemas for relevant tables (5-10 instead of 50+)
- Improved accuracy: LLM focuses on relevant schema without noise
- Faster generation: Less context to process means faster responses
- Cost savings: Fewer tokens = lower API costs
Summary
Functions
Identify relevant tables for a given query (Stage 1).
Determine if schema optimization should be used.
Functions
Identify relevant tables for a given query (Stage 1).
Uses a lightweight LLM call with just table names to identify which tables are likely needed for the query.
Parameters
opts- Keyword list with::prompt(required) - User's natural language query:tables(required) - List of all available table names:config(required) - AI provider configuration:max_tables(optional) - Maximum tables to identify (default: 10)
Returns
{:ok, [table_names]}- List of relevant table names{:error, term}- Error during identification
Examples
{:ok, tables} = SchemaOptimizer.identify_relevant_tables(
prompt: "Show total sales by region for last month",
tables: ["users", "orders", "products", "regions", "sales", ...],
config: %{provider: "openai", api_key: "..."}
)
# => {:ok, ["sales", "regions", "orders"]}
@spec should_optimize?(non_neg_integer()) :: boolean()
Determine if schema optimization should be used.
Returns true if the database has more than 50 tables.
Examples
iex> SchemaOptimizer.should_optimize?(25)
false
iex> SchemaOptimizer.should_optimize?(75)
true