# Retrievers

Retrievers find the documents most relevant to a query. Each implementation uses a different search strategy.

## Overview

The library provides four retriever implementations:

| Retriever | Method | Query Type | Best For |
|-----------|--------|------------|----------|
| **Semantic** | Vector similarity | Embedding | Conceptual matching |
| **FullText** | Keyword matching | Text | Exact keywords |
| **Hybrid** | RRF fusion | Embedding + text | Balanced results |
| **Graph** | Knowledge graph | Both | Entity relationships |

## Retriever Behaviour

All retrievers implement the `Rag.Retriever` behaviour:

```elixir
@callback retrieve(retriever, query, opts) :: {:ok, [result()]} | {:error, term()}

@type result :: %{
  id: any(),
  content: String.t(),
  score: float(),
  metadata: map()
}
```
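
To illustrate the callback shape, the sketch below defines a stand-in behaviour with the same `retrieve/3` contract and a toy in-memory implementation. `Sketch.Retriever` and `Sketch.InMemory` are hypothetical names used only so the example runs on its own; they are not part of the library, and the substring match is a placeholder for a real search.

```elixir
defmodule Sketch.Retriever do
  # Hypothetical stand-in for Rag.Retriever so this example is self-contained.
  @type result :: %{id: any(), content: String.t(), score: float(), metadata: map()}

  @callback retrieve(retriever :: struct(), query :: term(), opts :: keyword()) ::
              {:ok, [result()]} | {:error, term()}

  # Dispatch to the module that defines the retriever struct.
  def retrieve(%module{} = retriever, query, opts \\ []) do
    module.retrieve(retriever, query, opts)
  end
end

defmodule Sketch.InMemory do
  @behaviour Sketch.Retriever
  defstruct docs: []

  @impl true
  def retrieve(%__MODULE__{docs: docs}, query, opts) when is_binary(query) do
    limit = Keyword.get(opts, :limit, 10)
    needle = String.downcase(query)

    results =
      docs
      |> Enum.filter(fn doc -> String.contains?(String.downcase(doc.content), needle) end)
      |> Enum.take(limit)
      # Shape every hit like the result() type: id, content, score, metadata.
      |> Enum.map(fn doc -> %{id: doc.id, content: doc.content, score: 1.0, metadata: %{}} end)

    {:ok, results}
  end

  # Anything other than a text query is rejected with an error tuple.
  def retrieve(%__MODULE__{}, _query, _opts), do: {:error, :unsupported_query}
end
```

Callers only ever pattern match on `{:ok, results}` or `{:error, reason}`, which is what lets the retrievers described below be swapped for one another.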

## Semantic Retriever

Uses pgvector L2 distance for vector similarity search.

```elixir
alias Rag.Retriever
alias Rag.Retriever.Semantic

# Create retriever
retriever = %Semantic{repo: MyApp.Repo}

# Search with embedding
{:ok, results} = Retriever.retrieve(retriever, query_embedding, limit: 10)
```

**Scoring:**
- Score = 1.0 - L2_distance
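- Range: 1.0 for identical vectors, decreasing as the distance grows (by this formula, the score drops below 0.0 once the L2 distance exceeds 1.0)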
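
The scoring formula can be worked through in plain Elixir. This is a minimal sketch of the arithmetic only, not the library's implementation (which runs inside PostgreSQL via pgvector); `Sketch.L2Score` is a hypothetical module name, and any clamping the retriever may apply is not modelled.

```elixir
defmodule Sketch.L2Score do
  # Euclidean (L2) distance between two equal-length vectors.
  def l2_distance(a, b) when length(a) == length(b) do
    a
    |> Enum.zip(b)
    |> Enum.reduce(0.0, fn {x, y}, acc -> acc + (x - y) * (x - y) end)
    |> :math.sqrt()
  end

  # Score as described above: 1.0 - distance.
  def score(a, b), do: 1.0 - l2_distance(a, b)
end

# Identical vectors have distance 0.0, so the score is 1.0.
Sketch.L2Score.score([1.0, 0.0], [1.0, 0.0])

# Orthogonal unit vectors are sqrt(2) apart, so the raw score is negative.
Sketch.L2Score.score([1.0, 0.0], [0.0, 1.0])
```

The second call shows why raw scores are easiest to interpret when embeddings are close together: the formula has a fixed maximum of 1.0 but no fixed minimum.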

**Capabilities:**
- `supports_embedding?()` - true
- `supports_text_query?()` - false

**Best for:**
- Finding conceptually similar content
- When query meaning matters more than exact words

## FullText Retriever

Uses PostgreSQL tsvector for keyword matching.

```elixir
alias Rag.Retriever
alias Rag.Retriever.FullText

# Create retriever
retriever = %FullText{repo: MyApp.Repo}

# Search with text
{:ok, results} = Retriever.retrieve(retriever, "GenServer state", limit: 10)
```

**Scoring:**
- Score comes from the PostgreSQL `ts_rank` function
- Multiple query terms are combined with AND, so a document must match every term

**Capabilities:**
- `supports_embedding?()` - false
- `supports_text_query?()` - true

**Best for:**
- Finding documents with specific keywords
- Technical term searches
- When exact matches matter

## Hybrid Retriever

Combines semantic and full-text search results using Reciprocal Rank Fusion (RRF).

```elixir
alias Rag.Retriever
alias Rag.Retriever.Hybrid

# Create retriever
retriever = %Hybrid{repo: MyApp.Repo}

# Search with both embedding and text
{:ok, results} = Retriever.retrieve(retriever, {embedding, "search text"}, limit: 10)
```

**Query Format:**
- Tuple of `{embedding_vector, text_query}`

**Scoring (RRF):**
```
RRF(d) = Σ 1 / (k + rank(d))  where k = 60
```
- The sum runs over the semantic and full-text result lists; a document that appears in both accumulates a term from each
- Balances semantic understanding with keyword precision
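
The RRF formula above can be sketched in a few lines. This is an illustrative implementation under stated assumptions, not the library's code: `Sketch.RRF` is a hypothetical module, each ranking is a plain list of document ids ordered best first, and ranks are assumed to start at 1.

```elixir
defmodule Sketch.RRF do
  @k 60

  # Fuse any number of rankings into one list of {id, score}, best first.
  def fuse(rankings, k \\ @k) do
    rankings
    |> Enum.flat_map(fn ranking ->
      ranking
      |> Enum.with_index(1)
      |> Enum.map(fn {id, rank} -> {id, 1.0 / (k + rank)} end)
    end)
    # Sum the per-list contributions for each document id.
    |> Enum.reduce(%{}, fn {id, score}, acc -> Map.update(acc, id, score, &(&1 + score)) end)
    |> Enum.sort_by(fn {_id, score} -> score end, :desc)
  end
end

semantic = [:a, :b, :c]
fulltext = [:b, :d]

# :b appears in both lists, so it collects 1/62 + 1/61 and ranks first.
Sketch.RRF.fuse([semantic, fulltext])
```

Because only ranks enter the formula, the two retrievers' raw scores never need to be on the same scale.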

**Capabilities:**
- `supports_embedding?()` - true
- `supports_text_query?()` - true

**Best for:**
- Balanced semantic + keyword search
- When you want the best of both methods
- Production RAG systems

## Graph Retriever

Uses knowledge graph structure for context-aware retrieval.

```elixir
alias Rag.Retriever
alias Rag.Retriever.Graph

# Create retriever
retriever = Graph.new(
  graph_store: graph_store,
  vector_store: vector_store,
  mode: :hybrid,
  depth: 2,
  local_weight: 0.7,
  global_weight: 0.3
)

# Search
{:ok, results} = Retriever.retrieve(retriever, query_embedding,
  limit: 10,
  embedding_fn: &embed/1
)
```

### Search Modes

**Local Search (`:local`)**
- Vector search on entity embeddings
- Graph traversal to related entities
- Collect source chunks from entities
- Score by graph distance

```elixir
{:ok, results} = Graph.local_search(retriever, query, limit: 10)
```
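
The traversal and distance-based scoring can be sketched as a breadth-first expansion. This is a simplified illustration, not the library's implementation: `Sketch.LocalSearch` is a hypothetical module, the graph is a plain adjacency map, the seed entities stand in for the vector search hits, and collecting source chunks is left out. The score `1 / (1 + depth)` follows the formula listed under Scoring Comparison.

```elixir
defmodule Sketch.LocalSearch do
  # graph: %{entity => [neighbouring entities]}
  # seeds: entities assumed to come from the vector search step
  # Returns %{entity => score}, where depth is the hop count from the nearest seed.
  def expand(graph, seeds, max_depth) do
    visited = Map.new(seeds, fn seed -> {seed, 0} end)

    graph
    |> walk(seeds, visited, 1, max_depth)
    |> Map.new(fn {entity, depth} -> {entity, 1.0 / (1 + depth)} end)
  end

  # Stop when the frontier is empty or the depth limit is passed.
  defp walk(_graph, [], visited, _depth, _max_depth), do: visited
  defp walk(_graph, _frontier, visited, depth, max_depth) when depth > max_depth, do: visited

  defp walk(graph, frontier, visited, depth, max_depth) do
    # Neighbours of the current frontier that have not been reached yet.
    next =
      frontier
      |> Enum.flat_map(fn entity -> Map.get(graph, entity, []) end)
      |> Enum.uniq()
      |> Enum.reject(fn entity -> Map.has_key?(visited, entity) end)

    visited = Enum.reduce(next, visited, fn entity, acc -> Map.put(acc, entity, depth) end)
    walk(graph, next, visited, depth + 1, max_depth)
  end
end

graph = %{
  genserver: [:supervisor, :process],
  supervisor: [:application],
  application: [:release]
}

# With depth 2, :release (three hops away) is not reached.
Sketch.LocalSearch.expand(graph, [:genserver], 2)
```

A larger `depth` pulls in more distant entities, but each extra hop lowers the score, so nearby entities still dominate the ranking.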

**Global Search (`:global`)**
- Vector search on community summaries
- Returns high-level context
- Good for overview questions

```elixir
{:ok, results} = Graph.global_search(retriever, query, limit: 10)
```

**Hybrid Search (`:hybrid`)**
- Runs local and global in parallel
- Combines with weighted RRF
- Best of both approaches

```elixir
{:ok, results} = Graph.hybrid_search(retriever, query, limit: 10)
```
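
One way to picture the hybrid mode is two concurrent searches whose rankings are fused with per-source weights. The sketch below is an assumption about how weighted RRF might look, not the library's code: `Sketch.GraphHybrid` is a hypothetical module, the two searches are passed in as zero-arity functions returning ranked id lists, and each contribution is taken to be `weight / (k + rank)` with `k = 60`.

```elixir
defmodule Sketch.GraphHybrid do
  @k 60

  # local_fun and global_fun each return a list of ids, best match first.
  def search(local_fun, global_fun, opts \\ []) do
    local_weight = Keyword.get(opts, :local_weight, 1.0)
    global_weight = Keyword.get(opts, :global_weight, 1.0)

    # Start both searches concurrently, then wait for both results in order.
    [local, global] =
      [local_fun, global_fun]
      |> Enum.map(&Task.async/1)
      |> Task.await_many()

    %{}
    |> add_ranking(local, local_weight)
    |> add_ranking(global, global_weight)
    |> Enum.sort_by(fn {_id, score} -> score end, :desc)
  end

  # Add weight / (k + rank) for every id in the ranking.
  defp add_ranking(scores, ranking, weight) do
    ranking
    |> Enum.with_index(1)
    |> Enum.reduce(scores, fn {id, rank}, acc ->
      contribution = weight / (@k + rank)
      Map.update(acc, id, contribution, &(&1 + contribution))
    end)
  end
end

local = fn -> [:a, :b] end
global = fn -> [:b, :c] end

# :b is found by both searches, so it ranks first.
Sketch.GraphHybrid.search(local, global, local_weight: 0.7, global_weight: 0.3)
```

Raising `local_weight` relative to `global_weight` favours entity-level chunks over community summaries, and the reverse favours high-level context.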

### Configuration

| Option | Default | Description |
|--------|---------|-------------|
| `graph_store` | required | Graph store module |
| `vector_store` | required | Vector store module |
| `mode` | `:local` | Search mode |
| `depth` | 2 | Graph traversal depth |
| `local_weight` | 1.0 | Weight for local in hybrid |
| `global_weight` | 1.0 | Weight for global in hybrid |

## Scoring Comparison

| Retriever | Score Source | Range | Formula |
|-----------|--------------|-------|---------|
| Semantic | L2 distance | ≤ 1 | `1.0 - distance` |
| FullText | ts_rank | ≥ 0 | PostgreSQL rank |
| Hybrid | RRF | varies | `Σ 1/(60+rank)` |
| Graph Local | Depth | 0-1 | `1/(1+depth)` |
| Graph Global | Rank | 0-1 | `1/(1+rank)` |

## Complete Example

```elixir
alias Rag.Router
alias Rag.Retriever
alias Rag.Retriever.{Semantic, FullText, Hybrid}
alias Rag.Reranker
alias Rag.Reranker.LLM
alias MyApp.Repo

# Setup
{:ok, router} = Router.new(providers: [:gemini])
query = "How does GenServer handle state?"

# Get query embedding
{:ok, [query_embedding], router} = Router.execute(router, :embeddings, [query], [])

# Semantic search
semantic_retriever = %Semantic{repo: Repo}
{:ok, semantic_results} = Retriever.retrieve(semantic_retriever, query_embedding, limit: 10)

# Full-text search
fulltext_retriever = %FullText{repo: Repo}
{:ok, fulltext_results} = Retriever.retrieve(fulltext_retriever, query, limit: 10)

# Hybrid search
hybrid_retriever = %Hybrid{repo: Repo}
{:ok, hybrid_results} = Retriever.retrieve(hybrid_retriever, {query_embedding, query}, limit: 10)

# Compare results
IO.puts("Semantic: #{length(semantic_results)} results")
IO.puts("FullText: #{length(fulltext_results)} results")
IO.puts("Hybrid: #{length(hybrid_results)} results")

# Rerank hybrid results
reranker = LLM.new(router: router)
{:ok, reranked} = Reranker.rerank(reranker, query, hybrid_results, top_k: 5)

# Use top results for RAG
context = Enum.map_join(reranked, "\n\n", & &1.content)
```

## Choosing a Retriever

**Use Semantic when:**
- Query meaning matters more than keywords
- Finding conceptually similar content
- Working with paraphrased queries

**Use FullText when:**
- Searching for specific terms
- Queries are technical or domain-specific
- Exact keyword matching is needed

**Use Hybrid when:**
- You want balanced results
- You are building production RAG systems
- You are unsure which method works best

**Use Graph when:**
- Entity relationships matter
- You need multi-hop reasoning
- You are building knowledge-intensive applications

## Pipeline Integration

```elixir
# In a pipeline step (assumes Retriever, Hybrid, and Context are aliased in the enclosing module)
def retrieve_step(_input, context, _opts) do
  retriever = %Hybrid{repo: context.repo}
  embedding = Context.get_step_result(context, :embed_query)
  query = context.query

  case Retriever.retrieve(retriever, {embedding, query}, limit: 10) do
    {:ok, results} -> {:ok, results}
    {:error, reason} -> {:error, reason}
  end
end
```

## Next Steps

- [Rerankers](rerankers.md) - Improve retrieval quality with reranking
- [GraphRAG](graph_rag.md) - Build knowledge graphs for retrieval
- [Pipeline](pipelines.md) - Combine retrievers in workflows