# Nasty Examples Catalog

Comprehensive catalog of all example scripts demonstrating Nasty's capabilities.

## Quick Start

All examples can be run directly:
```bash
elixir examples/example_name.exs
```

Or make them executable:
```bash
chmod +x examples/example_name.exs
./examples/example_name.exs
```

## Basic Examples

### tokenizer_example.exs

**Purpose**: Introduction to tokenization

**What it demonstrates**:
- Basic tokenization with NimbleParsec
- Position tracking (line, column, byte offsets)
- Handling contractions (don't, it's)
- Punctuation as separate tokens
- Sentence boundary detection

**Run**:
```bash
elixir examples/tokenizer_example.exs
```

**Best for**: Understanding the first step in the NLP pipeline
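
A minimal sketch of position tracking, assuming a simple regex tokenizer rather than Nasty's NimbleParsec grammar. `PositionSketch` is a hypothetical module. It keeps an internal apostrophe inside a word, so "don't" stays one token (Nasty may split contractions differently). Each punctuation mark becomes its own token. For every token it reports the 1-based line and column plus the byte offset within that line.

```elixir
defmodule PositionSketch do
  # Hypothetical regex tokenizer, not Nasty's NimbleParsec grammar.
  # Letters (with an optional internal apostrophe, e.g. "don't") form one token;
  # any other non-space character becomes its own punctuation token.
  defp token_regex, do: ~r/\p{L}+(?:'\p{L}+)?|[^\p{L}\s]/u

  # Returns [{text, line, column, byte_offset_in_line}], line and column 1-based.
  def tokenize(text) do
    text
    |> String.split("\n")
    |> Enum.with_index(1)
    |> Enum.flat_map(fn {line, line_no} ->
      token_regex()
      |> Regex.scan(line, return: :index)
      |> Enum.map(fn [{byte, len}] ->
        # Column counts graphemes before the token, so an accented letter counts once.
        column = line |> binary_part(0, byte) |> String.length()
        {binary_part(line, byte, len), line_no, column + 1, byte}
      end)
    end)
  end
end

PositionSketch.tokenize("El gato duerme.")
# => [{"El", 1, 1, 0}, {"gato", 1, 4, 3}, {"duerme", 1, 9, 8}, {".", 1, 15, 14}]
```

The columns match the `(line:column)` pairs in the sample output later in this catalog.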

---

### hmm_pos_tagger_example.exs

**Purpose**: Statistical POS tagging with Hidden Markov Models

**What it demonstrates**:
- Training HMM POS taggers from CoNLL-U data
- Viterbi algorithm for sequence tagging
- Model evaluation and accuracy metrics
- Comparison with rule-based tagging
- Model persistence (save/load)

**Run**:
```bash
elixir examples/hmm_pos_tagger_example.exs
```

**Best for**: Learning about statistical NLP models
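
The Viterbi step can be sketched in a few lines of Elixir. This is a minimal illustration, not Nasty's HMM module. `ViterbiSketch.decode/5` is hypothetical and works in log space, so products of probabilities become sums. It takes toy, hand-set start, transition, and emission tables instead of probabilities estimated from CoNLL-U data. Unseen transitions and emissions get a very low log-probability instead of proper smoothing.

```elixir
defmodule ViterbiSketch do
  # start: %{tag => log P(tag)}, trans: %{{prev, tag} => log P(tag | prev)},
  # emit: %{{tag, word} => log P(word | tag)}. All values are log-probabilities.
  def decode([first | rest], tags, start, trans, emit) do
    # One trellis column: %{tag => {best log score, best path ending in tag (reversed)}}
    init = Map.new(tags, fn t -> {t, {lp(start, t) + lp(emit, {t, first}), [t]}} end)

    last =
      Enum.reduce(rest, init, fn word, prev ->
        Map.new(tags, fn t ->
          # Best predecessor for tag t: maximize previous score + transition score.
          {score, path} =
            prev
            |> Enum.map(fn {p, {s, path}} -> {s + lp(trans, {p, t}), path} end)
            |> Enum.max_by(fn {s, _path} -> s end)

          {t, {score + lp(emit, {t, word}), [t | path]}}
        end)
      end)

    {_score, path} = last |> Map.values() |> Enum.max_by(fn {s, _path} -> s end)
    Enum.reverse(path)
  end

  # Unseen events get a very low log-probability instead of proper smoothing.
  defp lp(table, key), do: Map.get(table, key, -1.0e9)
end

# Toy tables (hypothetical values, not trained on any corpus).
start = %{det: -0.5, noun: -1.2, verb: -2.3}

trans = %{
  {:det, :noun} => -0.2,
  {:noun, :verb} => -0.3,
  {:det, :verb} => -3.0,
  {:noun, :noun} => -1.5,
  {:verb, :noun} => -1.0
}

emit = %{
  {:det, "the"} => -0.1,
  {:noun, "dog"} => -0.5,
  {:verb, "barks"} => -0.7,
  {:noun, "barks"} => -2.0
}

ViterbiSketch.decode(~w(the dog barks), [:det, :noun, :verb], start, trans, emit)
# => [:det, :noun, :verb]
```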

---

### neural_pos_tagger_example.exs

**Purpose**: Neural POS tagging with BiLSTM-CRF

**What it demonstrates**:
- BiLSTM-CRF architecture with Axon/EXLA
- Training neural models on UD corpora
- Character-level embeddings for OOV handling
- GPU acceleration with EXLA
- 97-98% accuracy on benchmark datasets

**Run**:
```bash
elixir examples/neural_pos_tagger_example.exs
```

**Best for**: Understanding deep learning for NLP

---

## Language-Specific Examples

### spanish_example.exs

**Purpose**: Spanish language processing

**What it demonstrates**:
- Spanish tokenization (¿?, ¡!, del, al contractions)
- Spanish POS tagging with morphology
- Gender/number agreement
- Parsing Spanish sentence structure
- Entity recognition with Spanish lexicons

**Run**:
```bash
elixir examples/spanish_example.exs
```

**Best for**: Working with Romance languages
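
The contraction handling can be illustrated with a small sketch. `SpanishSketch` is hypothetical and is not Nasty's Spanish tokenizer. It splits the inverted marks ¿ and ¡ into separate tokens and expands the two obligatory Spanish contractions, *del* (de + el) and *al* (a + el). Expanded forms come out lowercase, so a capitalized *Del* loses its capital.

```elixir
defmodule SpanishSketch do
  # The two obligatory contractions in Spanish: de + el and a + el.
  @contractions %{"del" => ["de", "el"], "al" => ["a", "el"]}

  def tokenize(text) do
    # Words are runs of letters; every other non-space character (including
    # the inverted marks ¿ and ¡) becomes its own token.
    ~r/\p{L}+|[^\p{L}\s]/u
    |> Regex.scan(text)
    |> List.flatten()
    |> Enum.flat_map(fn token -> Map.get(@contractions, String.downcase(token), [token]) end)
  end
end

SpanishSketch.tokenize("¿Vas al cine del barrio?")
# => ["¿", "Vas", "a", "el", "cine", "de", "el", "barrio", "?"]
```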

---

### catalan_example.exs

**Purpose**: Catalan language processing

**What it demonstrates**:
- Catalan-specific tokenization (interpunct l·l, apostrophes)
- All 10 Catalan diacritics (à, è, é, í, ï, ò, ó, ú, ü, ç)
- Article contractions (del, al, pel, cal)
- Catalan morphology and POS tagging
- Entity recognition with Catalan lexicons
- Translation between Catalan and English

**Run**:
```bash
elixir examples/catalan_example.exs
```

**Best for**: Catalan NLP applications
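
A comparable sketch for Catalan, again hypothetical rather than Nasty's implementation. The word pattern keeps the interpunct inside geminated *l·l*, as in *col·legi*. It keeps the apostrophe with an elided form, so *l'home* becomes *l'* + *home*. It also expands *del*, *al*, and *pel*. The sketch deliberately leaves *cal* alone, because the same string is also a form of the verb *caldre* ("it is necessary") and needs context to disambiguate.

```elixir
defmodule CatalanSketch do
  # Contractions of preposition + article "el" (hypothetical subset, singular only).
  # "cal" is omitted: it is also a form of the verb "caldre".
  @contractions %{"del" => ["de", "el"], "al" => ["a", "el"], "pel" => ["per", "el"]}

  def tokenize(text) do
    # A word may contain "·" between letters (col·legi) and may end with an
    # apostrophe (l', d'); anything else that is not a space is punctuation.
    ~r/\p{L}+(?:·\p{L}+)*['’]?|[^\p{L}\s]/u
    |> Regex.scan(text)
    |> List.flatten()
    |> Enum.flat_map(fn token -> Map.get(@contractions, String.downcase(token), [token]) end)
  end
end

CatalanSketch.tokenize("L'home va al col·legi.")
# => ["L'", "home", "va", "a", "el", "col·legi", "."]
```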

---

## Translation Examples

### translation_example.exs

**Purpose**: Basic AST-based translation

**What it demonstrates**:
- English ↔ Spanish translation
- AST-level translation preserving grammar
- Morphological agreement enforcement
- Word order transformations
- Rendering translated AST to text

**Run**:
```bash
elixir examples/translation_example.exs
```

**Best for**: Getting started with translation
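
Agreement and reordering can be illustrated on a single noun phrase. The sketch below is hypothetical and far simpler than AST-based translation. `AgreementSketch` uses a four-word toy lexicon and looks up the Spanish noun first to learn its gender. It then inflects the determiner and adjective to agree with that gender. Finally, it moves the adjective after the noun, the usual position for descriptive adjectives in Spanish.

```elixir
defmodule AgreementSketch do
  # Toy lexicon (hypothetical): English noun => {Spanish noun, grammatical gender}.
  @nouns %{"car" => {"coche", :masc}, "house" => {"casa", :fem}}

  # Words whose Spanish form depends on the gender of the head noun.
  @agreeing %{
    "the" => %{masc: "el", fem: "la"},
    "red" => %{masc: "rojo", fem: "roja"}
  }

  # Translates an English "DET ADJ NOUN" phrase into Spanish "DET NOUN ADJ".
  def translate_np([det, adj, noun]) do
    {es_noun, gender} = Map.fetch!(@nouns, noun)
    Enum.join([@agreeing[det][gender], es_noun, @agreeing[adj][gender]], " ")
  end
end

AgreementSketch.translate_np(~w(the red house))  # => "la casa roja"
AgreementSketch.translate_np(~w(the red car))    # => "el coche rojo"
```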

---

### roundtrip_translation.exs

**Purpose**: Translation quality analysis

**What it demonstrates**:
- English → Spanish → English roundtrips
- English → Catalan → English roundtrips
- Spanish → English → Spanish roundtrips
- Similarity metrics and quality assessment
- Challenging translation cases
- Performance across complexity levels

**Run**:
```bash
elixir examples/roundtrip_translation.exs
```

**Best for**: Evaluating translation quality
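
One simple way to score a roundtrip is to compare the word sets of the original sentence and the back-translated one. The sketch below uses Jaccard similarity, which divides the number of shared words by the number of distinct words. This metric is an illustrative assumption and is not necessarily the one `roundtrip_translation.exs` reports.

```elixir
defmodule RoundtripSketch do
  # Jaccard similarity of lowercase word sets: |A ∩ B| / |A ∪ B|, in [0.0, 1.0].
  def jaccard(original, roundtrip) do
    a = words(original)
    b = words(roundtrip)
    union = MapSet.size(MapSet.union(a, b))

    if union == 0, do: 1.0, else: MapSet.size(MapSet.intersection(a, b)) / union
  end

  defp words(text) do
    text |> String.downcase() |> String.split(~r/\W+/u, trim: true) |> MapSet.new()
  end
end

RoundtripSketch.jaccard("The cat sleeps.", "The cat is sleeping.")
# => 0.4 (2 shared words out of 5 distinct)
```

Word-set overlap ignores word order and inflection, which is why "sleeps" and "sleeping" count as different words here.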

---

### multilingual_pipeline.exs

**Purpose**: Side-by-side multilingual comparison

**What it demonstrates**:
- Processing same content in English, Spanish, Catalan
- Token-level comparison across languages
- POS tagging differences
- Morphological feature comparison
- Translation matrix (all language pairs)
- Performance benchmarking
- Language-specific features summary

**Run**:
```bash
elixir examples/multilingual_pipeline.exs
```

**Best for**: Understanding cross-language differences
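
The translation matrix and timing steps can be sketched independently of any particular translator. `MatrixSketch.run/3` is hypothetical. It takes a translation function as an argument, standing in for whatever translation call the example uses. It runs that function on every ordered pair of distinct languages and times each call with Erlang's `:timer.tc/1`, which returns `{microseconds, result}`.

```elixir
defmodule MatrixSketch do
  # Runs translate_fun.(text, source, target) for every ordered pair of distinct
  # languages and records the elapsed wall-clock time in microseconds.
  def run(text, langs, translate_fun) do
    for source <- langs, target <- langs, source != target do
      {micros, output} = :timer.tc(fn -> translate_fun.(text, source, target) end)
      %{pair: {source, target}, output: output, micros: micros}
    end
  end
end

# A fake translator that only labels its input, to show the shape of the results.
fake = fn text, source, target -> "[#{source}->#{target}] #{text}" end
results = MatrixSketch.run("The cat sleeps.", [:en, :es, :ca], fake)

Enum.map(results, & &1.pair)
# => [{:en, :es}, {:en, :ca}, {:es, :en}, {:es, :ca}, {:ca, :en}, {:ca, :es}]
```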

---

## Advanced NLP Tasks

### summarization.exs

**Purpose**: Extractive text summarization

**What it demonstrates**:
- Position-weighted sentence scoring
- Entity density calculation
- Discourse marker detection
- Keyword frequency (TF)
- MMR (Maximal Marginal Relevance) for diversity
- Summary length set by compression ratio or by a fixed sentence count

**Run**:
```bash
elixir examples/summarization.exs
```

**Best for**: Document summarization applications
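
MMR picks sentences greedily, trading relevance against redundancy with what is already selected: score(s) = λ·rel(s) − (1 − λ)·max sim(s, t), where t ranges over the sentences already chosen. The sketch below is hypothetical. Relevance scores are supplied by the caller, standing in for Nasty's position, entity, discourse-marker, and TF features. Similarity is measured as Jaccard overlap of word sets, and λ defaults to 0.7.

```elixir
defmodule MMRSketch do
  # scored: [{sentence, relevance}]; returns up to k sentences in selection order.
  def select(scored, k, lambda \\ 0.7), do: pick(scored, [], k, lambda)

  defp pick(_candidates, chosen, 0, _lambda), do: Enum.reverse(chosen)
  defp pick([], chosen, _k, _lambda), do: Enum.reverse(chosen)

  defp pick(candidates, chosen, k, lambda) do
    best =
      Enum.max_by(candidates, fn {sentence, relevance} ->
        # Redundancy: highest similarity to any sentence already chosen (0.0 if none).
        redundancy = Enum.reduce(chosen, 0.0, &max(similarity(sentence, &1), &2))
        lambda * relevance - (1 - lambda) * redundancy
      end)

    {sentence, _relevance} = best
    pick(List.delete(candidates, best), [sentence | chosen], k - 1, lambda)
  end

  # Jaccard overlap of lowercase word sets.
  defp similarity(a, b) do
    {wa, wb} = {words(a), words(b)}
    union = MapSet.size(MapSet.union(wa, wb))
    if union == 0, do: 0.0, else: MapSet.size(MapSet.intersection(wa, wb)) / union
  end

  defp words(s), do: s |> String.downcase() |> String.split(~r/\W+/u, trim: true) |> MapSet.new()
end

MMRSketch.select(
  [
    {"The cat sat on the mat.", 0.9},
    {"The cat sat on a mat.", 0.8},
    {"Dogs bark loudly at night.", 0.5}
  ],
  2
)
# => ["The cat sat on the mat.", "Dogs bark loudly at night."]
```

The near-duplicate second sentence loses to the less relevant but novel third one. With `lambda` set to `1.0`, selection falls back to pure relevance.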

---

### question_answering.exs

**Purpose**: Extractive question answering

**What it demonstrates**:
- Question classification (WHO, WHAT, WHEN, WHERE, WHY, HOW)
- Answer extraction strategies
- Entity type filtering
- Keyword matching with lemmatization
- Confidence scoring
- Multiple answer support

**Run**:
```bash
elixir examples/question_answering.exs
```

**Best for**: Building Q&A systems
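
Question classification and entity type filtering can be sketched as a lookup from the question word to an expected answer type. The mapping in `QASketch` is an assumption made for illustration; for example, WHO expects a person. Question classes without an expected type fall back to all candidate entities. Entities are assumed to arrive as `{text, type}` tuples from an upstream NER step.

```elixir
defmodule QASketch do
  @wh_words ["who", "what", "when", "where", "why", "how"]

  # Hypothetical expected answer types for some question classes.
  @expected %{:who => :person, :when => :date, :where => :location}

  # Classifies by the first word: :who, :what, ..., or :other.
  def classify(question) do
    first = question |> String.downcase() |> String.split(~r/\W+/u, trim: true) |> List.first()
    if first in @wh_words, do: String.to_atom(first), else: :other
  end

  # entities: [{text, type}]; keeps only entities of the expected type, if any.
  def candidates(question, entities) do
    case Map.fetch(@expected, classify(question)) do
      {:ok, type} -> for {text, ^type} <- entities, do: text
      :error -> Enum.map(entities, fn {text, _type} -> text end)
    end
  end
end

entities = [{"Jane Doe", :person}, {"1999", :date}, {"Paris", :location}]
QASketch.candidates("Who founded the company?", entities)  # => ["Jane Doe"]
QASketch.candidates("When was it founded?", entities)      # => ["1999"]
```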

---

### text_classification.exs

**Purpose**: Document classification

**What it demonstrates**:
- Multinomial Naive Bayes classifier
- Feature extraction (BOW, n-grams, POS patterns, entities, lexical)
- Training on labeled data
- Multi-class classification
- Model evaluation (accuracy, precision, recall, F1)
- Sentiment analysis example

**Run**:
```bash
elixir examples/text_classification.exs
```

**Best for**: Text categorization tasks
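
A multinomial Naive Bayes classifier fits in a short module. This sketch is hypothetical and uses only bag-of-words features, with no n-grams, POS patterns, or entities. It applies add-one (Laplace) smoothing so that an unseen word does not zero out a class. Scores are sums of log-probabilities, which avoids floating-point underflow on long documents.

```elixir
defmodule NaiveBayesSketch do
  # docs: [{tokens, label}]. Stores log-priors and per-label word counts.
  def train(docs) do
    vocab_size = docs |> Enum.flat_map(fn {tokens, _} -> tokens end) |> Enum.uniq() |> length()
    total_docs = length(docs)

    classes =
      docs
      |> Enum.group_by(fn {_, label} -> label end, fn {tokens, _} -> tokens end)
      |> Map.new(fn {label, token_lists} ->
        tokens = List.flatten(token_lists)

        stats = %{
          log_prior: :math.log(length(token_lists) / total_docs),
          counts: Enum.frequencies(tokens),
          total: length(tokens)
        }

        {label, stats}
      end)

    %{vocab_size: vocab_size, classes: classes}
  end

  # Picks the label maximizing log P(label) + sum of log P(token | label),
  # with P(token | label) = (count + 1) / (label_total + vocab_size).
  def predict(%{vocab_size: v, classes: classes}, tokens) do
    {label, _stats} =
      Enum.max_by(classes, fn {_label, c} ->
        c.log_prior +
          Enum.sum(for t <- tokens, do: :math.log((Map.get(c.counts, t, 0) + 1) / (c.total + v)))
      end)

    label
  end
end

model =
  NaiveBayesSketch.train([
    {~w(great fun film), :pos},
    {~w(loved it great), :pos},
    {~w(boring awful film), :neg},
    {~w(awful plot), :neg}
  ])

NaiveBayesSketch.predict(model, ~w(great film))    # => :pos
NaiveBayesSketch.predict(model, ~w(awful boring))  # => :neg
```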

---

### information_extraction.exs

**Purpose**: Structured information extraction

**What it demonstrates**:
- Relation extraction (employment, organization, location)
- Event extraction (acquisitions, foundings, announcements)
- Template-based extraction
- Pattern matching with verb patterns
- Confidence scoring
- Integration with NER and dependencies

**Run**:
```bash
elixir examples/information_extraction.exs
```

**Best for**: Knowledge base construction
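
Verb-pattern extraction can be sketched with plain regular expressions over a single sentence. This is a deliberately naive, hypothetical stand-in. The example itself integrates NER and dependency parses. `ExtractionSketch` only matches capitalized spans around a trigger verb and returns `{relation, subject, object}` or `nil`.

```elixir
defmodule ExtractionSketch do
  # Hypothetical trigger patterns: named groups "a" and "b" capture capitalized spans.
  defp patterns do
    [
      {:acquisition, ~r/^(?<a>[A-Z][\w ]*?) (?:acquired|bought) (?<b>[A-Z][\w ]*?)\.?$/u},
      {:employment, ~r/^(?<a>[A-Z][\w ]*?) works (?:at|for) (?<b>[A-Z][\w ]*?)\.?$/u},
      {:founding, ~r/^(?<a>[A-Z][\w ]*?) founded (?<b>[A-Z][\w ]*?)\.?$/u}
    ]
  end

  # Returns {relation, a, b} for the first pattern that matches, or nil.
  def extract(sentence) do
    Enum.find_value(patterns(), fn {relation, regex} ->
      case Regex.named_captures(regex, sentence) do
        %{"a" => a, "b" => b} -> {relation, a, b}
        nil -> nil
      end
    end)
  end
end

ExtractionSketch.extract("Google acquired YouTube.")      # => {:acquisition, "Google", "YouTube"}
ExtractionSketch.extract("Jane Doe works at Acme Corp.")  # => {:employment, "Jane Doe", "Acme Corp"}
```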

---

## Code Interoperability

### code_generation.exs

**Purpose**: Natural language to code

**What it demonstrates**:
- Intent recognition from natural language
- Constraint extraction (comparison, property, range)
- Elixir code generation
- List operations (sort, filter, map, reduce)
- Arithmetic expressions
- Conditional statements

**Run**:
```bash
elixir examples/code_generation.exs
```

**Best for**: Natural language programming interfaces
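
Intent recognition and constraint extraction can be sketched with keyword and regex rules. `CodeGenSketch.generate/1` is hypothetical and recognizes only a handful of phrasings. It returns Elixir source as a string. The usage lines then evaluate that string with `Code.eval_string/2`, binding a variable named `list`.

```elixir
defmodule CodeGenSketch do
  # Returns {:ok, elixir_source} or {:error, :unknown_intent}.
  def generate(request) do
    text = String.downcase(request)

    cond do
      # Comparison constraints become Enum.filter/2 with a predicate.
      match = Regex.run(~r/(?:greater|more) than (\d+)/, text) ->
        [_, n] = match
        {:ok, "Enum.filter(list, fn x -> x > #{n} end)"}

      match = Regex.run(~r/(?:less|smaller) than (\d+)/, text) ->
        [_, n] = match
        {:ok, "Enum.filter(list, fn x -> x < #{n} end)"}

      # Plain list operations are recognized by keyword.
      String.contains?(text, "sort") ->
        {:ok, "Enum.sort(list)"}

      String.contains?(text, "sum") ->
        {:ok, "Enum.sum(list)"}

      true ->
        {:error, :unknown_intent}
    end
  end
end

{:ok, code} = CodeGenSketch.generate("Filter the numbers greater than 5")
{result, _binding} = Code.eval_string(code, list: [3, 8, 12])
result  # => [8, 12]
```

Evaluating generated code is convenient in a demo. A real interface would more likely build quoted expressions and validate them before running anything.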

---

### code_explanation.exs

**Purpose**: Code to natural language

**What it demonstrates**:
- Elixir AST parsing
- Code explanation generation
- Pipeline explanation
- Function call description
- Variable usage analysis

**Run**:
```bash
elixir examples/code_explanation.exs
```

**Best for**: Code documentation and understanding
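
Parsing and pipeline explanation can be sketched with the standard library alone. `Code.string_to_quoted/1` turns source into an AST. A pipeline is then flattened by recursing into the left operand of each `|>` node. `ExplainSketch` and its phrasing are hypothetical, and the sketch handles only variables and remote calls such as `Enum.sum/1`.

```elixir
defmodule ExplainSketch do
  def explain(source) do
    {:ok, ast} = Code.string_to_quoted(source)

    ast
    |> unpipe()
    |> Enum.map(&describe/1)
    |> Enum.join(", then ")
  end

  # `a |> f() |> g()` parses as {:|>, _, [{:|>, _, [a, f()]}, g()]}; flatten it left to right.
  defp unpipe({:|>, _meta, [left, right]}), do: unpipe(left) ++ [right]
  defp unpipe(ast), do: [ast]

  # A variable is {name, meta, context} where context is an atom (often nil).
  defp describe({name, _meta, context}) when is_atom(name) and is_atom(context),
    do: "take the variable `#{name}`"

  # A remote call like Enum.sum() is {{:., _, [{:__aliases__, _, [:Enum]}, :sum]}, _, args}.
  defp describe({{:., _, [{:__aliases__, _, parts}, fun]}, _meta, _args}),
    do: "apply #{Enum.join(parts, ".")}.#{fun}"

  defp describe(other), do: "evaluate `#{Macro.to_string(other)}`"
end

ExplainSketch.explain("list |> Enum.filter(&(&1 > 0)) |> Enum.sum()")
# => "take the variable `list`, then apply Enum.filter, then apply Enum.sum"
```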

---

## Neural Network Examples

### pretrained_model_usage.exs

**Purpose**: Using pre-trained transformers

**What it demonstrates**:
- BERT and RoBERTa via Bumblebee
- Fine-tuning for POS tagging and NER
- Zero-shot classification
- Model quantization (INT8)
- Multilingual models (XLM-RoBERTa)

**Run**:
```bash
elixir examples/pretrained_model_usage.exs
```

**Best for**: Leveraging pre-trained models
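
INT8 quantization is simple arithmetic, shown here on a plain list of weights instead of a Bumblebee model. Symmetric per-tensor quantization picks one scale, `max(|w|) / 127`, and stores each weight as `round(w / scale)` in the range -127..127. `QuantSketch` is a hypothetical illustration of that idea and is not the quantization code the example uses.

```elixir
defmodule QuantSketch do
  # Symmetric per-tensor INT8 quantization: returns {int8_values, scale}.
  def quantize(weights) do
    max_abs = weights |> Enum.map(&abs/1) |> Enum.max()
    scale = if max_abs == 0, do: 1.0, else: max_abs / 127

    ints = Enum.map(weights, fn w -> round(w / scale) |> max(-127) |> min(127) end)
    {ints, scale}
  end

  # Approximate reconstruction; each value lands within about scale / 2 of the original.
  def dequantize({ints, scale}), do: Enum.map(ints, &(&1 * scale))
end

{ints, scale} = QuantSketch.quantize([0.3, -1.0, 0.77])
ints  # => [38, -127, 98]
QuantSketch.dequantize({ints, scale})
```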

---

### transformer_pos_example.exs

**Purpose**: Transformer-based POS tagging

**What it demonstrates**:
- RoBERTa for POS tagging
- Fine-tuning transformers
- 98-99% accuracy
- Cross-lingual transfer
- Model comparison

**Run**:
```bash
elixir examples/transformer_pos_example.exs
```

**Best for**: Highest POS tagging accuracy among the included taggers

---

### advanced_neural_features.exs

**Purpose**: Advanced neural NLP features

**What it demonstrates**:
- Multiple neural architectures
- Ensemble methods
- Model quantization
- Zero-shot learning
- Cross-lingual transfer
- Performance optimization

**Run**:
```bash
elixir examples/advanced_neural_features.exs
```

**Best for**: Production neural NLP systems
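
Of the techniques listed, ensembling is the easiest to show without a trained model. The sketch below is a hypothetical per-token majority vote over the tag sequences produced by several taggers. Ties go to the model listed first, because `Enum.max_by/2` returns the first maximal element. Real ensembles may instead average probabilities or weight models by their accuracy.

```elixir
defmodule EnsembleSketch do
  # predictions: one tag list per model, all the same length as the sentence.
  def vote(predictions) do
    predictions
    |> Enum.zip()
    |> Enum.map(fn column ->
      tags = Tuple.to_list(column)
      counts = Enum.frequencies(tags)
      # Most frequent tag for this token; ties keep the earliest model's tag.
      Enum.max_by(tags, &Map.fetch!(counts, &1))
    end)
  end
end

EnsembleSketch.vote([
  [:det, :noun, :verb],  # e.g. an HMM tagger
  [:det, :verb, :verb],  # e.g. a BiLSTM-CRF tagger
  [:pron, :noun, :noun]  # e.g. a transformer tagger
])
# => [:det, :noun, :verb]
```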

---

## Comprehensive Demos

### comprehensive_demo.exs

**Purpose**: Complete NLP pipeline walkthrough

**What it demonstrates**:
- Full pipeline from tokenization to summarization
- All major NLP tasks
- Entity recognition
- Dependency extraction
- Semantic role labeling
- Coreference resolution
- Information extraction

**Run**:
```bash
elixir examples/comprehensive_demo.exs
```

**Best for**: Overview of all capabilities

---

## Example Selection Guide

### By Use Case

**Text Analysis**:
- tokenizer_example.exs
- hmm_pos_tagger_example.exs
- comprehensive_demo.exs

**Machine Learning**:
- neural_pos_tagger_example.exs
- transformer_pos_example.exs
- text_classification.exs
- advanced_neural_features.exs
- pretrained_model_usage.exs

**Multilingual**:
- spanish_example.exs
- catalan_example.exs
- translation_example.exs
- roundtrip_translation.exs
- multilingual_pipeline.exs

**Information Extraction**:
- question_answering.exs
- information_extraction.exs
- summarization.exs

**Code Integration**:
- code_generation.exs
- code_explanation.exs

### By Difficulty Level

**Beginner**:
1. tokenizer_example.exs
2. spanish_example.exs
3. translation_example.exs
4. summarization.exs

**Intermediate**:
1. hmm_pos_tagger_example.exs
2. catalan_example.exs
3. question_answering.exs
4. text_classification.exs
5. multilingual_pipeline.exs

**Advanced**:
1. neural_pos_tagger_example.exs
2. information_extraction.exs
3. transformer_pos_example.exs
4. advanced_neural_features.exs
5. roundtrip_translation.exs

### By Processing Time

**Fast (<1 second)**:
- tokenizer_example.exs
- translation_example.exs
- spanish_example.exs

**Medium (1-10 seconds)**:
- catalan_example.exs
- multilingual_pipeline.exs
- summarization.exs
- question_answering.exs

**Slow (>10 seconds)**:
- hmm_pos_tagger_example.exs (if training)
- neural_pos_tagger_example.exs
- transformer_pos_example.exs
- roundtrip_translation.exs

## Running Multiple Examples

### Run all basic examples:
```bash
for example in tokenizer_example spanish_example translation_example; do
  echo "Running ${example}..."
  elixir examples/${example}.exs
  echo "---"
done
```

### Run all translation examples:
```bash
for example in translation_example roundtrip_translation multilingual_pipeline; do
  elixir examples/${example}.exs
done
```

### Run all language-specific examples:
```bash
elixir examples/spanish_example.exs
elixir examples/catalan_example.exs
elixir examples/multilingual_pipeline.exs
```

## Expected Output

### Typical Output Format

Most examples output:
1. **Section headers**: Clearly marked sections
2. **Input text**: What's being processed
3. **Results**: Parsed output, tags, entities, etc.
4. **Statistics**: Counts, accuracy, timing
5. **Summary**: Key takeaways

### Example Output Snippet

```
========================================
Spanish Language Processing Demo
========================================

1. Tokenization
---------------
Input: El gato duerme en el sofá.

Tokens:
  El (1:1)
  gato (1:4)
  duerme (1:9)
  ...

2. POS Tagging
--------------
Tagged tokens:
  El → det
  gato → noun
  duerme → verb
  ...
```

## Troubleshooting

### Common Issues

**Example won't run**:
```bash
# Make sure dependencies are installed
mix deps.get
mix compile

# Check file permissions
chmod +x examples/example_name.exs
```

**Missing models**:
Some examples (neural, transformer) require trained models. See [TRAINING_NEURAL.md](TRAINING_NEURAL.md) for training instructions.

**Out of memory**:
Neural/transformer examples may need more memory. Reduce batch size or use smaller models.

## Creating Your Own Examples

Template for new examples:

```elixir
#!/usr/bin/env elixir

# Your Example Name
#
# Brief description of what this example demonstrates

Mix.install([
  {:nasty, path: Path.expand("..", __DIR__)}
])

alias Nasty.Language.English

IO.puts("\n========================================")
IO.puts("Your Example Title")
IO.puts("========================================\n")

# Example 1: First concept
IO.puts("1. First Section")
IO.puts("----------------")

# Your code here

# Example 2: Second concept
IO.puts("\n2. Second Section")
IO.puts("-----------------")

# Your code here

IO.puts("\n========================================")
IO.puts("Example Complete!")
IO.puts("========================================\n")
```

## See Also

- [GETTING_STARTED.md](GETTING_STARTED.md) - Tutorial for beginners
- [USER_GUIDE.md](USER_GUIDE.md) - Comprehensive usage guide
- [API.md](API.md) - API reference
- [TRANSLATION.md](TRANSLATION.md) - Translation system guide