# Neural Models in Nasty

A complete guide to using neural network models in Nasty for state-of-the-art NLP performance.

## Overview

Nasty integrates neural network models using **Axon**, Elixir's neural network library, providing:

- **BiLSTM-CRF architecture** for sequence tagging (POS, NER)
- **97-98% accuracy** on standard POS tagging benchmarks
- **EXLA JIT compilation** for a 10-100x speedup
- **Seamless integration** with the existing pipeline
- **Pre-trained embedding support** (GloVe, FastText)
- **Model persistence** and loading
- **Graceful fallbacks** to HMM and rule-based models

## Quick Start

### Installation

Neural dependencies are already included in `mix.exs`:

```elixir
# Already added
{:axon, "~> 0.7"},       # Neural networks
{:nx, "~> 0.9"},         # Numerical computing
{:exla, "~> 0.9"},       # XLA compiler (GPU/CPU acceleration)
{:bumblebee, "~> 0.6"},  # Pre-trained models
{:tokenizers, "~> 0.5"}  # Fast tokenization
```

### Basic Usage

```elixir
# Parse text with the neural POS tagger
{:ok, ast} = Nasty.parse("The cat sat on the mat.",
  language: :en,
  model: :neural
)

# Tokens will have POS tags predicted by the neural model
```

### Training Your Own Model

```bash
# Download a Universal Dependencies corpus
# https://universaldependencies.org/

# Train the neural POS tagger
mix nasty.train.neural_pos \
  --corpus data/en_ewt-ud-train.conllu \
  --test-corpus data/en_ewt-ud-test.conllu \
  --epochs 10 \
  --hidden-size 256

# Model saved to priv/models/en/pos_neural_v1.axon
```

### Using Trained Models

```elixir
alias Nasty.Statistics.POSTagging.NeuralTagger

# Load a saved model
{:ok, model} = NeuralTagger.load("priv/models/en/pos_neural_v1.axon")

# Predict tags for a token list
words = ["The", "cat", "sat"]
{:ok, tags} = NeuralTagger.predict(model, words, [])
# => {:ok, [:det, :noun, :verb]}
```

## Architecture

### BiLSTM-CRF

The default architecture is a **bidirectional LSTM with a CRF** (Conditional Random Field) output layer:

```mermaid
flowchart TD
    A[Input Words]
    B["Word Embeddings (300d)"]
    C["BiLSTM Layer 1 (256 hidden units)"]
    D["Dropout (0.3)"]
    E["BiLSTM Layer 2 (256 hidden units)"]
    F["Dense Projection → POS Tags"]
    G[Softmax/CRF]
    H[Output Tags]

    A --> B
    B --> C
    C --> D
    D --> E
    E --> F
    F --> G
    G --> H
```

**Key Features:**

- Bidirectional context (forward + backward)
- Optional character-level CNN for OOV handling
- Dropout regularization
- 2-3 LSTM layers (configurable)
- 256-512 hidden units (configurable)

### Performance

**Accuracy:**

- POS tagging: 97-98% (vs 95% HMM, 85% rule-based)
- NER: 88-92% F1 (future)
- Dependency parsing: 94-96% UAS (future)

**Speed (on UD English, 12k sentences):**

- CPU: ~30-60 minutes training
- GPU (EXLA): ~5-10 minutes training
- Inference: ~1000-5000 tokens/second (CPU)
- Inference: ~10000+ tokens/second (GPU)

## Model Integration Modes

Nasty provides multiple integration modes:

### 1. Neural Only (`:neural`)

Uses only the neural model:

```elixir
{:ok, ast} = Nasty.parse(text, language: :en, model: :neural)
```

**Fallback:** If the neural model is unavailable, Nasty falls back to HMM → rule-based.

### 2. Neural Ensemble (`:neural_ensemble`)

Combines neural + HMM + rule-based:

```elixir
{:ok, ast} = Nasty.parse(text, language: :en, model: :neural_ensemble)
```

**Strategy:**

- Use rule-based tags for punctuation and numbers (high confidence)
- Use neural predictions for content words
- Best overall accuracy

### 3. Traditional Modes

Still available:

- `:rule_based` - fast, 85% accuracy
- `:hmm` - 95% accuracy
- `:ensemble` - HMM + rules
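Because every mode goes through the same `Nasty.parse/2` interface, you can compare them on identical input. A minimal sketch, using only the calls shown above (the timing loop itself is illustrative and not part of Nasty):

```elixir
# Compare integration modes on one sentence. :timer.tc/1 returns
# {microseconds, result}, giving a rough per-mode latency.
text = "The cat sat on the mat."

for mode <- [:rule_based, :hmm, :ensemble, :neural, :neural_ensemble] do
  {micros, {:ok, _ast}} =
    :timer.tc(fn -> Nasty.parse(text, language: :en, model: mode) end)

  IO.puts("#{mode}: #{Float.round(micros / 1000, 1)} ms")
end
```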
## Training Guide

### 1. Prepare Data

Download a Universal Dependencies corpus:

```bash
# English
wget https://raw.githubusercontent.com/UniversalDependencies/UD_English-EWT/master/en_ewt-ud-train.conllu

# Or other languages
# Spanish, Catalan, etc.
```

### 2. Train Model

```bash
mix nasty.train.neural_pos \
  --corpus en_ewt-ud-train.conllu \
  --test-corpus en_ewt-ud-test.conllu \
  --output priv/models/en/pos_neural_v1.axon \
  --epochs 10 \
  --batch-size 32 \
  --learning-rate 0.001 \
  --hidden-size 256 \
  --num-layers 2 \
  --dropout 0.3 \
  --use-char-cnn false
```

### 3. Evaluate

The training task automatically evaluates on the test set and reports:

- Overall accuracy
- Per-tag precision, recall, and F1
- Confusion matrix (if requested)

### 4. Deploy

Models are automatically saved with:

- Model weights (`.axon` file)
- Metadata (`.meta.json` file)
- Vocabulary and tag mappings

Load via `ModelLoader.load_latest(:en, :pos_tagging_neural)` or directly with `NeuralTagger.load/1`.

## Programmatic Training

```elixir
alias Nasty.Statistics.POSTagging.NeuralTagger
alias Nasty.Statistics.Neural.DataLoader

# Load corpus
{:ok, sentences} = DataLoader.load_conllu("train.conllu")

# Split data (the held-out split is unused here because
# train/3 takes a validation_split option below)
{train, _valid} = DataLoader.split(sentences, [0.9, 0.1])

# Build vocabularies
{:ok, vocab, tag_vocab} = DataLoader.build_vocabularies(train, min_freq: 2)

# Create model
tagger =
  NeuralTagger.new(
    vocab: vocab,
    tag_vocab: tag_vocab,
    embedding_dim: 300,
    hidden_size: 256,
    num_layers: 2,
    dropout: 0.3
  )

# Train
{:ok, trained} =
  NeuralTagger.train(tagger, train,
    epochs: 10,
    batch_size: 32,
    learning_rate: 0.001,
    validation_split: 0.1
  )

# Save
NeuralTagger.save(trained, "my_model.axon")
```

## Pre-trained Embeddings

### Using GloVe

```elixir
alias Nasty.Statistics.Neural.Embeddings

# Load GloVe embeddings
{:ok, embeddings} = Embeddings.load_glove("glove.6B.300d.txt", vocab)

# Use during training
tagger =
  NeuralTagger.new(
    vocab: vocab,
    tag_vocab: tag_vocab,
    pretrained_embeddings: embeddings
  )
```

Download GloVe:

```bash
wget http://nlp.stanford.edu/data/glove.6B.zip
unzip glove.6B.zip
```
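Before wiring pretrained vectors into a tagger, it can be worth checking how much of your vocabulary they actually cover; words without a pretrained vector still start from random embeddings. A minimal sketch, reusing `vocab` and `Embeddings.load_glove/2` from above and assuming the loader returns a word-to-vector map (`EmbeddingCheck.coverage/2` is a hypothetical helper, not part of Nasty):

```elixir
defmodule EmbeddingCheck do
  # Hypothetical helper: fraction of vocabulary words that have a
  # pretrained vector. Assumes `embeddings` is a word => vector map
  # and `vocab` is a word => index map.
  def coverage(embeddings, vocab) do
    words = Map.keys(vocab)
    covered = Enum.count(words, &Map.has_key?(embeddings, &1))
    covered / max(length(words), 1)
  end
end

{:ok, embeddings} = Embeddings.load_glove("glove.6B.300d.txt", vocab)
pct = Float.round(EmbeddingCheck.coverage(embeddings, vocab) * 100, 1)
IO.puts("GloVe covers #{pct}% of the vocabulary")
```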
## Advanced Features

### Character-Level CNN

For better OOV handling:

```bash
mix nasty.train.neural_pos \
  --corpus train.conllu \
  --use-char-cnn \
  --char-filters 3,4,5 \
  --char-num-filters 30
```

### Custom Architectures

Extend `Nasty.Statistics.Neural.Architectures.BiLSTMCRF`:

```elixir
defmodule MyArchitecture do
  def build(opts) do
    # Build a custom Axon model
    Axon.input("tokens")
    |> Axon.embedding(opts[:vocab_size], opts[:embedding_dim])
    # ... your architecture goes here
  end
end
```

### Streaming Training

For large datasets:

```elixir
DataLoader.stream_batches("huge_corpus.conllu", vocab, tag_vocab, batch_size: 64)
|> Stream.take(1000)  # Process in chunks
|> Enum.each(&train_batch/1)
```

## Troubleshooting

### EXLA Compilation Issues

If EXLA fails to compile:

```bash
# Install XLA dependencies
# Ubuntu/Debian:
sudo apt-get install build-essential

# Set compiler flags
export ELIXIR_ERL_OPTIONS="+fnu"
mix deps.clean exla --build
mix deps.get
```

### Out of Memory

Reduce the batch size:

```bash
mix nasty.train.neural_pos --batch-size 16  # Instead of 32
```

Or use gradient accumulation:

```elixir
# In training opts
accumulation_steps: 4
```

### Slow Training

Enable EXLA:

```elixir
# Should be automatic, but verify:
compiler: EXLA
```

Use a GPU if available:

```bash
export XLA_TARGET=cuda
```

## Future Enhancements

- **Transformers**: BERT, RoBERTa via Bumblebee
- **NER models**: BiLSTM-CRF for named entity recognition
- **Dependency parsing**: biaffine attention parser
- **Multilingual**: mBERT, XLM-R support
- **Model quantization**: INT8 for faster inference
- **Knowledge distillation**: compress large models

## See Also

- [TRAINING_NEURAL.md](TRAINING_NEURAL.md) - Detailed training guide
- [PRETRAINED_MODELS.md](PRETRAINED_MODELS.md) - Using transformers
- [API.md](API.md) - Full API documentation
- [BiLSTM-CRF paper](https://arxiv.org/abs/1508.01991)
- [Axon documentation](https://hexdocs.pm/axon)