# Nasty Public API Reference

This document describes the public API of Nasty, the Natural Abstract Syntax Tree library for Elixir.

## Core Functions

### Parsing

#### `Nasty.parse/2`

Parses natural language text into an Abstract Syntax Tree (AST).

**Parameters:**
- `text` (String.t()) - The text to parse
- `opts` (keyword()) - Options:
  - `:language` - Language code (`:en`, `:es`, `:ca`, etc.) **Required**
  - `:tokenize` - Enable tokenization (default: `true`)
  - `:pos_tag` - Enable POS tagging (default: `true`)
  - `:parse_dependencies` - Parse dependency relationships (default: `true`)
  - `:extract_entities` - Extract named entities (default: `false`)
  - `:resolve_coreferences` - Resolve coreferences (default: `false`)

**Returns:**
- `{:ok, %Nasty.AST.Document{}}` - Parsed AST document
- `{:error, reason}` - Parse error

**Examples:**
```elixir
# Basic parsing
{:ok, ast} = Nasty.parse("The cat sat on the mat.", language: :en)

# With entity recognition
{:ok, ast} = Nasty.parse("John lives in Paris.",
  language: :en,
  extract_entities: true
)

# With coreference resolution
{:ok, ast} = Nasty.parse("Mary loves her cat. She feeds it daily.",
  language: :en,
  resolve_coreferences: true
)
```

#### `Nasty.render/2`

Renders an AST back to natural language text.

**Parameters:**
- `ast` (struct()) - AST node to render (Document, Sentence, etc.)
- `opts` (keyword()) - Options (language determined from AST)

**Returns:**
- `{:ok, text}` - Rendered text string
- `{:error, reason}` - Render error

**Examples:**
```elixir
{:ok, ast} = Nasty.parse("The cat sat.", language: :en)
{:ok, text} = Nasty.render(ast)
# => "The cat sat."
```

### Translation

#### `Nasty.Translation.Translator.translate_document/2`

Translates an AST document from one language to another.

**Parameters:**
- `document` - AST Document to translate
- `target_language` - Target language code (`:en`, `:es`, `:ca`, etc.)

**Returns:**
- `{:ok, %Nasty.AST.Document{}}` - Translated AST document
- `{:error, reason}` - Translation error

**Examples:**
```elixir
alias Nasty.Translation.Translator

# Translate English to Spanish
{:ok, doc_en} = Nasty.parse("The cat runs.", language: :en)
{:ok, doc_es} = Translator.translate_document(doc_en, :es)
{:ok, text_es} = Nasty.render(doc_es)
# => "El gato corre."

# Translate Spanish to English
{:ok, doc_es} = Nasty.parse("La casa grande.", language: :es)
{:ok, doc_en} = Translator.translate_document(doc_es, :en)
{:ok, text_en} = Nasty.render(doc_en)
# => "The big house."

# Or translate text directly
{:ok, text_es} = Translator.translate("The cat runs.", :en, :es)
# => "El gato corre."
```

### Summarization

#### `Nasty.summarize/2`

Summarizes a document by extracting important sentences.

**Parameters:**
- `text_or_ast` - Text string or AST Document to summarize
- `opts` (keyword()) - Options:
  - `:language` - Language code (required if text)
  - `:ratio` - Compression ratio (0.0 to 1.0), default `0.3`
  - `:max_sentences` - Maximum number of sentences in summary
  - `:method` - Selection method: `:greedy` or `:mmr` (default: `:greedy`)
  - `:min_sentence_length` - Minimum sentence length in tokens (default: `3`)
  - `:mmr_lambda` - MMR diversity parameter, 0-1 (default: `0.5`)

**Returns:**
- `{:ok, [%Sentence{}]}` - List of extracted sentences
- `{:error, reason}` - Error

**Examples:**
```elixir
# From text
{:ok, summary} = Nasty.summarize(long_text,
  language: :en,
  ratio: 0.3
)

# From AST
{:ok, ast} = Nasty.parse(long_text, language: :en)
{:ok, summary} = Nasty.summarize(ast, max_sentences: 3)

# Using MMR for diversity
{:ok, summary} = Nasty.summarize(text,
  language: :en,
  method: :mmr,
  mmr_lambda: 0.7
)
```

### Code Interoperability

#### `Nasty.to_code/2`

Converts natural language text to code.

**Parameters:**
- `text` (String.t()) - Natural language description
- `opts` (keyword()) - Options:
  - `:source_language` - Source natural language (`:en`, etc.)
**Required**
  - `:target_language` - Target programming language (`:elixir`, etc.) **Required**

**Returns:**
- `{:ok, code_string}` - Generated code
- `{:error, reason}` - Error

**Supported Language Pairs:**
- English → Elixir (`:en` → `:elixir`)

**Examples:**
```elixir
# List operations
{:ok, code} = Nasty.to_code("Sort the list",
  source_language: :en,
  target_language: :elixir
)
# => "Enum.sort(list)"

# Filter with constraints
{:ok, code} = Nasty.to_code("Filter users where age is greater than 18",
  source_language: :en,
  target_language: :elixir
)
# => "Enum.filter(users, fn item -> item > 18 end)"

# Arithmetic
{:ok, code} = Nasty.to_code("Add x and y",
  source_language: :en,
  target_language: :elixir
)
# => "x + y"
```

#### `Nasty.explain_code/2`

Generates a natural language explanation from code.

**Parameters:**
- `code` - Code string or AST to explain
- `opts` (keyword()) - Options:
  - `:source_language` - Programming language (`:elixir`, etc.) **Required**
  - `:target_language` - Target natural language (`:en`, etc.) **Required**
  - `:style` - Explanation style: `:concise` or `:verbose` (default: `:concise`)

**Returns:**
- `{:ok, explanation_string}` - Natural language explanation
- `{:error, reason}` - Error

**Supported Language Pairs:**
- Elixir → English (`:elixir` → `:en`)

**Examples:**
```elixir
{:ok, explanation} = Nasty.explain_code("Enum.sort(list)",
  source_language: :elixir,
  target_language: :en
)
# => "Sort list"

{:ok, explanation} = Nasty.explain_code(
  "list |> Enum.map(&(&1 * 2)) |> Enum.sum()",
  source_language: :elixir,
  target_language: :en
)
# => "Map list to double each element, then sum the results"

# Verbose style
{:ok, explanation} = Nasty.explain_code("x = 5",
  source_language: :elixir,
  target_language: :en,
  style: :verbose
)
```

## Language Registry

### `Nasty.Language.Registry`

Manages language implementations.

#### `Nasty.Language.Registry.register/1`

Registers a language implementation module.

```elixir
Nasty.Language.Registry.register(Nasty.Language.English)
# => :ok
```

#### `Nasty.Language.Registry.get/1`

Gets the implementation module for a language code.

```elixir
{:ok, module} = Nasty.Language.Registry.get(:en)
# => {:ok, Nasty.Language.English}
```

#### `Nasty.Language.Registry.detect_language/1`

Detects the language of the given text.

```elixir
{:ok, language} = Nasty.Language.Registry.detect_language("Hello world")
# => {:ok, :en}

{:ok, language} = Nasty.Language.Registry.detect_language("Hola mundo")
# => {:ok, :es}
```

#### `Nasty.Language.Registry.registered_languages/0`

Returns all registered language codes.

```elixir
Nasty.Language.Registry.registered_languages()
# => [:en, :es, :ca]
```

#### `Nasty.Language.Registry.registered?/1`

Checks if a language is registered.

```elixir
Nasty.Language.Registry.registered?(:en)
# => true
```

## AST Utilities

### Query

#### `Nasty.Utils.Query`

Query and traverse AST structures.

```elixir
alias Nasty.Utils.Query

# Find subject in a sentence
subject = Query.find_subject(sentence)

# Find all noun phrases
noun_phrases = Query.find_all(document, :noun_phrase)

# Find by POS tag
nouns = Query.find_by_pos(document, :noun)
verbs = Query.find_by_pos(document, :verb)

# Count nodes
token_count = Query.count(document, :token)
```

### Validation

#### `Nasty.Utils.Validator`

Validate AST structure.

```elixir
alias Nasty.Utils.Validator

case Validator.validate(document) do
  {:ok, _doc} -> IO.puts("Valid AST")
  {:error, reason} -> IO.puts("Invalid: #{reason}")
end

# Check if valid (boolean)
if Validator.valid?(document) do
  IO.puts("Document is valid")
end
```

### Transformation

#### `Nasty.Utils.Transform`

Transform AST nodes.

```elixir
alias Nasty.Utils.Transform

# Case normalization
lowercased = Transform.normalize_case(document, :lower)

# Remove punctuation
no_punct = Transform.remove_punctuation(document)

# Remove stop words
no_stops = Transform.remove_stop_words(document)

# Lemmatize all tokens
lemmatized = Transform.lemmatize(document)
```

### Traversal

#### `Nasty.Utils.Traversal`

Traverse AST structure.

```elixir
alias Nasty.Utils.Traversal

# Reduce over all nodes
token_count = Traversal.reduce(document, 0, fn
  %Nasty.AST.Token{}, acc -> acc + 1
  _, acc -> acc
end)

# Collect matching nodes
verbs = Traversal.collect(document, fn
  %Nasty.AST.Token{pos_tag: :verb} -> true
  _ -> false
end)

# Map over all nodes
transformed = Traversal.map(document, fn
  %Nasty.AST.Token{} = token -> %{token | text: String.downcase(token.text)}
  node -> node
end)
```

## Rendering

### Pretty Print

#### `Nasty.Rendering.PrettyPrint`

Format AST for human-readable inspection.

```elixir
# Pretty print to stdout
Nasty.Rendering.PrettyPrint.inspect(ast)

# Get formatted string
formatted = Nasty.Rendering.PrettyPrint.format(ast)
```

### Visualization

#### `Nasty.Rendering.Visualization`

Generate visualizations of AST structures.

```elixir
# Generate DOT format for Graphviz
{:ok, dot} = Nasty.Rendering.Visualization.to_dot(ast)
File.write("ast.dot", dot)

# Generate JSON representation
{:ok, json} = Nasty.Rendering.Visualization.to_json(ast)
```

### Text Rendering

#### `Nasty.Rendering.Text`

Render AST to text.

```elixir
{:ok, text} = Nasty.Rendering.Text.render(document)
```

## Statistical & Neural Models

### Model Registry

#### `Nasty.Statistics.ModelRegistry`

Manage statistical and neural models.

```elixir
# Register a model
Nasty.Statistics.ModelRegistry.register(:hmm_pos_tagger, model)
Nasty.Statistics.ModelRegistry.register(:neural_pos_tagger, neural_model)

# Get a model
{:ok, model} = Nasty.Statistics.ModelRegistry.get(:hmm_pos_tagger)
{:ok, neural} = Nasty.Statistics.ModelRegistry.get(:neural_pos_tagger)

# List models
models = Nasty.Statistics.ModelRegistry.list_models()
```

### Model Loader

#### `Nasty.Statistics.ModelLoader`

Load and save statistical and neural models.

```elixir
# Load HMM model from file
{:ok, model} = Nasty.Statistics.ModelLoader.load("path/to/model.model")

# Load neural model from file
{:ok, neural} = Nasty.Statistics.POSTagging.NeuralTagger.load("path/to/model.axon")

# Save model to file
:ok = Nasty.Statistics.ModelLoader.save(model, "path/to/model.model")
:ok = Nasty.Statistics.POSTagging.NeuralTagger.save(neural, "path/to/model.axon")

# Load from the project's priv directory
{:ok, model} = Nasty.Statistics.ModelLoader.load_from_priv("models/hmm.model")
```

### Neural Models

#### `Nasty.Statistics.POSTagging.NeuralTagger`

Train and use BiLSTM-CRF neural models for POS tagging.

```elixir
# Train a neural model
alias Nasty.Statistics.POSTagging.NeuralTagger

tagger = NeuralTagger.new(
  vocab: vocab,
  tag_vocab: tag_vocab,
  embedding_dim: 300,
  hidden_size: 256,
  num_layers: 2
)

{:ok, trained} = NeuralTagger.train(tagger, training_data,
  epochs: 10,
  batch_size: 32,
  learning_rate: 0.001
)

# Use neural model for prediction
{:ok, tags} = NeuralTagger.predict(trained, ["The", "cat", "sat"], [])

# Save/load neural models
NeuralTagger.save(trained, "model.axon")
{:ok, loaded} = NeuralTagger.load("model.axon")
```

## Data Layer

### CoNLL-U Parser

#### `Nasty.Data.CoNLLU`

Parse and generate CoNLL-U format data.

```elixir
# Parse CoNLL-U file
{:ok, sentences} = Nasty.Data.CoNLLU.parse_file("corpus.conllu")

# Parse CoNLL-U string
{:ok, sentences} = Nasty.Data.CoNLLU.parse(conllu_string)

# Convert AST to CoNLL-U
conllu_string = Nasty.Data.CoNLLU.format(sentence)
```

### Corpus Management

#### `Nasty.Data.Corpus`

Manage text corpora.

```elixir
# Load corpus
{:ok, corpus} = Nasty.Data.Corpus.load("path/to/corpus")

# Get sentences
sentences = Nasty.Data.Corpus.sentences(corpus)

# Statistics
stats = Nasty.Data.Corpus.statistics(corpus)
```

## NLP Operations (English)

These are language-specific operations available for English. Access through the English module.

### Question Answering

```elixir
alias Nasty.Language.English

# Analyze question
{:ok, analysis} = English.QuestionAnalyzer.analyze("What is the capital of France?")

# Extract answer
{:ok, answer} = English.AnswerExtractor.extract(document, analysis)
```

### Text Classification

```elixir
# Train classifier
classifier = English.TextClassifier.train(training_data)

# Classify text
{:ok, category} = English.TextClassifier.classify(classifier, text)
```

### Information Extraction

```elixir
# Extract relations
relations = English.RelationExtractor.extract(document)

# Extract events
events = English.EventExtractor.extract(document)

# Extract with templates
extracted = English.TemplateExtractor.extract(document, templates)
```

### Semantic Role Labeling

```elixir
# Label semantic roles
labeled = English.SemanticRoleLabeler.label(sentence)
```

### Coreference Resolution

```elixir
# Resolve coreferences
{:ok, resolved} = English.CoreferenceResolver.resolve(document)
```

### Translation

#### `Nasty.Translation.Translator`

Translate documents between languages.

```elixir
alias Nasty.Translation.Translator

# Translate document
{:ok, translated_doc} = Translator.translate(source_doc, :es)

# Translate with custom lexicons
{:ok, translated_doc} = Translator.translate(source_doc, :es,
  lexicon_path: "custom_lexicons/")
```

#### `Nasty.Translation.TokenTranslator`

Translate individual tokens with POS-aware lemma-to-lemma mapping.

```elixir
alias Nasty.Translation.TokenTranslator

# Translate token
translated_token = TokenTranslator.translate_token(token, :en, :es)

# Translate with morphology
translated_token = TokenTranslator.translate_with_morphology(token, :en, :es)
```

#### `Nasty.Translation.Agreement`

Enforce morphological agreement rules.

```elixir
alias Nasty.Translation.Agreement

# Apply gender/number agreement
adjusted_tokens = Agreement.apply_agreement(tokens, :es)

# Check agreement
valid? = Agreement.check_agreement(determiner, noun)
```

#### `Nasty.Translation.WordOrder`

Apply language-specific word order transformations.

```elixir
alias Nasty.Translation.WordOrder

# Transform word order
ordered_phrase = WordOrder.apply_order(phrase, :es)

# Apply adjective position rules
ordered_np = WordOrder.apply_adjective_order(noun_phrase, :es)
```

#### `Nasty.AST.Renderer`

Render AST back to natural language text.

```elixir
alias Nasty.AST.Renderer

# Render document
{:ok, text} = Renderer.render_document(document)

# Render specific nodes
{:ok, text} = Renderer.render_sentence(sentence)
{:ok, text} = Renderer.render_phrase(phrase)
```

## Error Handling

Most public API functions return result tuples:

- `{:ok, result}` on success
- `{:error, reason}` on failure

Common error reasons:

- `:language_required` - Language not specified
- `:language_not_found` - Language not registered
- `:language_not_registered` - Language code not in registry
- `:no_languages_registered` - No languages available
- `:no_match` - Language detection failed
- `:invalid_text` - Invalid input text
- `:parse_error` - Failed to parse text
- `:source_language_required` - Source language not specified
- `:target_language_required` - Target language not specified
- `:unsupported_language_pair` - Language pair not supported
- `:summarization_not_supported` - Summarization not available for language
- `:invalid_input` - Invalid input type

## See Also

- [AST Reference](AST_REFERENCE.md) - Complete AST node documentation
- [User Guide](USER_GUIDE.md) - Tutorial and examples
- [Architecture](ARCHITECTURE.md) - System architecture
- [Language Guide](LANGUAGE_GUIDE.md) - Adding new languages
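
The result-tuple convention described under Error Handling lends itself to `with` chains that stop at the first `{:error, reason}`. The sketch below is only an illustration: `ErrorHandlingSketch`, `describe/1`, `round_trip/4`, and the two stub functions are hypothetical names that are not part of Nasty. The parse and render steps are passed in as functions so the example runs without the library installed; in a real project they might be `&Nasty.parse/2` and `fn ast -> Nasty.render(ast) end`, assuming the call shapes shown in the examples above.

```elixir
defmodule ErrorHandlingSketch do
  # Hypothetical helper (not part of Nasty): turns a few of the
  # documented error reasons into human-readable messages.
  def describe(:language_required), do: "no :language option was given"
  def describe(:language_not_registered), do: "the language code is not in the registry"
  def describe(:unsupported_language_pair), do: "the language pair is not supported"
  def describe(other), do: "unexpected error: #{inspect(other)}"

  # Chains two steps that each return {:ok, value} or {:error, reason}.
  # The first failing step short-circuits the chain, and its reason is
  # converted to a message by describe/1.
  def round_trip(text, opts, parse, render) do
    with {:ok, ast} <- parse.(text, opts),
         {:ok, rendered} <- render.(ast) do
      {:ok, rendered}
    else
      {:error, reason} -> {:error, describe(reason)}
    end
  end
end

# Stub functions standing in for the library calls (assumptions for the demo).
stub_parse = fn text, opts ->
  if Keyword.has_key?(opts, :language) do
    {:ok, {:document, text}}
  else
    {:error, :language_required}
  end
end

stub_render = fn {:document, text} -> {:ok, text} end

IO.inspect(ErrorHandlingSketch.round_trip("The cat sat.", [language: :en], stub_parse, stub_render))
IO.inspect(ErrorHandlingSketch.round_trip("The cat sat.", [], stub_parse, stub_render))
```

Keeping the reason-to-message mapping in one function means callers can still pattern match on the raw atoms elsewhere and only format them at the boundary where a message is shown to a user.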