# Reading Edifice

> How to understand and use the code patterns in this library -- Axon computation graphs, the build API, tensor shapes, and running inference.

## What This Guide Covers

Every architecture in Edifice follows the same patterns. Once you understand these patterns, you can pick up any of the 90+ architectures without re-learning the API. This guide walks through those patterns with runnable examples.

**Prerequisites:** You should be comfortable with the concepts in [ML Foundations](ml_foundations.md) and [Core Vocabulary](core_vocabulary.md). Familiarity with basic Elixir syntax is helpful but not strictly required -- the patterns are simple enough to follow even if you're new to the language.

## The Stack: Nx, Axon, and Edifice

Edifice sits on top of two foundational Elixir libraries (with EXLA as an optional accelerator underneath):

```
┌─────────────────────────────────────────┐
│ Edifice                                 │
│ 90+ architectures, consistent API       │
│ "What architecture do I want?"          │
├─────────────────────────────────────────┤
│ Axon                                    │
│ Model building, computation graphs      │
│ "How do layers connect?"                │
├─────────────────────────────────────────┤
│ Nx                                      │
│ Numerical computing, tensors, autograd  │
│ "How do I do math on tensors?"          │
├─────────────────────────────────────────┤
│ EXLA (optional)                         │
│ GPU acceleration via XLA compiler       │
│ "Make it fast on GPU"                   │
└─────────────────────────────────────────┘
```

**Nx** is Elixir's numerical computing library. It provides tensors (multi-dimensional arrays), mathematical operations, and automatic differentiation. Think of it as Elixir's equivalent of NumPy + autograd.

**Axon** builds on Nx to provide a model-building API. You define a neural network as a **computation graph** -- a description of how data flows through layers. The graph is then compiled into efficient functions for initialization and prediction.

**Edifice** uses Axon to implement 90+ architectures with a consistent API.
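To make the bottom of the stack concrete before moving on, here is a minimal pure-Nx sketch. It assumes only the `nx` package -- nothing from Axon or Edifice -- and shows the kind of tensor operations everything above is built from:

```elixir
# Tensors are multi-dimensional arrays, as in NumPy.
t = Nx.tensor([[1.0, 2.0], [3.0, 4.0]])

Nx.shape(t)    # => {2, 2}
Nx.mean(t)     # => scalar tensor holding 2.5
Nx.add(t, 10)  # elementwise, with NumPy-style broadcasting
```

Every Axon layer, and therefore every Edifice architecture, ultimately reduces to chains of operations like these.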
Instead of manually wiring up attention heads, SSM blocks, and normalization layers, you call `Edifice.build/2` and get a ready-to-use Axon model.

## The Build Pattern

Every architecture module in Edifice has a `build/1` function that returns an Axon model:

```elixir
# The universal pattern
model = SomeModule.build(option1: value1, option2: value2)
```

The model isn't a trained network -- it's a **computation graph** that describes the network's structure. No weights exist yet. No computation has happened. It's a blueprint.

### Building by Module

You can use any architecture module directly:

```elixir
# Simple feedforward network
model = Edifice.Feedforward.MLP.build(input_size: 256, hidden_sizes: [512, 256])

# Mamba state space model
model = Edifice.SSM.Mamba.build(
  embed_size: 128,
  hidden_size: 256,
  state_size: 16,
  num_layers: 4,
  window_size: 60
)

# Graph convolutional network for classification
model = Edifice.Graph.GCN.build_classifier(
  input_dim: 16,
  hidden_dims: [64, 64],
  num_classes: 2,
  pool: :mean
)
```

### Building by Name (Registry)

The unified registry lets you build any architecture with an atom name:

```elixir
# Same Mamba model, built through the registry
model = Edifice.build(:mamba,
  embed_size: 128,
  hidden_size: 256,
  state_size: 16,
  num_layers: 4,
  window_size: 60
)

# Useful for config-driven experiments
arch_name = :retnet  # could come from a config file
model = Edifice.build(arch_name, embed_size: 256, hidden_size: 512, num_layers: 4)
```

You can explore what's available:

```elixir
# List all 90+ architecture names
Edifice.list_architectures()
# => [:adapter, :ann2snn, :attention, :barlow_twins, :bayesian, :bimamba, ...]

# See architectures grouped by family
Edifice.list_families()
# => %{
#      ssm: [:mamba, :mamba_ssd, :s4, :s4d, :s5, :h3, :hyena, ...],
#      attention: [:attention, :retnet, :rwkv, :gla, :hgrn, ...],
#      feedforward: [:mlp, :kan, :tabnet],
#      ...
#    }

# Get the module behind a name
Edifice.module_for(:mamba)
# => Edifice.SSM.Mamba
```

## From Graph to Functions: Axon.build

An Axon model is just a graph. To actually run it, you compile it with `Axon.build/1`:

```elixir
model = Edifice.Feedforward.MLP.build(input_size: 10, hidden_sizes: [64, 32])

# Compile the graph into two functions
{init_fn, predict_fn} = Axon.build(model)
```

This gives you two functions:

- **`init_fn`**: creates the initial (random) parameters
- **`predict_fn`**: runs the forward pass

### Initializing Parameters

`init_fn` takes a **template** (a tensor describing the expected input shape) and an empty model state:

```elixir
# Template: 1 sample, 10 features -- matches input_size: 10
template = Nx.template({1, 10}, :f32)

# Create random initial parameters
params = init_fn.(template, Axon.ModelState.empty())
```

The template doesn't contain real data -- it just tells Axon the shape and type of inputs to expect so it can create parameters of the right sizes. `Nx.template/2` creates a placeholder that takes no memory.

`params` is now an `Axon.ModelState` containing all the network's weights and biases, randomly initialized. For a 2-layer MLP with sizes [64, 32], this includes:

- Layer 0: a {10, 64} weight matrix + a {64} bias vector
- Layer 1: a {64, 32} weight matrix + a {32} bias vector

### Running Inference

`predict_fn` takes parameters and input data, and runs the forward pass:

```elixir
# Create some input data: 4 samples, 10 features each
input = Nx.broadcast(0.5, {4, 10})

# Run the forward pass
output = predict_fn.(params, input)
# => a tensor of shape {4, 32} (4 samples, 32 features from the last hidden layer)
```

That's it. Three steps: **build** the graph, **init** the parameters, **predict** with data.

## Understanding Tensor Shapes

Shapes are how you reason about what's happening inside a network. Every Edifice architecture documents its expected input and output shapes.
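The weight and bias shapes listed for the MLP above fully determine its size. As a quick sanity check -- plain arithmetic, no Edifice calls -- that network holds 2,784 trainable parameters:

```elixir
# Each dense layer stores in * out weights plus out biases.
# {10, 64} and {64, 32} are the layer shapes from the MLP above.
layers = [{10, 64}, {64, 32}]

total =
  layers
  |> Enum.map(fn {inp, out} -> inp * out + out end)
  |> Enum.sum()

# total => 2784  (704 for layer 0, 2080 for layer 1)
```

Doing this arithmetic by hand is a useful habit: if the shapes you expect don't reproduce the parameter counts you observe, your mental model of the graph is wrong somewhere.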
### Common Shape Patterns

```
{batch_size, features}
  Used by: MLP, classification heads, pooled outputs
  Example: {32, 256} = 32 samples, 256 features each

{batch_size, seq_len, features}
  Used by: Sequence models (Mamba, attention, recurrent, TCN)
  Example: {1, 60, 128} = 1 sample, 60 timesteps, 128 features per step

{batch_size, height, width, channels}
  Used by: Vision models (ViT, ResNet, UNet)
  Example: {16, 224, 224, 3} = 16 RGB images at 224x224

Map with named inputs
  Used by: Graph models (GCN, GAT)
  Example: %{"nodes" => {4, 10, 16}, "adjacency" => {4, 10, 10}}
```

### The Batch Dimension

The first dimension is **always** the batch size. When you see `{nil, 60, 128}` in an Axon input specification, `nil` means "any batch size." The network doesn't care how many samples you feed it at once.

```elixir
# These all work with the same model:
predict_fn.(params, Nx.broadcast(0.5, {1, 60, 128}))    # 1 sample
predict_fn.(params, Nx.broadcast(0.5, {32, 60, 128}))   # 32 samples
predict_fn.(params, Nx.broadcast(0.5, {256, 60, 128}))  # 256 samples
```

### Shape Transformations

Most Edifice sequence models output `{batch, hidden_size}` -- they reduce the sequence dimension by taking the last timestep or pooling. This is because the common use case is classification or regression from sequences, where you need a fixed-size output regardless of sequence length.

```elixir
# Mamba: sequence in, fixed vector out
model = Edifice.build(:mamba, embed_size: 128, hidden_size: 256, num_layers: 2, window_size: 60)
{init_fn, predict_fn} = Axon.build(model)
params = init_fn.(Nx.template({1, 60, 128}, :f32), Axon.ModelState.empty())

output = predict_fn.(params, Nx.broadcast(0.5, {1, 60, 128}))
# output shape: {1, 256} -- the 60 timesteps have been reduced to a single vector
```

## Generative Models: The Tuple Pattern

Most architectures return a single Axon model.
Generative architectures return **tuples** of models because they have multiple components that are trained differently:

```elixir
# VAE returns an encoder and a decoder
{encoder, decoder} = Edifice.Generative.VAE.build(
  input_size: 784,
  latent_size: 32,
  encoder_sizes: [512, 256],
  decoder_sizes: [256, 512]
)

# Each is a separate Axon model
{enc_init, enc_predict} = Axon.build(encoder)
{dec_init, dec_predict} = Axon.build(decoder)

# GAN returns a generator and a discriminator
{generator, discriminator} = Edifice.Generative.GAN.build(
  latent_size: 128,
  output_size: 784,
  gen_sizes: [256, 512],
  disc_sizes: [512, 256]
)
```

Generative modules also provide associated utility functions for training:

```elixir
# VAE: reparameterization trick and KL divergence
z = Edifice.Generative.VAE.reparameterize(mu, log_var)
kl_loss = Edifice.Generative.VAE.kl_divergence(mu, log_var)
```

## Graph Models: Map Inputs

Graph models expect **maps** as input because graphs have multiple components (nodes, edges, adjacency matrices):

```elixir
model = Edifice.Graph.GCN.build_classifier(
  input_dim: 16,
  hidden_dims: [64, 64],
  num_classes: 2,
  pool: :mean
)

{init_fn, predict_fn} = Axon.build(model)

# Graph input is a map with named tensors
input = %{
  "nodes" => Nx.broadcast(0.5, {4, 10, 16}),               # 4 graphs, 10 nodes, 16 features
  "adjacency" => Nx.eye(10) |> Nx.broadcast({4, 10, 10})   # adjacency matrices
}

params = init_fn.(
  %{
    "nodes" => Nx.template({4, 10, 16}, :f32),
    "adjacency" => Nx.template({4, 10, 10}, :f32)
  },
  Axon.ModelState.empty()
)

output = predict_fn.(params, input)
# output shape: {4, 2} -- 4 graphs, 2 class probabilities each
```

## Common Options Across Architectures

While each architecture has unique options, several appear across many modules:

| Option | Meaning | Typical Values |
|--------|---------|----------------|
| `embed_size` | Input feature dimension per token | 64, 128, 256, 512 |
| `hidden_size` | Internal representation width | 128, 256, 512, 1024 |
| `num_layers` | Depth of the network (stacked blocks) | 2, 4, 6, 8, 12 |
| `num_heads` | Number of attention heads | 4, 8, 16 |
| `window_size` | Expected sequence length | 60, 128, 512, 1024 |
| `dropout` | Dropout rate for regularization | 0.0, 0.1, 0.2 |
| `activation` | Activation function | `:relu`, `:silu`, `:gelu` |

**Larger values = more capacity** (can learn more complex patterns) but also **more compute and more data needed** to train effectively.

## Putting It All Together: A Complete Example

Here's a full example showing the lifecycle from architecture selection to inference:

```elixir
# 1. Choose an architecture for sequence classification
#    We have 60-frame game state sequences with 128 features per frame
#    and want to classify into 5 actions
model = Edifice.build(:mamba,
  embed_size: 128,
  hidden_size: 256,
  state_size: 16,
  num_layers: 4,
  window_size: 60
)

# 2. Add a classification head on top
#    Edifice models output a feature vector; we need class probabilities
classifier =
  model
  |> Axon.dense(5, name: "action_head")
  |> Axon.activation(:softmax)

# 3. Compile the full model
{init_fn, predict_fn} = Axon.build(classifier)

# 4. Initialize parameters
template = Nx.template({1, 60, 128}, :f32)
params = init_fn.(template, Axon.ModelState.empty())

# 5. Run inference on a batch of game states
game_states = Nx.broadcast(0.5, {8, 60, 128})  # 8 sequences of 60 frames
predictions = predict_fn.(params, game_states)
# predictions shape: {8, 5} -- probability distribution over 5 actions for each sequence
```

Notice step 2: Edifice models are composable Axon graphs. You can pipe them into additional layers, combine multiple Edifice models, or use Edifice layers as components in a larger architecture. This composability is fundamental to the design.
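That composability extends beyond a single head. A hypothetical sketch under stated assumptions -- `Axon.container/1` is standard Axon for bundling multiple outputs into one model, and the two-head setup here is an illustration, not an Edifice-prescribed pattern:

```elixir
# One shared backbone feeding two task heads -- e.g. an action
# classifier and a scalar value estimate over the same sequence.
backbone = Edifice.build(:mamba, embed_size: 128, hidden_size: 256, num_layers: 4, window_size: 60)

action_head = backbone |> Axon.dense(5, name: "action") |> Axon.activation(:softmax)
value_head = backbone |> Axon.dense(1, name: "value")

# Bundle both heads into a single model; one forward pass
# returns a tuple of both outputs.
multi_task = Axon.container({action_head, value_head})
{init_fn, predict_fn} = Axon.build(multi_task)
```

Because both heads hang off the same graph node, the backbone is shared: its parameters appear once and both outputs are computed from one pass over the sequence.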
## Comparing Architectures

Because every architecture follows the same API, swapping one for another is trivial:

```elixir
# Try several sequence models with the same input/output contract
architectures = [
  {:mamba, [embed_size: 128, hidden_size: 256, num_layers: 4, window_size: 60]},
  {:retnet, [embed_size: 128, hidden_size: 256, num_layers: 4, num_heads: 4, window_size: 60]},
  {:lstm, [embed_size: 128, hidden_size: 256, num_layers: 4, window_size: 60]},
  {:griffin, [embed_size: 128, hidden_size: 256, num_layers: 4, window_size: 60]}
]

for {name, opts} <- architectures do
  model = Edifice.build(name, opts)
  {init_fn, predict_fn} = Axon.build(model)
  params = init_fn.(Nx.template({1, 60, 128}, :f32), Axon.ModelState.empty())
  output = predict_fn.(params, Nx.broadcast(0.5, {1, 60, 128}))
  IO.puts("#{name}: output shape #{inspect(Nx.shape(output))}")
end
```

This is one of Edifice's core value propositions: the cost of trying a different architecture is a one-line change.

## Reading Architecture Moduledocs

Every module in Edifice includes documentation you can access in IEx:

```elixir
# In IEx
h Edifice.SSM.Mamba        # Module overview
h Edifice.SSM.Mamba.build  # Build function options and return type
```

The moduledocs follow a consistent pattern:

1. One-line description of the architecture
2. ASCII diagram of the computation flow
3. Options with types and defaults
4. Usage examples with shapes annotated

## What's Next

With the API patterns understood, you're ready to explore architectures:

1. **[Learning Path](learning_path.md)** -- a guided tour through the 19 families in a logical order
2. Any architecture-specific guide (e.g., [State Space Models](state_space_models.md), [Attention Mechanisms](attention_mechanisms.md)) -- you now have the vocabulary and API knowledge to follow them