viva_math/free_energy
Free Energy Principle (FEP) calculations.
Based on Karl Friston’s work (2010, 2019). Free Energy bounds surprise (negative log evidence) and can be decomposed as:
F = Π · (μ - o)² + D_KL(q || p)
- Accuracy: Π · (μ - o)², the precision-weighted prediction error.
- Complexity: D_KL(q || p), the deviation from priors.
In VIVA, this is used for interoception - sensing internal state and minimizing “surprise” through prediction.
References:
- Friston (2010) “The free-energy principle: a unified brain theory?”
- Parr & Friston (2019) “Generalised free energy and active inference”
- Beal (2003) “Variational Algorithms for Approximate Bayesian Inference”
- Bishop (2006) “Pattern Recognition and Machine Learning”, ch. 10
Types
Decomposition of the Evidence Lower Bound.
ELBO(q) = E_q[log p(x | z)] − D_KL(q(z) ‖ p(z))
= reconstruction − kl_divergence
Maximizing ELBO is equivalent to minimizing the variational free energy:
F = −ELBO.
pub type ELBO {
ELBO(reconstruction: Float, kl_divergence: Float, total: Float)
}
Constructors
- ELBO(reconstruction: Float, kl_divergence: Float, total: Float)
Arguments
- reconstruction: Expected log-likelihood under q. Higher = better data fit.
- kl_divergence: Divergence of the posterior approximation from the prior.
- total: reconstruction − kl_divergence. Lower bound on log p(x).
Expected Free Energy components.
In planning, an agent selects actions that minimise G = epistemic + pragmatic. Splitting the components lets you steer exploration (epistemic) vs exploitation (pragmatic) by reweighting them.
pub type ExpectedFreeEnergy {
ExpectedFreeEnergy(
epistemic: Float,
pragmatic: Float,
total: Float,
)
}
Constructors
- ExpectedFreeEnergy(epistemic: Float, pragmatic: Float, total: Float)
Arguments
- epistemic: Information gain from observing the outcome (exploration).
- pragmatic: Expected divergence from preferred outcomes (exploitation).
- total: G = epistemic + pragmatic.
Qualitative feeling based on free energy level.
pub type Feeling {
Homeostatic
Surprised
Alarmed
Overwhelmed
}
Constructors
- Homeostatic: Low free energy, predictions match reality (F < μ - σ)
- Surprised: Moderate free energy, slight mismatch (μ - σ ≤ F < μ)
- Alarmed: High free energy, significant mismatch (μ ≤ F < μ + σ)
- Overwhelmed: Very high free energy, system overwhelmed (F ≥ μ + σ)
Thresholds for feeling classification. Based on system-specific statistics (mean and standard deviation).
pub type FeelingThresholds {
FeelingThresholds(mean: Float, std_dev: Float)
}
Constructors
- FeelingThresholds(mean: Float, std_dev: Float)
Arguments
- mean: Mean free energy (baseline)
- std_dev: Standard deviation of free energy
Free Energy state for a system.
pub type FreeEnergyState {
FreeEnergyState(
free_energy: Float,
prediction_error: Float,
complexity: Float,
precision: Float,
feeling: Feeling,
)
}
Constructors
- FreeEnergyState(free_energy: Float, prediction_error: Float, complexity: Float, precision: Float, feeling: Feeling)
Arguments
- free_energy: The free energy value (lower is better)
- prediction_error: Prediction error component (precision-weighted)
- complexity: Complexity/KL divergence component
- precision: Precision used for weighting
- feeling: Qualitative feeling based on normalized free energy
Posterior over a Gaussian belief: mean and precision (inverse variance).
Bayesian predictive coding (BPC) tracks the full posterior over hidden states instead of just MAP estimates. Closed-form Hebbian updates (Vasilescu & Friston 2025, arXiv:2503.24016) preserve the locality of predictive coding (PC) while quantifying epistemic uncertainty.
pub type GaussianBelief {
GaussianBelief(mean: vector.Vec3, precision: Float)
}
Constructors
- GaussianBelief(mean: vector.Vec3, precision: Float)
A hierarchical predictive coding network: a stack of layers from sensory (head) to abstract (tail). Used for active inference planning at multiple scales (S-HAI 2026).
pub type Hierarchical {
Hierarchical(layers: List(HierarchicalLayer))
}
Constructors
- Hierarchical(layers: List(HierarchicalLayer))
A single layer of a hierarchical predictive-coding network.
Stores the layer’s state estimate mu, the precision (inverse variance)
of its prediction errors, and the precision of the prior over the layer
state. Higher layers send top-down predictions; bottom-up prediction
errors travel upward. See Friston (2010), Bogacz (2017), and the 2026
Meta-PCN framework for the modern formulation.
pub type HierarchicalLayer {
HierarchicalLayer(
mu: vector.Vec3,
precision: Float,
prior_precision: Float,
)
}
Constructors
- HierarchicalLayer(mu: vector.Vec3, precision: Float, prior_precision: Float)
Arguments
- mu: Posterior mean at this layer (the latent state estimate).
- precision: Precision of prediction errors flowing up from this layer.
- prior_precision: Precision of the prior over this layer’s state.
Mean-field Gaussian variational posterior q(z) ~ N(q_mean, q_var).
pub type MeanFieldParams {
MeanFieldParams(q_mean: Float, q_var: Float)
}
Constructors
- MeanFieldParams(q_mean: Float, q_var: Float)
Values
pub fn active_inference_delta(
current: vector.Vec3,
target: vector.Vec3,
rate: Float,
) -> vector.Vec3
Active Inference: compute action that minimizes expected free energy.
This returns the delta to apply to current state to move toward target. Rate controls how quickly to move (0 = no movement, 1 = instant).
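The description above suggests a proportional step toward the target. A minimal sketch, assuming the rule is delta = rate · (target − current); the local `Vec3` type and the names `delta_toward` and `step` are hypothetical stand-ins, not the library's `vector.Vec3` or its implementation:

```gleam
/// Hypothetical stand-in for `vector.Vec3`; the real type's fields may differ.
pub type Vec3 {
  Vec3(x: Float, y: Float, z: Float)
}

/// Assumed rule: delta = rate * (target - current), applied per component.
pub fn delta_toward(current: Vec3, target: Vec3, rate: Float) -> Vec3 {
  Vec3(
    x: rate *. { target.x -. current.x },
    y: rate *. { target.y -. current.y },
    z: rate *. { target.z -. current.z },
  )
}

/// Apply the delta to the current state to obtain the next state.
pub fn step(current: Vec3, target: Vec3, rate: Float) -> Vec3 {
  let d = delta_toward(current, target, rate)
  Vec3(x: current.x +. d.x, y: current.y +. d.y, z: current.z +. d.z)
}
```

Under this assumption, a rate of 0.0 yields a zero delta and a rate of 1.0 lands on the target (up to floating-point rounding), which matches the documented meaning of the two extremes.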
pub fn belief_update(
prior: Float,
observation: Float,
precision_prior: Float,
precision_likelihood: Float,
) -> Float
Bayesian belief update: combine prior with likelihood.
posterior ∝ likelihood × prior
Using precision-weighted combination:
new_belief = (Π_prior × prior + Π_likelihood × observation) / (Π_prior + Π_likelihood)
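The precision-weighted combination can be sketched directly from the formula. This is an illustrative re-implementation, not the library's code; the name `precision_weighted_mean` is hypothetical, and returning `Error(Nil)` when the total precision is not positive is an assumption (the documented function returns a plain Float):

```gleam
/// new_belief = (p_prior * prior + p_lik * observation) / (p_prior + p_lik)
pub fn precision_weighted_mean(
  prior: Float,
  observation: Float,
  precision_prior: Float,
  precision_likelihood: Float,
) -> Result(Float, Nil) {
  let total = precision_prior +. precision_likelihood
  case total >. 0.0 {
    // Assumption: reject a non-positive total precision instead of dividing.
    False -> Error(Nil)
    True ->
      Ok(
        { precision_prior *. prior +. precision_likelihood *. observation }
        /. total,
      )
  }
}
```

For example, a prior of 0.0 with precision 1.0 and an observation of 1.0 with precision 3.0 give (1·0 + 3·1) / 4 = 0.75: the belief moves three quarters of the way toward the more precise source.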
pub fn bpc_precision_update(
current_precision: Float,
error_squared: Float,
observation_count: Int,
) -> Float
Hebbian variance update: the BPC weight-rule equivalent of synaptic plasticity. Updates precision based on prediction-error magnitude.
new_precision = (count · prior_precision + 1) / (count · variance + |error|²)
Higher errors → lower precision; consistent observations → higher precision. Equivalent to a conjugate Normal-Gamma update.
pub fn bpc_update(
prior: GaussianBelief,
observation: vector.Vec3,
likelihood_precision: Float,
) -> GaussianBelief
Precision-weighted Bayesian update for a Gaussian belief from a single observation under Gaussian likelihood.
posterior_precision = prior_precision + likelihood_precision
posterior_mean = (prior_precision · prior_mean + likelihood_precision · observation) / posterior_precision
Returns the new belief. This is the closed-form variant central to BPC.
pub fn classify_feeling(free_energy: Float) -> Feeling
Legacy classify_feeling with fixed thresholds. Calibrated for PAD space (max distance ~3.46).
pub fn classify_feeling_normalized(
free_energy: Float,
thresholds: FeelingThresholds,
) -> Feeling
Classify feeling using normalized thresholds.
- Homeostatic: F < μ - σ (better than expected)
- Surprised: μ - σ ≤ F < μ (slightly worse)
- Alarmed: μ ≤ F < μ + σ (worse than average)
- Overwhelmed: F ≥ μ + σ (much worse)
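The four bands above map onto a single `case` expression. A minimal sketch that redeclares the two types locally so it compiles on its own; the function name `classify` is hypothetical:

```gleam
pub type Feeling {
  Homeostatic
  Surprised
  Alarmed
  Overwhelmed
}

pub type FeelingThresholds {
  FeelingThresholds(mean: Float, std_dev: Float)
}

/// Classify a free energy value into one of four bands around the mean.
pub fn classify(free_energy: Float, thresholds: FeelingThresholds) -> Feeling {
  let mean = thresholds.mean
  let low = mean -. thresholds.std_dev
  let high = mean +. thresholds.std_dev
  case free_energy {
    // Clauses are tried in order, so each guard only needs the upper bound.
    f if f <. low -> Homeostatic
    f if f <. mean -> Surprised
    f if f <. high -> Alarmed
    _ -> Overwhelmed
  }
}
```

With thresholds of mean 1.0 and std_dev 0.5, a free energy of 0.2 falls below μ − σ = 0.5 and is Homeostatic, while 1.5 meets F ≥ μ + σ and is Overwhelmed.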
pub fn complexity(
current: vector.Vec3,
baseline: vector.Vec3,
prior_variance: Float,
) -> Float
Compute complexity term using KL divergence.
Complexity = D_KL(q(θ) || p(θ))
Where q is the posterior belief and p is the prior belief (homeostatic setpoint). prior_variance controls the regularization strength: a smaller prior variance penalizes deviation from the baseline more strongly.
pub fn complexity_weighted(
current: vector.Vec3,
baseline: vector.Vec3,
weight: Float,
) -> Float
Legacy complexity function for backwards compatibility.
pub fn compute_state(
expected: vector.Vec3,
actual: vector.Vec3,
baseline: vector.Vec3,
precision: Float,
prior_variance: Float,
thresholds: FeelingThresholds,
) -> FreeEnergyState
Compute free energy and return full state with feeling. Uses normalized thresholds for feeling classification.
pub fn compute_state_simple(
expected: vector.Vec3,
actual: vector.Vec3,
baseline: vector.Vec3,
complexity_weight: Float,
) -> FreeEnergyState
Simplified compute_state with default thresholds and legacy interface. For backwards compatibility.
pub fn default_thresholds() -> FeelingThresholds
Default thresholds calibrated for PAD space. Mean and std_dev derived from typical emotional dynamics.
pub fn elbo(
observation: Float,
q_mean: Float,
q_var: Float,
prior_mean: Float,
prior_var: Float,
likelihood_var: Float,
) -> ELBO
Closed-form ELBO for a Gaussian latent model:
- Prior: p(z) = N(prior_mean, prior_var)
- Likelihood: p(x|z) = N(z, likelihood_var)
- Posterior approx: q(z) = N(q_mean, q_var)
Reconstruction term (expected log-likelihood under q):
E_q[log p(x|z)] = −½·log(2π·likelihood_var) − ((x − q_mean)² + q_var) / (2·likelihood_var)
The KL term is the 1D Gaussian KL, scalar_gaussian_kl(q_mean, q_var, prior_mean, prior_var).
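The closed-form terms can be sketched as follows. This is an illustrative re-implementation under stated assumptions, not the library's code: the names `Elbo`, `scalar_kl`, and `elbo_sketch` are hypothetical, the sketch returns `Error(Nil)` for non-positive variances (the documented function returns an `ELBO` directly), and it assumes a gleam_stdlib version that provides `float.logarithm`:

```gleam
import gleam/float
import gleam/result

const two_pi = 6.283185307179586

pub type Elbo {
  Elbo(reconstruction: Float, kl_divergence: Float, total: Float)
}

/// D_KL(N(m1, v1) || N(m2, v2)) for scalar Gaussians.
fn scalar_kl(m1: Float, v1: Float, m2: Float, v2: Float) -> Result(Float, Nil) {
  case v1 >. 0.0 && v2 >. 0.0 {
    False -> Error(Nil)
    True -> {
      use log_ratio <- result.try(float.logarithm(v2 /. v1))
      let diff = m1 -. m2
      Ok(0.5 *. { log_ratio +. { v1 +. diff *. diff } /. v2 -. 1.0 })
    }
  }
}

/// ELBO = E_q[log p(x|z)] - D_KL(q || p) for the Gaussian latent model.
pub fn elbo_sketch(
  x: Float,
  q_mean: Float,
  q_var: Float,
  prior_mean: Float,
  prior_var: Float,
  likelihood_var: Float,
) -> Result(Elbo, Nil) {
  // float.logarithm fails for non-positive input, which rejects a bad variance.
  use log_norm <- result.try(float.logarithm(two_pi *. likelihood_var))
  use kl <- result.try(scalar_kl(q_mean, q_var, prior_mean, prior_var))
  let diff = x -. q_mean
  let reconstruction =
    { 0.0 -. 0.5 *. log_norm }
    -. { diff *. diff +. q_var }
    /. { 2.0 *. likelihood_var }
  Ok(Elbo(
    reconstruction: reconstruction,
    kl_divergence: kl,
    total: reconstruction -. kl,
  ))
}
```

The `q_var` in the reconstruction term is what distinguishes the expectation under q from a plain log-likelihood evaluated at `q_mean`: a wider posterior is penalised even when its mean fits the observation exactly.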
pub fn estimate_precision(errors: List(Float)) -> Float
Estimate precision from recent prediction errors.
Precision = 1 / variance of errors
Higher precision means more reliable predictions.
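A minimal sketch of the inverse-variance estimate. The choice of the population variance (dividing by n rather than n − 1) and the `Error(Nil)` result for fewer than two samples or zero variance are assumptions; the documented function returns a plain Float and may handle these cases differently. The name `precision_from_errors` is hypothetical:

```gleam
import gleam/float
import gleam/int
import gleam/list

/// Precision = 1 / variance of the recent prediction errors.
pub fn precision_from_errors(errors: List(Float)) -> Result(Float, Nil) {
  let count = list.length(errors)
  case count < 2 {
    // Assumption: a variance estimate needs at least two samples.
    True -> Error(Nil)
    False -> {
      let n = int.to_float(count)
      let mean = float.sum(errors) /. n
      let sum_sq =
        list.fold(errors, 0.0, fn(acc, e) {
          let d = e -. mean
          acc +. d *. d
        })
      let variance = sum_sq /. n
      case variance >. 0.0 {
        True -> Ok(1.0 /. variance)
        // Identical errors have zero variance, so precision is unbounded.
        False -> Error(Nil)
      }
    }
  }
}
```

For the errors [1.0, -1.0] the mean is 0.0 and the variance is 1.0, so the estimated precision is 1.0.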
pub fn expected_free_energy(
predicted_outcome: vector.Vec3,
preferred_outcome: vector.Vec3,
predictive_uncertainty: Float,
) -> ExpectedFreeEnergy
Decompose Expected Free Energy.
- predicted_outcome: agent’s expectation of the future state under action a.
- preferred_outcome: agent’s goal state (homeostatic setpoint).
- predictive_uncertainty: entropy of the predictive distribution (epistemic).
pub fn free_energy(
expected: vector.Vec3,
actual: vector.Vec3,
baseline: vector.Vec3,
precision: Float,
prior_variance: Float,
) -> Float
Compute full Free Energy: F = Π·(μ-o)² + D_KL(q||p)
Parameters
- expected: predicted/expected state (μ)
- actual: observed/actual state (o)
- baseline: prior baseline state (p) - e.g., personality/homeostatic setpoint
- precision: inverse variance of predictions (Π)
- prior_variance: variance of prior beliefs (for KL term)
pub fn gaussian_kl_divergence(
posterior_mean: vector.Vec3,
prior_mean: vector.Vec3,
variance: Float,
) -> Float
Compute KL divergence between Gaussian distributions (closed form).
Full KL for 1D Gaussians (corrected per DeepSeek R1 validation):
D_KL(N(μ₁,σ₁²) || N(μ₂,σ₂²)) = ln(σ₂/σ₁) + (σ₁² + (μ₁ - μ₂)²)/(2σ₂²) - 1/2
When variances are equal (σ₁ = σ₂), reduces to: (μ₁ - μ₂)²/(2σ²)
This measures how much the posterior (current belief) diverges from prior.
pub fn gaussian_kl_divergence_full(
posterior_mean: vector.Vec3,
prior_mean: vector.Vec3,
posterior_variance: Float,
prior_variance: Float,
) -> Float
Full KL divergence between multivariate isotropic Gaussians with different (scalar) variances in d=3 dimensions (Vec3).
D_KL(N(μ₁, σ₁² I_d) || N(μ₂, σ₂² I_d)) = (d/2) · ln(σ₂² / σ₁²) + (d·σ₁² + |μ₁-μ₂|²) / (2σ₂²) - d/2
For equal variances the log term vanishes and the d/2 terms cancel, so for any d this reduces to |μ₁-μ₂|² / (2σ²), matching gaussian_kl_divergence/3.
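The isotropic formula with d = 3 can be sketched as below. The local `Vec3` type and the name `isotropic_kl` are hypothetical, and returning `Error(Nil)` for non-positive variances is an assumption (the documented function returns a Float); the sketch also assumes a gleam_stdlib version that provides `float.logarithm`:

```gleam
import gleam/float
import gleam/result

/// Hypothetical stand-in for `vector.Vec3`.
pub type Vec3 {
  Vec3(x: Float, y: Float, z: Float)
}

/// D_KL(N(mu1, var1 * I) || N(mu2, var2 * I)) in d = 3 dimensions.
pub fn isotropic_kl(
  mu1: Vec3,
  mu2: Vec3,
  var1: Float,
  var2: Float,
) -> Result(Float, Nil) {
  case var1 >. 0.0 && var2 >. 0.0 {
    False -> Error(Nil)
    True -> {
      use log_ratio <- result.try(float.logarithm(var2 /. var1))
      let dx = mu1.x -. mu2.x
      let dy = mu1.y -. mu2.y
      let dz = mu1.z -. mu2.z
      let dist_sq = dx *. dx +. dy *. dy +. dz *. dz
      let d = 3.0
      Ok(
        0.5
        *. d
        *. log_ratio
        +. { d *. var1 +. dist_sq }
        /. { 2.0 *. var2 }
        -. 0.5
        *. d,
      )
    }
  }
}
```

When `var1` equals `var2` the log term is zero and the two d/2 contributions cancel, leaving dist_sq / (2 · var), the equal-variance form.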
pub fn generalized_free_energy(
expected_state: vector.Vec3,
preferred_state: vector.Vec3,
uncertainty: Float,
) -> Float
Generalized Free Energy (expected free energy for planning).
G = ambiguity + risk
- ambiguity: expected surprise under model (epistemic value)
- risk: KL divergence from preferred outcomes (pragmatic value)
Used for action selection in active inference.
pub fn hierarchical_errors(h: Hierarchical) -> List(vector.Vec3)
Per-layer prediction error: e_l = mu_l - g(mu_{l+1}).
In the simplest linear PC model, g is the identity. For richer models
pass a custom decoder via hierarchical_errors_with.
pub fn hierarchical_errors_with(
h: Hierarchical,
decoder: fn(vector.Vec3) -> vector.Vec3,
) -> List(vector.Vec3)
Hierarchical prediction errors with custom top-down decoder.
pub fn hierarchical_free_energy(h: Hierarchical) -> Float
Hierarchical free energy summed across layers.
F_total = Σ_l Π_l · |e_l|²
where e_l is the prediction error between layer l and the top-down prediction from layer l+1. This is the variant Meta-PCN (ICLR 2026) regularises with weight-variance normalisation to avoid exploding errors in deep networks.
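The sum over layers can be sketched with adjacent pairs of a list. To stay short, this sketch uses scalar layer states instead of `vector.Vec3`, assumes the identity decoder, and only sums over layers that have a layer above them; the `Layer` type and function names are hypothetical, and the library may treat the top layer differently:

```gleam
import gleam/list

/// Hypothetical scalar layer; the real HierarchicalLayer stores a Vec3.
pub type Layer {
  Layer(mu: Float, precision: Float)
}

/// e_l = mu_l - mu_{l+1} for every adjacent pair, ordered sensory to abstract.
pub fn errors(layers: List(Layer)) -> List(Float) {
  layers
  |> list.window_by_2
  |> list.map(fn(pair) {
    let #(lower, upper) = pair
    lower.mu -. upper.mu
  })
}

/// F_total = sum over l of precision_l * e_l^2.
pub fn free_energy(layers: List(Layer)) -> Float {
  layers
  |> list.window_by_2
  |> list.fold(0.0, fn(acc, pair) {
    let #(lower, upper) = pair
    let e = lower.mu -. upper.mu
    acc +. lower.precision *. e *. e
  })
}
```

For layers with states 1.0, 0.5, 0.0 and precisions 2.0, 1.0, 1.0, both errors are 0.5, so the total is 2.0 · 0.25 + 1.0 · 0.25 = 0.75.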
pub fn hierarchical_infer(
h: Hierarchical,
lr: Float,
n: Int,
) -> Hierarchical
Run n inference steps. Convenience wrapper around
hierarchical_inference_step.
pub fn hierarchical_inference_step(
h: Hierarchical,
lr: Float,
) -> Hierarchical
One inference step of gradient descent on the hierarchical free energy.
For each non-top layer l, updates the latent state μ_l along the descent
direction -∂F/∂μ_l, where:
∂F/∂μ_l = Π_l · (μ_l - μ_{l+1}) + Π_{l-1} · (μ_l - μ_{l-1})
- Π_l · (μ_l - μ_{l+1}): top-down prior fit
- Π_{l-1} · (μ_l - μ_{l-1}): bottom-up evidence fit
lr is the learning rate (step size); typical values 0.01–0.1 for stable
inference. The bottom layer’s μ is left untouched — it represents the
sensory observation and is fixed during inference.
pub fn laplace_approximation(
log_posterior: fn(Float) -> Float,
initial_guess: Float,
step_size: Float,
n_steps: Int,
) -> Result(MeanFieldParams, Nil)
Laplace approximation: fit a Gaussian to the mode of a smooth
log-posterior. The mean is the MAP estimate (found by gradient ascent),
the variance is −1 / f''(mode) via central finite differences.
log_posterior should return log p(z | x) up to an additive constant
(the normaliser cancels in the gradient).
- step_size: gradient step for finding the mode.
- n_steps: number of gradient iterations.
Returns Error(Nil) if the second derivative at the mode is non-negative (no valid Gaussian fit).
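The procedure described above (gradient ascent to the mode, then a curvature estimate) can be sketched as follows. The finite-difference width `fd_step`, the use of a central difference for the gradient as well, and all function names are assumptions made for illustration; the library's numerical details may differ:

```gleam
pub type MeanFieldParams {
  MeanFieldParams(q_mean: Float, q_var: Float)
}

/// Assumed finite-difference width; not taken from the library.
const fd_step = 0.0001

/// Central-difference estimate of f'(z).
fn gradient(f: fn(Float) -> Float, z: Float) -> Float {
  { f(z +. fd_step) -. f(z -. fd_step) } /. { 2.0 *. fd_step }
}

/// Central-difference estimate of f''(z).
fn second_derivative(f: fn(Float) -> Float, z: Float) -> Float {
  { f(z +. fd_step) -. 2.0 *. f(z) +. f(z -. fd_step) }
  /. { fd_step *. fd_step }
}

/// Fixed-step gradient ascent toward the mode of f.
fn ascend(f: fn(Float) -> Float, z: Float, step_size: Float, n: Int) -> Float {
  case n <= 0 {
    True -> z
    False -> ascend(f, z +. step_size *. gradient(f, z), step_size, n - 1)
  }
}

pub fn laplace_sketch(
  log_posterior: fn(Float) -> Float,
  initial_guess: Float,
  step_size: Float,
  n_steps: Int,
) -> Result(MeanFieldParams, Nil) {
  let mode = ascend(log_posterior, initial_guess, step_size, n_steps)
  let curvature = second_derivative(log_posterior, mode)
  case curvature <. 0.0 {
    // variance = -1 / f''(mode); only valid at a strict local maximum.
    True -> Ok(MeanFieldParams(q_mean: mode, q_var: -1.0 /. curvature))
    False -> Error(Nil)
  }
}
```

For a log-posterior that is already Gaussian, such as `fn(z) { -0.5 *. { z -. 2.0 } *. { z -. 2.0 } }`, the fit should recover a mean near 2.0 and a variance near 1.0, limited by the step size, iteration count, and finite-difference error.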
pub fn log_evidence_gaussian(
observation: Float,
prior_mean: Float,
prior_var: Float,
likelihood_var: Float,
) -> Float
Log marginal likelihood (model evidence) for the Gaussian-Gaussian model.
Marginal: p(x) = ∫ p(x|z) p(z) dz = N(x; prior_mean, prior_var + likelihood_var).
Returns log p(x) directly (closed form). Useful as the gold-standard
reference value that ELBO bounds from below.
pub fn mean_field_iterate(
batches: List(List(Float)),
prior: MeanFieldParams,
likelihood_var: Float,
) -> Result(MeanFieldParams, Nil)
Iterated mean-field — for non-conjugate models the update would be re-applied with refreshed likelihood statistics. Here, since the conjugate case is one-shot, this iterates over a sequence of observation batches, using each posterior as the next prior (sequential Bayes).
Returns the final MeanFieldParams after consuming all batches.
pub fn mean_field_update(
observations: List(Float),
prior_mean: Float,
prior_var: Float,
likelihood_var: Float,
) -> Result(MeanFieldParams, Nil)
Mean-field update under a Gaussian prior and Gaussian likelihood with known variances. Closed form — no iteration needed (the model is conjugate). Bishop §2.3.3.
posterior_precision = prior_precision + n · likelihood_precision
posterior_mean = posterior_var · (prior_precision · prior_mean
+ likelihood_precision · sum_x)
Returns Error(Nil) if either variance is non-positive.
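The conjugate update, and the sequential use of it that mean_field_iterate describes, can be sketched as follows. This is an illustrative re-implementation; the names `conjugate_update` and `sequential_update` are hypothetical, and the type is redeclared locally so the block compiles on its own:

```gleam
import gleam/float
import gleam/int
import gleam/list

pub type MeanFieldParams {
  MeanFieldParams(q_mean: Float, q_var: Float)
}

/// One-shot Gaussian-Gaussian posterior with known variances.
pub fn conjugate_update(
  observations: List(Float),
  prior_mean: Float,
  prior_var: Float,
  likelihood_var: Float,
) -> Result(MeanFieldParams, Nil) {
  case prior_var >. 0.0 && likelihood_var >. 0.0 {
    False -> Error(Nil)
    True -> {
      let n = int.to_float(list.length(observations))
      let sum_x = float.sum(observations)
      let prior_precision = 1.0 /. prior_var
      let likelihood_precision = 1.0 /. likelihood_var
      // posterior_var is the inverse of the posterior precision.
      let posterior_var =
        1.0 /. { prior_precision +. n *. likelihood_precision }
      let posterior_mean =
        posterior_var
        *. { prior_precision *. prior_mean +. likelihood_precision *. sum_x }
      Ok(MeanFieldParams(q_mean: posterior_mean, q_var: posterior_var))
    }
  }
}

/// Sequential Bayes: each batch's posterior becomes the next batch's prior.
pub fn sequential_update(
  batches: List(List(Float)),
  prior: MeanFieldParams,
  likelihood_var: Float,
) -> Result(MeanFieldParams, Nil) {
  list.try_fold(batches, prior, fn(acc: MeanFieldParams, batch) {
    conjugate_update(batch, acc.q_mean, acc.q_var, likelihood_var)
  })
}
```

With a prior of N(0, 1), a likelihood variance of 1.0, and the single observation 2.0, the posterior precision is 2, so the posterior variance is 0.5 and the posterior mean is 0.5 · (0 + 2) = 1.0. Because the model is conjugate, splitting the same observations across batches gives the same final posterior, up to floating-point rounding.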
pub fn meta_prediction_errors(
h: Hierarchical,
) -> List(vector.Vec3)
Meta-prediction error: prediction error of the prediction error.
Meta-PCN (Lin et al. ICLR 2026) shows that minimising “PEs of PEs” linearises the otherwise non-linear PCN equilibrium dynamics, yielding dramatically more stable inference at depth.
meta_e_l = e_l - h(e_{l+1})
where h is the identity in the simplest case.
pub fn policy_posterior(
policies: List(#(a, vector.Vec3, Float)),
preferred_outcome: vector.Vec3,
beta: Float,
) -> List(#(a, Float))
Softmax over policies: probability of selecting each action given its Expected Free Energy. Lower G → higher probability (β controls sharpness).
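A minimal sketch of the softmax step, assuming each policy has already been scored with its Expected Free Energy G, so the input here is a list of (label, G) pairs rather than the documented (label, predicted_outcome, predictive_uncertainty) triples. The name `softmax_policies` is hypothetical, and the sketch assumes a gleam_stdlib version that provides `float.exponential`:

```gleam
import gleam/float
import gleam/list

/// P(policy) is proportional to exp(-beta * G); lower G gives higher probability.
pub fn softmax_policies(
  scored: List(#(a, Float)),
  beta: Float,
) -> List(#(a, Float)) {
  case scored {
    [] -> []
    [#(_, first_g), ..rest] -> {
      // Shift by the minimum G so the largest exponent is exp(0) = 1.
      let min_g = list.fold(rest, first_g, fn(acc, p) { float.min(acc, p.1) })
      let weights =
        list.map(scored, fn(p) {
          #(p.0, float.exponential(beta *. { min_g -. p.1 }))
        })
      let total = list.fold(weights, 0.0, fn(acc, w) { acc +. w.1 })
      list.map(weights, fn(w) { #(w.0, w.1 /. total) })
    }
  }
}
```

Subtracting the minimum G leaves the probabilities unchanged, because the common factor cancels in the normalisation, and it keeps the exponentials from overflowing for large β. Two policies with equal G each receive probability 0.5 regardless of β.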
pub fn precision_weighted_error_vec(
expected: vector.Vec3,
actual: vector.Vec3,
precisions: vector.Vec3,
) -> Float
Precision-weighted prediction error for Vec3.
Each dimension can have different precision. Returns weighted sum of squared errors.
pub fn precision_weighted_prediction_error(
expected: vector.Vec3,
actual: vector.Vec3,
precision: Float,
) -> Float
Compute precision-weighted prediction error.
F_accuracy = Π · (expected - actual)²
Precision (Π) = 1/variance. Higher precision = more weight on prediction errors. This is critical for biological systems where uncertainty should attenuate errors.
pub fn prediction_error(
expected: vector.Vec3,
actual: vector.Vec3,
) -> Float
Compute raw prediction error between expected and actual state. Uses squared Euclidean distance (L2 loss).
pub fn scalar_gaussian_kl(
q_mean: Float,
q_var: Float,
p_mean: Float,
p_var: Float,
) -> Float
1D Gaussian KL — D_KL(N(μ₁, σ₁²) ‖ N(μ₂, σ₂²)).
= ½ · ( ln(σ₂² / σ₁²) + (σ₁² + (μ₁−μ₂)²) / σ₂² − 1 ).
Returns 0.0 if either variance is non-positive. Public so callers can
reuse it (the existing gaussian_kl_divergence_* functions are Vec3-only).
pub fn select_policy(
policies: List(#(a, vector.Vec3, Float)),
preferred_outcome: vector.Vec3,
) -> Result(#(a, ExpectedFreeEnergy), Nil)
Select the action with minimum Expected Free Energy.
policies is a list of (action_label, predicted_outcome, predictive_uncertainty).
Returns the best policy or Error(Nil) if the list is empty.
pub fn surprise(
expected: Float,
observed: Float,
sigma: Float,
) -> Float
Compute surprise for a single dimension.
Surprise = -log(p(observation | model))
Using Gaussian approximation: surprise ∝ (x - μ)² / (2σ²)
pub fn update_thresholds(
current: FeelingThresholds,
observed_fe: Float,
alpha: Float,
) -> FeelingThresholds
Update thresholds based on observed free energy history. Uses exponential moving average for online learning.
pub fn variational_bound(
observation_likelihood: Float,
kl_divergence: Float,
) -> Float
Variational Free Energy bound.
F = -log p(o) + D_KL(q(z) || p(z | o)) ≥ -log p(o)
Because the KL divergence is non-negative, the free energy is an upper bound on the negative log evidence (surprise).