Module em_filter_vec

em_filter_vec — Capability list → semantic f32 vector.

Authors: Steve Roques.

Description

Converts an agent's capability list into a unit-norm vector of f32 values, packed into an Erlang binary and suitable for cosine-similarity routing in em_pop.

Algorithm

1. Each capability string is normalised to a binary (atoms and char-lists are accepted for convenience).
2. The binary is hashed via erlang:phash2/2 to a slot index in [0, Dim-1]. phash2 is deterministic across BEAM restarts for the same Erlang major version, so the same capability always maps to the same slot.
3. The slot's weight is incremented by 1.0. The update is additive: two capabilities that collide in the same slot reinforce each other rather than cancelling.
4. The resulting weight vector is L2-normalised to unit length, so that cosine similarity is always well-defined. In general it lies in [-1, 1]; between two vectors produced by this module it lies in [0, 1], because all weights are non-negative.
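
The four steps above can be sketched as follows. This is a minimal illustration written from the description on this page, not the source of em_filter_vec; the module name filter_vec_sketch and the helper to_bin/1 are hypothetical, and the empty-list clause follows the "Empty capability list" section.

```erlang
-module(filter_vec_sketch).
-export([from_capabilities/2]).

%% Hypothetical sketch of the documented algorithm, not the real module.

%% Empty list: uniform unit vector, every slot = 1/sqrt(Dim).
from_capabilities([], Dim) when is_integer(Dim), Dim > 0 ->
    U = 1.0 / math:sqrt(Dim),
    binary:copy(<<U:32/float-little>>, Dim);
from_capabilities(Caps, Dim) when is_list(Caps), is_integer(Dim), Dim > 0 ->
    %% Steps 1-3: normalise to a binary, hash to a slot, add 1.0 to it.
    Weights = lists:foldl(
        fun(Cap, Acc) ->
            Slot = erlang:phash2(to_bin(Cap), Dim),
            maps:update_with(Slot, fun(W) -> W + 1.0 end, 1.0, Acc)
        end, #{}, Caps),
    %% Step 4: L2-normalise, then pack as little-endian f32.
    Norm = math:sqrt(maps:fold(fun(_, W, S) -> S + W * W end, 0.0, Weights)),
    << <<(maps:get(I, Weights, 0.0) / Norm):32/float-little>>
       || I <- lists:seq(0, Dim - 1) >>.

%% Step 1 helper: accept binaries, atoms and char-lists.
to_bin(B) when is_binary(B) -> B;
to_bin(A) when is_atom(A) -> atom_to_binary(A, utf8);
to_bin(L) when is_list(L) -> unicode:characters_to_binary(L).
```

Because step 1 normalises the input before hashing, this sketch maps rss, "rss" and <<"rss">> to the same slot. Whether its output matches em_filter_vec byte for byte depends on implementation details not stated here.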

Empty capability list

A uniform unit vector is returned: every slot holds 1/sqrt(Dim), so the L2 norm is 1. This avoids a zero vector, for which cosine similarity is undefined.

Dimension

The default is 64 floats (256 bytes). This gives a good trade-off between resolution (any given pair of capabilities collides with probability ≈ 1/64) and memory cost. Pass an explicit Dim to from_capabilities/2 when finer resolution is needed (e.g. 128 or 256).
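
To judge whether a larger Dim is worthwhile, it can help to estimate how likely it is that at least two capabilities in one list share a slot. The sketch below assumes the hash spreads capabilities uniformly and independently over the slots, which is an idealisation; the module and function names are hypothetical.

```erlang
-module(collision_sketch).
-export([p_any_collision/2]).

%% Probability that at least two of N capabilities land in the same
%% slot out of Dim, assuming uniform and independent slot choice
%% (the usual birthday-problem estimate).
p_any_collision(N, Dim) when is_integer(N), N >= 0,
                             is_integer(Dim), Dim > 0 ->
    %% Probability that all N capabilities get distinct slots.
    NoCollision = lists:foldl(
        fun(K, P) -> P * (1 - K / Dim) end,
        1.0,
        lists:seq(0, N - 1)),
    1.0 - NoCollision.
```

For two capabilities and Dim = 64 this reduces to the 1/64 per-pair figure. Under the same assumption, a list of five capabilities has roughly a 0.15 chance of containing a collision at Dim = 64 and roughly 0.04 at Dim = 256.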

Function Index

from_capabilities/1    Derive a 64-dimension unit-norm f32 vector from a capability list.
from_capabilities/2    Derive a Dim-dimension unit-norm f32 vector from a capability list.

Function Details

from_capabilities/1

from_capabilities(Caps::[binary() | atom() | string()]) -> binary()

Derive a 64-dimension unit-norm f32 vector from a capability list.

Equivalent to from_capabilities(Caps, 64).

from_capabilities/2

from_capabilities(Caps::[binary() | atom() | string()], Dim::pos_integer()) -> binary()

Derive a Dim-dimension unit-norm f32 vector from a capability list.

The output is a flat binary of Dim IEEE-754 single-precision floats in little-endian byte order, as expected by kvex.

Example:
    Vec = em_filter_vec:from_capabilities([<<"rss">>, <<"search">>], 64).
    %% Vec is a 256-byte binary, unit norm, usable directly with kvex.
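
For inspection or testing outside kvex, the flat binary can be decoded back into a list of floats and compared by cosine similarity. The following is a minimal sketch that relies only on the layout described above (little-endian IEEE-754 single-precision floats); the module vec_decode_sketch and its functions are hypothetical and are not part of em_filter_vec or kvex.

```erlang
-module(vec_decode_sketch).
-export([decode/1, cosine/2]).

%% Unpack a flat binary of little-endian f32 values into a list of floats.
decode(Bin) when is_binary(Bin), byte_size(Bin) rem 4 =:= 0 ->
    [F || <<F:32/float-little>> <= Bin].

%% Cosine similarity of two packed vectors of equal dimension.
%% Raises badarith if either vector is all zeros.
cosine(A, B) when byte_size(A) =:= byte_size(B) ->
    Xs = decode(A),
    Ys = decode(B),
    Dot = lists:sum([X * Y || {X, Y} <- lists:zip(Xs, Ys)]),
    Dot / (norm(Xs) * norm(Ys)).

%% L2 norm of a list of floats.
norm(Xs) ->
    math:sqrt(lists:sum([X * X || X <- Xs])).
```

For example, vec_decode_sketch:cosine(Vec, Vec) should come out very close to 1.0 for any vector returned by from_capabilities/2, up to single-precision rounding.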


Generated by EDoc