Authors: Steve Roques.
em_filter — Public API and HTML Utilities
All nodes in the Emergence system are agents. The Queen connects to em_disco the same way any other agent does.
Every handler module must export:
handle(Body :: binary(), Memory :: map()) -> {Result :: term(), NewMemory :: map()}
Memory is always a live map. Returning the same map as NewMemory is valid for stateless behaviour — no special config needed.
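The handler contract above can be illustrated with a minimal sketch. The module name counter_handler, the count key, and the shape of the result tuple are hypothetical choices made for illustration; only the handle/2 signature comes from the contract itself.

```erlang
-module(counter_handler).
-export([handle/2]).

%% Hypothetical handler: upper-cases the request body and keeps a
%% running call count in Memory. Only the handle/2 shape is taken
%% from the documented contract; everything else is illustrative.
-spec handle(binary(), map()) -> {term(), map()}.
handle(Body, Memory) when is_binary(Body), is_map(Memory) ->
    %% Memory starts as #{} on a fresh worker, so default the counter to 0.
    Count = maps:get(count, Memory, 0) + 1,
    Result = {ok, string:uppercase(Body), Count},
    %% Return the updated map; returning Memory unchanged would be
    %% the stateless variant described above.
    {Result, Memory#{count => Count}}.
```

A stateless handler would simply return `{Result, Memory}` without touching the map.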
capabilities => [binary()]
Announced to em_disco via agent_hello. Defaults to [].
memory => ram | ets
ram (default): memory lives in the gen_server state and resets to #{} if the worker is restarted.
ets: memory is persisted in an ETS table and survives worker restarts within the same BEAM session.
| base_capabilities/0 | Returns the root capabilities shared by all em_filter agents. |
| clean_text/3 | Strips noise strings and decodes HTML entities from text. |
| decode_hex_entities/1 | |
| decode_html_entities/1 | Decodes &#N;, &#xHH;, and &name; HTML entities. |
| decode_named_entities/1 | |
| decode_numeric_entities/1 | |
| ensure_binary/1 | |
| extract_attribute/2 | Extracts the value of an attribute from an HTML element binary. |
| extract_elements/2 | Extracts HTML elements matching a CSS-style selector. |
| get_text/1 | Strips all HTML tags from a binary, returning plain text. |
| pop_gossip/1 | Trigger one synchronous gossip tick for an agent's PP node. |
| pop_node/1 | Return the em_pop node pid for an agent, or undefined. |
| pop_peers/1 | Return all Population Protocol peers known by an agent. |
| pop_peers_for/3 | Return the top-K PP peers ordered by cosine similarity. |
| pop_trust/2 | Return the PP trust score for a peer (0.0 = unknown, 1.0 = full). |
| pop_vector/1 | Return the capability vector used by an agent's PP node. |
| resolve_named_entity/1 | |
| safe_binary_replace/3 | |
| should_skip_link/2 | Returns true if the link matches any excluded pattern or does not start with http. |
| start_agent/3 | |
| stop_agent/1 | |
| strip_scripts/1 | Removes all <script>...</script> blocks from an HTML binary. |
base_capabilities() -> [binary()]
Returns the root capabilities shared by all em_filter agents.
clean_text(D::term(), I::term(), Dt::term()) -> binary()
Strips noise strings and decodes HTML entities from text.
Removes D, I, and Dt substrings, then calls decode_html_entities/1.
decode_hex_entities(Text::binary()) -> binary()
decode_html_entities(T::binary()) -> binary()
Decodes &#N;, &#xHH;, and &name; HTML entities.
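To make the three entity forms concrete, here is a minimal standalone sketch of such a decoder. It is not the em_filter implementation: the module name entity_sketch and the small table of named entities are assumptions made for illustration, and unknown or malformed entities are left untouched.

```erlang
-module(entity_sketch).
-export([decode/1]).

%% Illustrative decoder for &#N;, &#xHH; and a small, hypothetical
%% subset of &name; entities. Unknown entities are copied through as-is.
-spec decode(binary()) -> binary().
decode(Bin) when is_binary(Bin) ->
    decode(Bin, <<>>).

decode(<<>>, Acc) ->
    Acc;
decode(<<"&", Rest/binary>>, Acc) ->
    case binary:split(Rest, <<";">>) of
        [Name, Tail] ->
            case resolve(Name) of
                undefined -> decode(Rest, <<Acc/binary, "&">>);
                Cp -> decode(Tail, <<Acc/binary, Cp/utf8>>)
            end;
        [_NoSemicolon] ->
            %% No terminating ";" anywhere: keep the ampersand literally.
            decode(Rest, <<Acc/binary, "&">>)
    end;
decode(<<C, Rest/binary>>, Acc) ->
    decode(Rest, <<Acc/binary, C>>).

%% Map an entity body (without "&" and ";") to a Unicode code point.
resolve(<<"#x", Hex/binary>>) -> to_codepoint(Hex, 16);
resolve(<<"#X", Hex/binary>>) -> to_codepoint(Hex, 16);
resolve(<<"#", Dec/binary>>) -> to_codepoint(Dec, 10);
resolve(<<"amp">>) -> $&;
resolve(<<"lt">>) -> $<;
resolve(<<"gt">>) -> $>;
resolve(<<"quot">>) -> $\";
resolve(<<"nbsp">>) -> 160;
resolve(_) -> undefined.

%% Parse digits in the given base; reject values that are not valid
%% Unicode scalar values (out of range or UTF-16 surrogates).
to_codepoint(Digits, Base) ->
    try binary_to_integer(Digits, Base) of
        Cp when Cp >= 0, Cp =< 16#10FFFF,
                (Cp < 16#D800 orelse Cp > 16#DFFF) -> Cp;
        _ -> undefined
    catch
        error:badarg -> undefined
    end.
```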
decode_named_entities(Text::binary()) -> binary()
decode_numeric_entities(Text::binary()) -> binary()
ensure_binary(B::term()) -> binary()
extract_attribute(E::binary(), Attr::string()) -> {ok, binary()} | error
Extracts the value of an attribute from an HTML element binary.
Returns {ok, Value} or error if the attribute is absent.
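The return convention can be sketched with a regex-based lookup. This is an assumption about one possible approach, not the library's code: the module name attr_sketch is hypothetical, only quoted attribute values are handled, and Attr is assumed to be a plain attribute name with no regex metacharacters.

```erlang
-module(attr_sketch).
-export([extract_attribute/2]).

%% Illustrative only: finds Attr="value" or Attr='value' in an element
%% binary. Unquoted values are not handled, and Attr is inserted into
%% the pattern unescaped (assumed to be a simple name such as "href").
-spec extract_attribute(binary(), string()) -> {ok, binary()} | error.
extract_attribute(Element, Attr) when is_binary(Element), is_list(Attr) ->
    %% The lookbehind stops "href" from matching inside "data-href".
    %% Group 1 is the opening quote, group 2 the attribute value.
    Pattern = "(?<![\\w-])" ++ Attr ++ "\\s*=\\s*([\"'])(.*?)\\1",
    Opts = [caseless, dotall, {capture, [2], binary}],
    case re:run(Element, Pattern, Opts) of
        {match, [Value]} -> {ok, Value};
        nomatch -> error
    end.
```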
extract_elements(Html::binary(), Selector::string()) -> term()
Extracts HTML elements matching a CSS-style selector.
Supported selectors: li.b_algo, div a, div p, .algoSlug_icon, .news_dt, tag, .class, #id, tag.class, [attr=value].
get_text(E::binary()) -> binary()
Strips all HTML tags from a binary, returning plain text.
pop_gossip(AgentName::atom()) -> ok | {error, no_pop_node}
Trigger one synchronous gossip tick for an agent's PP node.
Returns {error, no_pop_node} when the agent has no em_pop node.
pop_node(AgentName::atom()) -> pid() | undefined
Return the em_pop node pid for an agent, or undefined.
Returns undefined when the agent was started without a pop_port key in its Config map.
pop_peers(AgentName::atom()) -> [map()]
Return all Population Protocol peers known by an agent.
Returns [] when the agent has no em_pop node.
pop_peers_for(AgentName::atom(), QueryVec::binary(), K::pos_integer()) -> [{map(), float()}]
Return the top-K PP peers ordered by cosine similarity.
QueryVec must be an f32 little-endian binary of the same dimension as the vectors used when the agent was started (default: 64 floats).
pop_trust(AgentName::atom(), PeerId::binary()) -> float()
Return the PP trust score for a peer (0.0 = unknown, 1.0 = full).
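The QueryVec format expected by pop_peers_for/3 (f32 little-endian, 64 floats by default, so 256 bytes) can be built with Erlang's bit syntax. The sketch below uses a hypothetical module name, query_vec_sketch, and includes a plain cosine similarity purely to show what the ranking measures; whether em_filter computes it exactly this way is an assumption.

```erlang
-module(query_vec_sketch).
-export([encode/1, decode/1, cosine/2]).

%% Pack a list of numbers into an f32 little-endian binary.
-spec encode([number()]) -> binary().
encode(Floats) ->
    << <<F:32/float-little>> || F <- Floats >>.

%% Unpack an f32 little-endian binary back into a list of floats.
-spec decode(binary()) -> [float()].
decode(Bin) ->
    [F || <<F:32/float-little>> <= Bin].

%% Cosine similarity of two equally sized vectors; 0.0 if either is zero.
%% Illustrative only, not necessarily the formula em_pop uses.
-spec cosine(binary(), binary()) -> float().
cosine(A, B) when byte_size(A) =:= byte_size(B) ->
    Xs = decode(A),
    Ys = decode(B),
    Dot = lists:sum([X * Y || {X, Y} <- lists:zip(Xs, Ys)]),
    Norm = math:sqrt(lists:sum([X * X || X <- Xs]))
         * math:sqrt(lists:sum([Y * Y || Y <- Ys])),
    case Norm == 0 of
        true -> 0.0;
        false -> Dot / Norm
    end.
```

A binary produced by `encode/1` from 64 values has the default dimension and could be passed as QueryVec; the binary returned by pop_vector/1 is already in that format.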
pop_vector(AgentName::atom()) -> binary() | undefined
Return the capability vector used by an agent's PP node.
Useful for building QueryVec arguments to pop_peers_for/3.
resolve_named_entity(X1::binary()) -> binary() | undefined
safe_binary_replace(S::binary(), P::binary(), R::binary()) -> binary()
should_skip_link(Link::binary(), Excluded::[string()]) -> boolean()
Returns true if the link matches any excluded pattern or does
not start with http.
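The rule above can be sketched as follows. The module name link_sketch is hypothetical, and "matches" is assumed here to mean plain substring containment; the real function may use a different matching rule.

```erlang
-module(link_sketch).
-export([should_skip_link/2]).

%% Illustrative only: skip a link when it does not start with "http"
%% or when it contains any of the excluded patterns as a substring
%% (substring matching is an assumption, not documented behaviour).
-spec should_skip_link(binary(), [string()]) -> boolean().
should_skip_link(Link, Excluded) when is_binary(Link), is_list(Excluded) ->
    (not is_http(Link)) orelse lists:any(
        fun(Pattern) -> contains(Link, Pattern) end,
        Excluded).

is_http(<<"http", _/binary>>) -> true;
is_http(_) -> false.

%% Empty patterns are ignored because binary:match/2 rejects them.
contains(_Link, "") ->
    false;
contains(Link, Pattern) ->
    binary:match(Link, unicode:characters_to_binary(Pattern)) =/= nomatch.
```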
start_agent(AgentName::atom(), HandlerModule::module(), Config::map()) -> {ok, pid()} | {error, term()}
stop_agent(AgentName::atom()) -> ok | {error, term()}
strip_scripts(Html::binary() | string()) -> {ok, binary()} | {error, cleaning_failed}
Removes all <script>...</script> blocks from an HTML binary.
Returns {ok, Cleaned}, or {error, cleaning_failed} if the regex operation raises an exception.
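The tagged return values suggest a regex replacement wrapped in a try/catch. The sketch below shows that pattern under stated assumptions: the module name strip_sketch and the exact regular expression are illustrative, not taken from em_filter.

```erlang
-module(strip_sketch).
-export([strip_scripts/1]).

%% Illustrative only: remove <script>...</script> blocks and convert
%% any exception into {error, cleaning_failed}.
-spec strip_scripts(binary() | string()) ->
          {ok, binary()} | {error, cleaning_failed}.
strip_scripts(Html) ->
    try
        %% Accept both binaries and Latin-1 strings; bad input raises badarg.
        Bin = iolist_to_binary(Html),
        %% Non-greedy match so each script block is removed separately;
        %% dotall lets "." span newlines inside the script body.
        Pattern = <<"<script\\b[^>]*>.*?</script\\s*>">>,
        Opts = [global, caseless, dotall, {return, binary}],
        {ok, re:replace(Bin, Pattern, <<>>, Opts)}
    catch
        _:_ -> {error, cleaning_failed}
    end.
```

Callers can then pattern-match on the result, for example `{ok, Clean} = strip_sketch:strip_scripts(Html)`, and treat `{error, cleaning_failed}` as a signal to fall back to the raw input.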