Module em_filter

em_filter — Public API and HTML Utilities.

Authors: Steve Roques.

Description

All nodes in the Emergence system are agents. The Queen connects to em_disco the same way any other agent does.

Handler contract

Every handler module must export:

handle(Body :: binary(), Memory :: map()) -> {Result :: term(), NewMemory :: map()}

Memory is always a live map. Returning the same map as NewMemory is valid for stateless behaviour — no special config needed.
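As a minimal sketch of the contract above, the following handler keeps a request counter in Memory. The module name counter_handler, the count key, and the "reset" message are illustrative assumptions, not part of em_filter. A stateless handler would simply return Memory unchanged.

```erlang
-module(counter_handler).
-export([handle/2]).

%% handle/2 as required by the handler contract.
%% Body is the request payload; Memory is the map carried between calls.
-spec handle(binary(), map()) -> {term(), map()}.
handle(<<"reset">>, Memory) ->
    %% Hypothetical control message: drop the counter from memory.
    {ok, maps:remove(count, Memory)};
handle(Body, Memory) when is_binary(Body) ->
    %% Memory may be empty on the first call (or after a restart with
    %% memory => ram), so read the counter with a default.
    Count = maps:get(count, Memory, 0) + 1,
    Result = {seen, Count, byte_size(Body)},
    {Result, Memory#{count => Count}}.
```

Because the counter is read with a default, the handler behaves the same whether Memory starts empty or is restored from a previous call.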

Config map keys (all optional)

capabilities => [binary()]
    Announced to em_disco via agent_hello. Defaults to [].

memory => ram | ets
    ram (default): memory lives in the gen_server state and resets to #{} if the worker is restarted.
    ets: memory is persisted in an ETS table and survives worker restarts within the same BEAM session.
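The sketch below shows one way a Config map with both optional keys might be built and passed to start_agent/3. The agent name my_filter, the handler module my_handler, and the capability values are hypothetical. The call in start/0 needs a running em_filter application.

```erlang
-module(agent_boot_sketch).
-export([config/0, start/0]).

%% Both keys are optional. Leaving them out gives capabilities = []
%% and memory = ram, as described above.
config() ->
    #{capabilities => [<<"search">>, <<"html">>],  % illustrative values
      memory       => ets}.                        % survive worker restarts

%% Starts a hypothetical agent; returns {ok, Pid} | {error, Reason}
%% according to the start_agent/3 spec.
start() ->
    em_filter:start_agent(my_filter, my_handler, config()).
```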

Function Index

base_capabilities/0: Returns the root capabilities shared by all em_filter agents.
clean_text/3: Strips noise strings and decodes HTML entities from text.
decode_hex_entities/1
decode_html_entities/1: Decodes &#N;, &#xHH;, and &name; HTML entities.
decode_named_entities/1
decode_numeric_entities/1
ensure_binary/1
extract_attribute/2: Extracts the value of an attribute from an HTML element binary.
extract_elements/2: Extracts HTML elements matching a CSS-style selector.
get_text/1: Strips all HTML tags from a binary, returning plain text.
pop_gossip/1: Trigger one synchronous gossip tick for an agent's PP node.
pop_node/1: Return the em_pop node pid for an agent, or undefined.
pop_peers/1: Return all Population Protocol peers known by an agent.
pop_peers_for/3: Return the top-K PP peers ordered by cosine similarity.
pop_trust/2: Return the PP trust score for a peer (0.0 = unknown, 1.0 = full).
pop_vector/1: Return the capability vector used by an agent's PP node.
resolve_named_entity/1
safe_binary_replace/3
should_skip_link/2: Returns true if the link matches any excluded pattern or does not start with http.
start_agent/3
stop_agent/1
strip_scripts/1: Removes all <script>...</script> blocks from an HTML binary.

Function Details

base_capabilities/0

base_capabilities() -> [binary()]

Returns the root capabilities shared by all em_filter agents.

clean_text/3

clean_text(D::term(), I::term(), Dt::term()) -> binary()

Strips noise strings and decodes HTML entities from text.

Removes the D, I, and Dt substrings, then calls decode_html_entities/1.

decode_hex_entities/1

decode_hex_entities(Text::binary()) -> binary()

decode_html_entities/1

decode_html_entities(T::binary()) -> binary()

Decodes &#N;, &#xHH;, and &name; HTML entities.

decode_named_entities/1

decode_named_entities(Text::binary()) -> binary()

decode_numeric_entities/1

decode_numeric_entities(Text::binary()) -> binary()

ensure_binary/1

ensure_binary(B::term()) -> binary()

extract_attribute/2

extract_attribute(E::binary(), Attr::string()) -> {ok, binary()} | error

Extracts the value of an attribute from an HTML element binary.

Returns {ok, Value} or error if the attribute is absent.
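Since the return value is {ok, Value} or the bare atom error (not an {error, Reason} tuple), callers may want a small helper that supplies a default. The following is a sketch only; the module attr_sketch and its functions are hypothetical helpers, not part of em_filter.

```erlang
-module(attr_sketch).
-export([href/1, value_or/2]).

%% Look up the href attribute of an element binary, falling back to
%% an empty binary when the attribute is absent.
href(Element) when is_binary(Element) ->
    value_or(em_filter:extract_attribute(Element, "href"), <<>>).

%% Unwrap the {ok, Value} | error result documented above.
value_or({ok, Value}, _Default) -> Value;
value_or(error, Default)        -> Default.
```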

extract_elements/2

extract_elements(Html::binary(), Selector::string()) -> term()

Extracts HTML elements matching a CSS-style selector.

Supported selectors: li.b_algo, div a, div p, .algoSlug_icon, .news_dt, tag, .class, #id, tag.class, [attr=value].

get_text/1

get_text(E::binary()) -> binary()

Strips all HTML tags from a binary, returning plain text.

pop_gossip/1

pop_gossip(AgentName::atom()) -> ok | {error, no_pop_node}

Trigger one synchronous gossip tick for an agent's PP node.

Returns {error, no_pop_node} when the agent has no em_pop node.

pop_node/1

pop_node(AgentName::atom()) -> pid() | undefined

Return the em_pop node pid for an agent, or undefined.

Returns undefined when the agent was started without a pop_port key in its Config map.

pop_peers/1

pop_peers(AgentName::atom()) -> [map()]

Return all Population Protocol peers known by an agent.

Returns [] when the agent has no em_pop node.

pop_peers_for/3

pop_peers_for(AgentName::atom(), QueryVec::binary(), K::pos_integer()) -> [{map(), float()}]

Return the top-K PP peers ordered by cosine similarity.

QueryVec must be an f32 little-endian binary of the same dimension as the vectors used when the agent was started (default: 64 floats).
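To make the expected QueryVec layout concrete, the sketch below packs a list of numbers into consecutive 32-bit little-endian floats and computes a plain cosine similarity between two such binaries. The module query_vec_sketch is hypothetical, and the cosine function illustrates the general formula only; it is not claimed to match how em_pop scores peers internally.

```erlang
-module(query_vec_sketch).
-export([encode/1, decode/1, cosine/2]).

%% Pack numbers as consecutive f32 little-endian values.
%% 64 numbers give a 256-byte binary (64 * 4 bytes).
encode(Numbers) when is_list(Numbers) ->
    << <<(float(N)):32/float-little>> || N <- Numbers >>.

%% Inverse of encode/1: read the binary back as a list of floats.
decode(Bin) when is_binary(Bin) ->
    [F || <<F:32/float-little>> <= Bin].

%% Cosine similarity: dot(A, B) / (|A| * |B|).
%% Returns 0.0 when either vector has zero length.
cosine(A, B) when is_binary(A), is_binary(B), byte_size(A) =:= byte_size(B) ->
    Xs = decode(A),
    Ys = decode(B),
    Dot = lists:sum([X * Y || {X, Y} <- lists:zip(Xs, Ys)]),
    Norm = math:sqrt(lists:sum([X * X || X <- Xs])) *
           math:sqrt(lists:sum([Y * Y || Y <- Ys])),
    if
        Norm > 0 -> Dot / Norm;
        true     -> 0.0
    end.
```

A binary produced by encode/1 from 64 numbers has the default dimension mentioned above and could be passed as QueryVec.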

pop_trust/2

pop_trust(AgentName::atom(), PeerId::binary()) -> float()

Return the PP trust score for a peer (0.0 = unknown, 1.0 = full).

pop_vector/1

pop_vector(AgentName::atom()) -> binary() | undefined

Return the capability vector used by an agent's PP node.

Useful for building QueryVec arguments to pop_peers_for/3.
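One possible use, sketched under assumptions: query for the K peers closest to the agent's own capability vector, treating an undefined vector as "no peers". The module pop_query_sketch and both of its functions are hypothetical helpers, and whether the result includes the agent itself is not specified here.

```erlang
-module(pop_query_sketch).
-export([similar_peers/2, with_vector/2]).

%% Ask an agent for the K peers most similar to its own vector.
%% Requires a running agent; returns [] if it has no vector.
similar_peers(AgentName, K) when is_atom(AgentName), is_integer(K), K > 0 ->
    with_vector(em_filter:pop_vector(AgentName),
                fun(Vec) -> em_filter:pop_peers_for(AgentName, Vec, K) end).

%% pop_vector/1 may return undefined, in which case there is nothing
%% to query with; otherwise run the query fun on the vector.
with_vector(undefined, _Query) -> [];
with_vector(Vec, Query) when is_binary(Vec), is_function(Query, 1) ->
    Query(Vec).
```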

resolve_named_entity/1

resolve_named_entity(X1::binary()) -> binary() | undefined

safe_binary_replace/3

safe_binary_replace(S::binary(), P::binary(), R::binary()) -> binary()

should_skip_link/2

should_skip_link(Link::binary(), Excluded::[string()]) -> boolean()

Returns true if the link matches any excluded pattern or does not start with http.
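The rule described above can be illustrated with a standalone sketch. This is not the em_filter implementation: it assumes that "matches" means the excluded pattern occurs as a non-empty substring of the link, which the documentation does not state. The module link_filter_sketch is hypothetical.

```erlang
-module(link_filter_sketch).
-export([should_skip/2]).

%% Skip a link when it does not start with "http", or when any
%% excluded pattern occurs inside it (assumed substring semantics).
should_skip(Link, Excluded) when is_binary(Link), is_list(Excluded) ->
    (not starts_with_http(Link)) orelse
        lists:any(fun(Pattern) -> contains(Link, Pattern) end, Excluded).

starts_with_http(<<"http", _/binary>>) -> true;
starts_with_http(_)                    -> false.

%% Patterns are strings, links are binaries, so convert before matching.
%% Patterns are assumed non-empty: binary:match/2 rejects an empty pattern.
contains(Link, Pattern) ->
    binary:match(Link, unicode:characters_to_binary(Pattern)) =/= nomatch.
```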

start_agent/3

start_agent(AgentName::atom(), HandlerModule::module(), Config::map()) -> {ok, pid()} | {error, term()}

stop_agent/1

stop_agent(AgentName::atom()) -> ok | {error, term()}

strip_scripts/1

strip_scripts(Html::binary() | string()) -> {ok, binary()} | {error, cleaning_failed}

Removes all <script>...</script> blocks from an HTML binary.

Returns {ok, Cleaned} or {error, cleaning_failed} if the regex operation raises an exception.
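Unlike most of the other HTML utilities, strip_scripts/1 returns a tagged tuple, so its result has to be unwrapped before being passed on. The sketch below chains it with get_text/1 and falls back to an empty binary on failure. The module clean_sketch and the choice of fallback are assumptions made for illustration.

```erlang
-module(clean_sketch).
-export([plain_text/1, unwrap/2]).

%% Remove <script> blocks, then strip the remaining tags.
%% Requires em_filter to be loaded.
plain_text(Html) ->
    Cleaned = unwrap(em_filter:strip_scripts(Html), <<>>),
    em_filter:get_text(Cleaned).

%% Unwrap the {ok, Cleaned} | {error, cleaning_failed} result.
unwrap({ok, Cleaned}, _Fallback)           -> Cleaned;
unwrap({error, cleaning_failed}, Fallback) -> Fallback.
```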

