Module em_filter

em_filter - Library for registering Emergence filters with data aggregation.

Authors: Steve Roques.

Description

This module provides functions for:
- Finding an available port for a filter service
- Registering a filter with a discovery service
- Processing HTML content with multiple data types
- Aggregating different content types (text, links, images, etc.)

Data Types

aggregated_content()

aggregated_content() = [content_block()]

content_block()

content_block() = #{type => content_type(), data => term(), metadata => map(), position => integer()}

content_type()

content_type() = text | link | image | video | audio | mixed

filter_url()

filter_url() = string()

port_number()

port_number() = 1..65535
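
To make the content_block() and aggregated_content() types above more concrete, here is a minimal sketch that builds a small list of blocks by hand and orders it by position. The module name content_block_sketch, the sample URL, and the shapes chosen for data and metadata are assumptions made for illustration; the types do not constrain them beyond term() and map().

```erlang
-module(content_block_sketch).
-export([example/0, sort_by_position/1]).

%% Local copies of the documented types so this sketch compiles on its own.
-type content_type() :: text | link | image | video | audio | mixed.
-type content_block() :: #{type => content_type(),
                           data => term(),
                           metadata => map(),
                           position => integer()}.

%% Builds a small aggregated_content() value by hand.
%% The data and metadata values are illustrative assumptions.
-spec example() -> [content_block()].
example() ->
    [#{type => link, data => <<"https://example.org">>,
       metadata => #{text => <<"Example">>}, position => 2},
     #{type => text, data => <<"Hello">>,
       metadata => #{}, position => 1}].

%% Orders blocks by their position key, lowest first.
-spec sort_by_position([content_block()]) -> [content_block()].
sort_by_position(Blocks) ->
    lists:sort(fun(#{position := A}, #{position := B}) -> A =< B end, Blocks).
```

Calling content_block_sketch:sort_by_position(content_block_sketch:example()) returns the text block first and the link block second.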

Function Index

aggregate_data/2: Aggregates data based on specified options.
classify_content/1: Classifies content type based on HTML element.
clean_text/3
decode_hex_entities/1
decode_html_entities/1
decode_named_entities/1
decode_numeric_entities/1
ensure_binary/1
extract_attribute/2
extract_content_blocks/1: Extracts and classifies different content blocks from HTML.
extract_elements/2
extract_images/1: Extracts image elements with metadata.
extract_links_with_text/1: Extracts links with associated text.
extract_media_content/1: Extracts media content (video, audio).
extract_text_blocks/1: Extracts text blocks from various HTML elements.
find_port/0: Finds an available TCP port for the filter service.
format_aggregated_data/1: Formats aggregated data for output.
get_filter_port/1: Gets the port number for a running filter.
get_text/1
merge_content_types/1: Merges different content types into a unified structure.
parse_string/1
register_filter/1: Registers a filter with the discovery service.
resolve_named_entity/1
safe_binary_replace/3
should_skip_link/2
start_filter/2: Starts a filter service with the given name and handler module.
stop_filter/1: Stops a running filter service.

Function Details

aggregate_data/2

aggregate_data(Html::binary(), Options::map()) -> aggregated_content()

Html: HTML content
Options: Aggregation options map

returns: Aggregated content

Aggregates data based on specified options.

classify_content/1

classify_content(Element::binary()) -> content_type()

Element: HTML element as binary

returns: Content type atom

Classifies content type based on HTML element.
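
The classification rules are not documented here, so the following is only a sketch of one plausible approach: read the tag name from the start of the element and map it to a content_type() atom. The module name classify_sketch, the helper tag_name/1, and the tag-to-type mapping are all assumptions, not the behaviour of classify_content/1 itself.

```erlang
-module(classify_sketch).
-export([classify/1]).

%% Maps an element's tag name to a content type atom.
%% The mapping is an assumption; anything unrecognised is treated as text.
-spec classify(binary()) -> text | link | image | video | audio.
classify(Element) ->
    case tag_name(Element) of
        <<"a">>     -> link;
        <<"img">>   -> image;
        <<"video">> -> video;
        <<"audio">> -> audio;
        _Other      -> text
    end.

%% Returns the lowercased tag name of an element that starts with "<",
%% or an empty binary when the input does not start with a tag.
tag_name(<<"<", Rest/binary>>) ->
    Separators = [<<" ">>, <<">">>, <<"/">>, <<"\n">>, <<"\t">>],
    [Name | _] = binary:split(Rest, Separators),
    string:lowercase(Name);
tag_name(_NotATag) ->
    <<>>.
```

Matching on the full tag name rather than a prefix keeps elements such as article from being mistaken for a link.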

clean_text/3

clean_text(DescText0, IconText0, DtText0) -> any()

decode_hex_entities/1

decode_hex_entities(Text) -> any()

decode_html_entities/1

decode_html_entities(Text) -> any()

decode_named_entities/1

decode_named_entities(Text) -> any()

decode_numeric_entities/1

decode_numeric_entities(Text) -> any()

ensure_binary/1

ensure_binary(Text) -> any()

extract_attribute/2

extract_attribute(Element, Attribute) -> any()

extract_content_blocks/1

extract_content_blocks(Html::binary()) -> {ok, aggregated_content()} | {error, term()}

Html: HTML content as binary

returns: {ok, AggregatedContent} or {error, Reason}

Extracts and classifies different content blocks from HTML.

extract_elements/2

extract_elements(Html, Selector) -> any()

extract_images/1

extract_images(Html::binary()) -> [content_block()]

Html: HTML content

returns: List of image content blocks

Extracts image elements with metadata.

extract_links_with_text/1

extract_links_with_text(Html::binary()) -> [content_block()]

Html: HTML content

returns: List of link content blocks

Extracts links with associated text.

extract_media_content/1

extract_media_content(Html::binary()) -> [content_block()]

Html: HTML content

returns: List of media content blocks

Extracts media content (video, audio).

extract_text_blocks/1

extract_text_blocks(Html::binary()) -> [content_block()]

Html: HTML content

returns: List of text content blocks

Extracts text blocks from various HTML elements.

find_port/0

find_port() -> {ok, port_number()} | {error, no_ports_available}

returns: {ok, Port} if an available port is found, or {error, no_ports_available} if no port is available

Finds an available TCP port for the filter service. Searches for a free port in the range 8081-9000.
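
A range scan like the one described can be sketched as below. This is not the module's implementation; the module name port_scan_sketch and the function find_free_port/2 are hypothetical, and the sketch assumes that a port counts as free when gen_tcp:listen/2 can bind it.

```erlang
-module(port_scan_sketch).
-export([find_free_port/2]).

%% Tries each port from First to Last and returns the first one that can
%% be bound. The listening socket is closed again straight away, so another
%% process could still claim the port before the caller uses it.
-spec find_free_port(pos_integer(), pos_integer()) ->
          {ok, pos_integer()} | {error, no_ports_available}.
find_free_port(First, Last) when First > Last ->
    {error, no_ports_available};
find_free_port(First, Last) ->
    case gen_tcp:listen(First, []) of
        {ok, Socket} ->
            ok = gen_tcp:close(Socket),
            {ok, First};
        {error, _Reason} ->
            %% Port in use or not permitted: move on to the next one.
            find_free_port(First + 1, Last)
    end.
```

For the documented range, the call would be port_scan_sketch:find_free_port(8081, 9000).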

format_aggregated_data/1

format_aggregated_data(Data::map()) -> map()

returns: Formatted output map

Formats aggregated data for output.

get_filter_port/1

get_filter_port(FilterName::atom()) -> {ok, port_number()} | {error, not_found}

FilterName: Name of the filter

returns: {ok, Port} if filter is running, or {error, not_found} if filter is not running

Gets the port number for a running filter.

get_text/1

get_text(Element) -> any()

merge_content_types/1

merge_content_types(ContentBlocks::[content_block()]) -> map()

ContentBlocks: List of content blocks

returns: Merged content structure

Merges different content types into a unified structure.
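
The shape of the merged structure is not specified beyond map(). As one possible reading, the sketch below groups blocks by their type key and keeps the input order within each group. The module name merge_sketch and the function group_by_type/1 are hypothetical, and the result layout is an assumption.

```erlang
-module(merge_sketch).
-export([group_by_type/1]).

%% Groups blocks by their type key. Within each group, blocks keep the
%% order they had in the input list.
-spec group_by_type([map()]) -> #{atom() => [map()]}.
group_by_type(Blocks) ->
    Grouped = lists:foldl(
                fun(#{type := Type} = Block, Acc) ->
                        %% Prepend to the group, or start it with this block.
                        maps:update_with(Type,
                                         fun(Bs) -> [Block | Bs] end,
                                         [Block],
                                         Acc)
                end,
                #{},
                Blocks),
    %% Groups were built by prepending, so restore the input order.
    maps:map(fun(_Type, Bs) -> lists:reverse(Bs) end, Grouped).
```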

parse_string/1

parse_string(Html) -> any()

register_filter/1

register_filter(FilterUrl::filter_url()) -> {ok, registered} | {error, term()}

FilterUrl: URL of the filter service to register

returns: {ok, registered} if registration is successful, or {error, Reason} if registration fails

Registers a filter with the discovery service.

This function sends an HTTP POST request to the discovery service with the filter information in JSON format.
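
A request of this kind can be sketched with the httpc client from OTP's inets application. The discovery endpoint and the JSON field name are not given in this document, so the sketch takes the endpoint as an argument and uses a "url" field purely as an assumption; the module name register_sketch and both functions are hypothetical.

```erlang
-module(register_sketch).
-export([registration_body/1, post_registration/2]).

%% Builds the JSON body. The "url" field name is an assumption, and the
%% URL is assumed to contain no characters that need JSON escaping.
-spec registration_body(string()) -> binary().
registration_body(FilterUrl) ->
    iolist_to_binary(["{\"url\":\"", FilterUrl, "\"}"]).

%% Posts the body to a discovery endpoint supplied by the caller and
%% treats any 2xx status as a successful registration.
-spec post_registration(string(), string()) ->
          {ok, registered} | {error, term()}.
post_registration(DiscoveryUrl, FilterUrl) ->
    {ok, _Started} = application:ensure_all_started(inets),
    Request = {DiscoveryUrl, [], "application/json",
               registration_body(FilterUrl)},
    case httpc:request(post, Request, [{timeout, 5000}], []) of
        {ok, {{_Version, Status, _Phrase}, _Headers, _Body}}
          when Status >= 200, Status < 300 ->
            {ok, registered};
        {ok, {{_Version, Status, _Phrase}, _Headers, _Body}} ->
            {error, {http_status, Status}};
        {error, Reason} ->
            {error, Reason}
    end.
```

Keeping the body construction separate from the network call makes the payload easy to check without a running discovery service.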

resolve_named_entity/1

resolve_named_entity(X1) -> any()

safe_binary_replace/3

safe_binary_replace(Subject, Pattern, Replacement) -> any()

should_skip_link/2

should_skip_link(Link, ExcludedContent) -> any()

start_filter/2

start_filter(FilterName::atom(), HandlerModule::module()) -> {ok, pid()} | {error, term()}

FilterName: Name of the filter (atom)
HandlerModule: Module to handle requests (module)

returns: {ok, Pid} if startup is successful, or {error, Reason} if startup fails

Starts a filter service with the given name and handler module.

stop_filter/1

stop_filter(FilterName::atom()) -> ok | {error, not_running}

FilterName: Name of the filter to stop (atom)

returns: ok if stopped successfully, or {error, not_running} if filter is not running

Stops a running filter service.
