em_filter
- Library for registering Emergence filters with data aggregation.
Authors: Steve Roques.
aggregated_content() = [content_block()]
content_block() = #{type => content_type(), data => term(), metadata => map(), position => integer()}
content_type() = text | link | image | video | audio | mixed
filter_url() = string()
port_number() = 1..65535
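As an illustration of the types above, the following sketch builds one content_block() value and checks its shape. The module name, the function names, and the field values are hypothetical and not part of em_filter.

```erlang
-module(content_block_sketch).
-export([example_block/0, is_content_block/1]).

%% A hypothetical link block; every field value is made up for illustration.
example_block() ->
    #{type => link,
      data => <<"https://example.org/">>,
      metadata => #{text => <<"Example">>},
      position => 1}.

%% Checks only the shape given by the content_block() type:
%% the four keys exist, metadata is a map, position is an integer,
%% and type is one of the content_type() atoms.
is_content_block(#{type := Type, data := _, metadata := Meta, position := Pos})
  when is_map(Meta), is_integer(Pos) ->
    lists:member(Type, [text, link, image, video, audio, mixed]);
is_content_block(_) ->
    false.
```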
aggregate_data/2 | Aggregates data based on specified options. |
classify_content/1 | Classifies content type based on HTML element. |
clean_text/3 | |
decode_hex_entities/1 | |
decode_html_entities/1 | |
decode_named_entities/1 | |
decode_numeric_entities/1 | |
ensure_binary/1 | |
extract_attribute/2 | |
extract_content_blocks/1 | Extracts and classifies different content blocks from HTML. |
extract_elements/2 | |
extract_images/1 | Extracts image elements with metadata. |
extract_links_with_text/1 | Extracts links with associated text. |
extract_media_content/1 | Extracts media content (video, audio). |
extract_text_blocks/1 | Extracts text blocks from various HTML elements. |
find_port/0 | Finds an available TCP port for the filter service. |
format_aggregated_data/1 | Formats aggregated data for output. |
get_filter_port/1 | Gets the port number for a running filter. |
get_text/1 | |
merge_content_types/1 | Merges different content types into a unified structure. |
parse_string/1 | |
register_filter/1 | Registers a filter with the discovery service. |
resolve_named_entity/1 | |
safe_binary_replace/3 | |
should_skip_link/2 | |
start_filter/2 | Starts a filter service with the given name and handler module. |
stop_filter/1 | Stops a running filter service. |
aggregate_data(Html::binary(), Options::map()) -> aggregated_content()
Html
: HTML content
Options
: Aggregation options map
returns: Aggregated content
Aggregates data based on specified options.
classify_content(Element::binary()) -> content_type()
Element
: HTML element as binary
returns: Content type atom
Classifies content type based on HTML element.
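The classification rules are not documented here. A minimal sketch of one plausible approach, assuming the element starts with its opening tag and that the tag name alone decides the type, could look like this. The module, the helper, and the tag-to-type mapping are assumptions and may differ from what classify_content/1 actually does.

```erlang
-module(classify_sketch).
-export([classify/1]).

%% Maps an HTML element (binary) to a content_type() atom by its tag name.
%% The mapping is an assumption; this sketch never returns mixed.
classify(Element) when is_binary(Element) ->
    case tag_name(Element) of
        <<"a">>     -> link;
        <<"img">>   -> image;
        <<"video">> -> video;
        <<"audio">> -> audio;
        _           -> text
    end.

%% Returns the lowercased name of the leading tag, or <<>> if there is none.
tag_name(Element) ->
    case re:run(Element, <<"^\\s*<([A-Za-z][A-Za-z0-9]*)">>,
                [{capture, all_but_first, binary}]) of
        {match, [Name]} -> string:lowercase(Name);
        nomatch         -> <<>>
    end.
```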
clean_text(DescText0, IconText0, DtText0) -> any()
decode_hex_entities(Text) -> any()
decode_html_entities(Text) -> any()
decode_named_entities(Text) -> any()
decode_numeric_entities(Text) -> any()
ensure_binary(Text) -> any()
extract_attribute(Element, Attribute) -> any()
extract_content_blocks(Html::binary()) -> {ok, aggregated_content()} | {error, term()}
Html
: HTML content as binary
returns: {ok, AggregatedContent} or {error, Reason}
Extracts and classifies different content blocks from HTML.
extract_elements(Html, Selector) -> any()
extract_images(Html::binary()) -> [content_block()]
Html
: HTML content
returns: List of image content blocks
Extracts image elements with metadata.
extract_links_with_text(Html::binary()) -> [content_block()]
Html
: HTML content
returns: List of link content blocks
Extracts links with associated text.
extract_media_content(Html::binary()) -> [content_block()]
Html
: HTML content
returns: List of media content blocks
Extracts media content (video, audio).
extract_text_blocks(Html::binary()) -> [content_block()]
Html
: HTML content
returns: List of text content blocks
Extracts text blocks from various HTML elements.
find_port() -> {ok, port_number()} | {error, no_ports_available}
returns: {ok, Port} if an available port is found, or {error, no_ports_available} if no port is available
Finds an available TCP port for the filter service. Searches for a free port in the range 8081-9000.
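A search over the 8081-9000 range can be sketched with gen_tcp as follows. This is an illustration under the assumption that a port counts as free when it can be opened for listening; the module name and find_port/2 are hypothetical, and the library's own probing method may differ.

```erlang
-module(port_scan_sketch).
-export([find_port/0, find_port/2]).

%% Scans the documented range 8081-9000.
find_port() ->
    find_port(8081, 9000).

%% Tries each port in [Port, Max] in ascending order.
find_port(Port, Max) when Port > Max ->
    {error, no_ports_available};
find_port(Port, Max) ->
    case gen_tcp:listen(Port, []) of
        {ok, Socket} ->
            %% The port could be bound: release it so the caller can use it.
            ok = gen_tcp:close(Socket),
            {ok, Port};
        {error, _Reason} ->
            %% In use or not permitted: move on to the next port.
            find_port(Port + 1, Max)
    end.
```

A probe of this kind is inherently racy: another process may take the port between the close and the caller's own bind, so the caller should still handle a bind failure.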
format_aggregated_data(Data::map()) -> map()
returns: Formatted output map
Formats aggregated data for output.
get_filter_port(FilterName::atom()) -> {ok, port_number()} | {error, not_found}
FilterName
: Name of the filter
returns: {ok, Port} if filter is running, or {error, not_found} if filter is not running
Gets the port number for a running filter.
get_text(Element) -> any()
merge_content_types(ContentBlocks::[content_block()]) -> map()
ContentBlocks
: List of content blocks
returns: Merged content structure
Merges different content types into a unified structure.
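The shape of the merged map is not specified here. One possible structure, shown purely as an assumption, groups the blocks by their type key and keeps the input order within each group. The module and function names are hypothetical.

```erlang
-module(merge_sketch).
-export([merge/1]).

%% Groups content blocks by their type key.
%% The result shape (Type => [Block]) is an assumption, not the documented one.
merge(Blocks) when is_list(Blocks) ->
    Grouped = lists:foldl(
                fun(#{type := Type} = Block, Acc) ->
                        %% Prepend to the existing group or start a new one.
                        maps:update_with(Type,
                                         fun(Bs) -> [Block | Bs] end,
                                         [Block], Acc)
                end, #{}, Blocks),
    %% Groups were built in reverse; restore the input order.
    maps:map(fun(_Type, Bs) -> lists:reverse(Bs) end, Grouped).
```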
parse_string(Html) -> any()
register_filter(FilterUrl::filter_url()) -> {ok, registered} | {error, term()}
FilterUrl
: URL of the filter service to register
returns: {ok, registered} if registration is successful, or {error, Reason} if registration fails
Registers a filter with the discovery service.
This function sends an HTTP POST request to the discovery service with the filter information in JSON format.
resolve_named_entity(X1) -> any()
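For register_filter/1, the HTTP POST with a JSON body might be sketched with OTP's httpc client as below. The discovery endpoint is passed in as DiscoveryUrl because its address is not given in this reference, and the payload layout ({"url": ...}) is an assumption; the real request format may differ. The module and function names are hypothetical.

```erlang
-module(register_sketch).
-export([payload/1, register/2]).

%% Builds a JSON body of the assumed form {"url":"<FilterUrl>"}.
payload(FilterUrl) ->
    iolist_to_binary([<<"{\"url\":\"">>, escape(FilterUrl), <<"\"}">>]).

%% Escapes backslashes and double quotes; sufficient for ASCII URLs.
escape(Str) ->
    lists:flatmap(fun($")  -> "\\\"";
                     ($\\) -> "\\\\";
                     (C)   -> [C]
                  end, Str).

%% Posts the payload to DiscoveryUrl (a hypothetical endpoint, as a string).
register(DiscoveryUrl, FilterUrl) ->
    {ok, _} = application:ensure_all_started(inets),
    Request = {DiscoveryUrl, [], "application/json", payload(FilterUrl)},
    case httpc:request(post, Request, [{timeout, 5000}], []) of
        {ok, {{_Vsn, Code, _Phrase}, _Headers, _Body}}
          when Code >= 200, Code < 300 ->
            {ok, registered};
        {ok, {{_Vsn, Code, _Phrase}, _Headers, _Body}} ->
            {error, {http_status, Code}};
        {error, Reason} ->
            {error, Reason}
    end.
```

An https endpoint would additionally need the ssl application started before the request is made.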
safe_binary_replace(Subject, Pattern, Replacement) -> any()
should_skip_link(Link, ExcludedContent) -> any()
start_filter(FilterName::atom(), HandlerModule::module()) -> {ok, pid()} | {error, term()}
FilterName
: Name of the filter (atom)
HandlerModule
: Module to handle requests (module)
returns: {ok, Pid} if startup is successful, or {error, Reason} if startup fails
Starts a filter service with the given name and handler module.
stop_filter(FilterName::atom()) -> ok | {error, not_running}
FilterName
: Name of the filter to stop (atom)
returns: ok if stopped successfully, or {error, not_running} if filter is not running
Stops a running filter service.
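A typical lifecycle, using only the signatures documented above, might look like the following sketch. It needs the em_filter library on the code path; the wrapper module is hypothetical, and what a handler module must export is not described in this reference.

```erlang
-module(filter_lifecycle_sketch).
-export([run/2]).

%% Starts a filter, reports its port, then stops it again.
%% Returns ok, {error, not_running}, or the {error, Reason} from startup.
run(FilterName, HandlerModule) ->
    case em_filter:start_filter(FilterName, HandlerModule) of
        {ok, Pid} ->
            case em_filter:get_filter_port(FilterName) of
                {ok, Port} ->
                    io:format("~p running as ~p on port ~p~n",
                              [FilterName, Pid, Port]);
                {error, not_found} ->
                    io:format("~p started but no port was found~n",
                              [FilterName])
            end,
            em_filter:stop_filter(FilterName);
        {error, Reason} ->
            {error, Reason}
    end.
```

It could be called as filter_lifecycle_sketch:run(my_filter, my_filter_handler), where both names are placeholders.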