Rindle.AV.MetadataSanitizer (Rindle v0.1.5)


Container-metadata sanitization for untrusted FFprobe output.

Two passes applied to every string value in a metadata map:

  1. Strip control characters, apart from a small set of exempted characters.
  2. Truncate to 1024 bytes (codepoint-aligned; no invalid UTF-8 emitted).
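
A minimal sketch of what the first pass could look like. The exact set of stripped and exempted characters is not stated here, so the choice below (strip U+0000..U+001F and U+007F, keep tab, line feed, and carriage return) is an assumption, and `StripSketch` is a hypothetical module name rather than part of Rindle.

```elixir
defmodule StripSketch do
  # ASSUMPTION: the stripped range and the exemptions are illustrative only.
  # Kept: tab (0x09), line feed (0x0A), carriage return (0x0D).
  @keep [?\t, ?\n, ?\r]

  @spec strip_control(binary()) :: binary()
  def strip_control(string) when is_binary(string) do
    # Filtering byte by byte is safe for UTF-8: bytes below 0x80 never occur
    # inside a multi-byte sequence, so no codepoint is split by this pass.
    for <<byte <- string>>, keep?(byte), into: <<>>, do: <<byte>>
  end

  defp keep?(byte) when byte in @keep, do: true
  defp keep?(byte) when byte <= 0x1F or byte == 0x7F, do: false
  defp keep?(_byte), do: true
end
```

Working on bytes rather than codepoints also means the sketch does not raise or stop early on input that is not valid UTF-8, which matters for untrusted probe output.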

This is layered ON TOP of Rindle.AV.Ffprobe's HTML-escape (Phase 23). Both layers are intentional — Phase 23's escape is render-time defense in depth (output safety), Phase 24's truncate-and-strip is ingest-time stored-data hygiene (input safety). Do NOT collapse them. (D-21)

Called from Rindle.Probe.AVProbe (Plan 05) AFTER Rindle.AV.Ffprobe.probe/1 and BEFORE the result is written into media_assets.metadata. (D-20)

Implementation note: the standard byte-slice helper (String.byte_slice/3) arrived in Elixir 1.17, while the CI matrix includes Elixir 1.15. The hand-rolled approach, which cuts the binary with binary-size and rewinds until String.valid?/1 passes, is the portable equivalent.
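
The rewind technique described in the implementation note can be sketched as follows. This is an illustration under assumptions, not the module's actual source: `TruncateSketch` and the private `rewind/2` helper are hypothetical names, and the input is assumed to be valid UTF-8.

```elixir
defmodule TruncateSketch do
  # Hypothetical module; shows the binary-size + String.valid?/1 rewind only.

  @spec truncate_to_bytes(String.t(), non_neg_integer()) :: String.t()
  def truncate_to_bytes(string, max_bytes)
      when is_binary(string) and is_integer(max_bytes) and max_bytes >= 0 do
    if byte_size(string) <= max_bytes do
      string
    else
      rewind(string, max_bytes)
    end
  end

  # Nothing fits in zero bytes.
  defp rewind(_string, 0), do: ""

  # Cut at `size` bytes. If the cut lands inside a codepoint, the prefix is
  # not valid UTF-8, so back up one byte and try again.
  defp rewind(string, size) do
    <<prefix::binary-size(size), _rest::binary>> = string

    if String.valid?(prefix) do
      prefix
    else
      rewind(string, size - 1)
    end
  end
end
```

For valid UTF-8 input the rewind backs up at most three bytes, because a codepoint occupies at most four. The sketch uses only binary pattern matching and `String.valid?/1`, both of which predate Elixir 1.15.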

Summary

Functions

sanitize(value)

truncate_to_bytes(string, max_bytes)

Truncates string to at most max_bytes bytes, never emitting an incomplete UTF-8 codepoint. Works on Elixir 1.15+.

Functions

sanitize(value)

@spec sanitize(map() | list() | binary() | term()) ::
  map() | list() | binary() | term()
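
The spec suggests a recursive walk over nested metadata: maps and lists are traversed, binaries are cleaned, and any other term passes through unchanged. A minimal sketch of such a walk, assuming that map keys are left untouched and that structs are not traversed (neither is confirmed here). `SanitizeWalkSketch.walk/2` is a hypothetical helper that takes the per-string work as a function argument so the traversal can be shown on its own.

```elixir
defmodule SanitizeWalkSketch do
  # Hypothetical module; the real sanitize/1 may differ.
  # ASSUMPTIONS: keys are not sanitized, structs pass through unchanged,
  # and lists are proper lists (as decoded JSON would produce).

  @spec walk(term(), (binary() -> binary())) :: term()
  def walk(value, string_fun) when is_binary(value), do: string_fun.(value)

  def walk(value, string_fun) when is_list(value) do
    Enum.map(value, &walk(&1, string_fun))
  end

  def walk(value, string_fun) when is_map(value) and not is_struct(value) do
    Map.new(value, fn {key, inner} -> {key, walk(inner, string_fun)} end)
  end

  # Numbers, atoms, nil, structs, and anything else are returned as-is.
  def walk(value, _string_fun), do: value
end
```

In use, the function argument would be the composition of the two string passes; for instance, `SanitizeWalkSketch.walk(metadata, &String.trim/1)` trims every nested string while leaving numbers and `nil` values alone.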

truncate_to_bytes(string, max_bytes)

@spec truncate_to_bytes(String.t(), non_neg_integer()) :: String.t()

Truncates string to at most max_bytes bytes, never emitting an incomplete UTF-8 codepoint. Works on Elixir 1.15+.

Examples

iex> Rindle.AV.MetadataSanitizer.truncate_to_bytes("héllo", 1024)
"héllo"

iex> Rindle.AV.MetadataSanitizer.truncate_to_bytes("héllo", 3)
"hé"

iex> Rindle.AV.MetadataSanitizer.truncate_to_bytes("hello", 5)
"hello"
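
The second example returns "hé" because the limit counts bytes, not characters. A short illustration using only standard library calls, assuming "é" is the precomposed codepoint U+00E9 (two bytes in UTF-8):

```elixir
word = "h\u00E9llo"

# Six bytes for five characters: "é" takes two bytes.
IO.inspect(byte_size(word))      # 6
IO.inspect(String.length(word))  # 5

# Cutting after 2 bytes splits "é" in half and leaves invalid UTF-8.
<<cut_at_2::binary-size(2), _rest::binary>> = word
IO.inspect(String.valid?(cut_at_2))  # false

# Cutting after 3 bytes ends exactly on a codepoint boundary.
<<cut_at_3::binary-size(3), _rest::binary>> = word
IO.inspect(String.valid?(cut_at_3))  # true
```

A limit of 3 therefore keeps "h" (one byte) plus "é" (two bytes), while a limit of 2 would have to fall back to "h" alone.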