Derive ops — sanitize(...) and validate(...) mini-language


The derives: option accepts a string that is parsed at compile time into a normalized op map. Typos surface as a Spark.Error.DslError during compilation, and the runtime never re-parses the string.

field :email, :string,
  derives: "sanitize(trim, downcase) validate(string, not_empty, email_r, max_len=320)"

Grammar

<derive>   ::= <group>+
<group>    ::= "sanitize(" <ops> ")"  |  "validate(" <ops> ")"
<ops>      ::= <op> ("," <op>)*
<op>       ::= <atom> | <atom> "=" <operand>
<operand>  ::= literal | "Type[...]" | "Map::..."

The same logical rules also support a keyword/block/pipe form (see source-level docs), but the string form above is the canonical input.
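To make the grammar concrete, here is a minimal sketch of how a derive string could be turned into a %{sanitize: [...], validate: [...]} op map. DeriveParseSketch is a hypothetical module, not GuardedStruct's parser. It only handles flat ops and atom=integer or atom=string operands. It does not reject unknown op names the way the real compile-time check does.

```elixir
defmodule DeriveParseSketch do
  # Hypothetical, simplified parser for the grammar above. It handles flat ops
  # and atom=operand pairs only; nested lists (each=[...]), Type[...] operands
  # and regex scanning are out of scope, and a ")" inside a group ends it early.

  def parse(derive) when is_binary(derive) do
    ~r/(sanitize|validate)\(([^)]*)\)/
    |> Regex.scan(derive, capture: :all_but_first)
    |> Enum.reduce(%{sanitize: [], validate: []}, fn [group, body], acc ->
      key = if group == "sanitize", do: :sanitize, else: :validate
      # Append, so repeated groups keep their declared order.
      Map.update!(acc, key, &(&1 ++ parse_ops(body)))
    end)
  end

  defp parse_ops(body) do
    body
    |> String.split(",")
    |> Enum.map(&String.trim/1)
    |> Enum.reject(&(&1 == ""))
    |> Enum.map(&parse_op/1)
  end

  # "trim" -> :trim, "max_len=320" -> {:max_len, 320}
  defp parse_op(op) do
    case String.split(op, "=", parts: 2) do
      [name] -> String.to_atom(name)
      [name, operand] -> {String.to_atom(String.trim(name)), parse_operand(String.trim(operand))}
    end
  end

  # Integers become integers; anything else stays a raw string.
  defp parse_operand(operand) do
    case Integer.parse(operand) do
      {int, ""} -> int
      _ -> operand
    end
  end
end
```

Feeding it the email example from the top of this page should yield %{sanitize: [:trim, :downcase], validate: [:string, :not_empty, :email_r, {:max_len, 320}]}.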

Built-in sanitize ops (GuardedStruct.Derive.Registry.@sanitize_ops)

| Op | Effect |
| --- | --- |
| :trim | String.trim/1 if binary; passthrough otherwise. |
| :upcase / :downcase / :capitalize | Corresponding String.* calls. |
| :strip_tags | HtmlSanitizeEx.strip_tags/1 (optional dep). |
| :basic_html / :html5 / :markdown_html | Whitelisted HTML cleanup (optional dep). |
| {:tag, op_atom} | trim → op → trim. |
| :string_float / :string_integer | Parse a number out of a string; returns 0.0 (float) or 0 (integer) on failure. |
| :squish | Collapse runs of whitespace into a single space, then trim. Binaries only. |
| :no_control | Strip ASCII control characters (\x00-\x1F, \x7F). Binaries only. |
| :no_zero_width | Strip zero-width Unicode characters (U+200B, U+200C, U+200D, U+FEFF, U+2060). |
| :uniq | Enum.uniq/1 on lists; passthrough otherwise. |
| :compact | Drop nil entries from a list. |
| :reject_empty | Drop nil, "", [], and %{} entries from a list. |
| :sort | Enum.sort/1 on lists. |
| {:clamp, [min, max]} | Clamp a number to min..max. Numbers only; passthrough otherwise. |
| {:default_when_nil, value} | Replace nil with value. |
| {:default_when_empty, value} | Replace nil, "", [], or %{} with value. |
| {:each, [ops]} | Apply inner sanitize ops to every element of a list. |

Arg order is pipe-friendly: SanitizerDerive.sanitize(value, :op).
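The table's semantics can be illustrated with a small pipe-friendly sketch. SanitizeSketch is a hypothetical stand-in that re-implements a handful of the ops above with the Elixir standard library. It is not SanitizerDerive's source, and the exact edge-case behavior of the real ops may differ.

```elixir
defmodule SanitizeSketch do
  # Hypothetical re-implementation of a few sanitize ops, for illustration
  # only; not GuardedStruct's SanitizerDerive. Value comes first, so calls
  # chain with |>.

  def sanitize(value, :trim) when is_binary(value), do: String.trim(value)
  def sanitize(value, :downcase) when is_binary(value), do: String.downcase(value)

  # String.split/1 splits on runs of whitespace and drops leading/trailing ones.
  def sanitize(value, :squish) when is_binary(value),
    do: value |> String.split() |> Enum.join(" ")

  # ASCII control bytes only; UTF-8 multibyte sequences never contain them.
  def sanitize(value, :no_control) when is_binary(value),
    do: String.replace(value, ~r/[\x00-\x1F\x7F]/, "")

  def sanitize(value, :no_zero_width) when is_binary(value),
    do: String.replace(value, ["\u200B", "\u200C", "\u200D", "\uFEFF", "\u2060"], "")

  def sanitize(list, :compact) when is_list(list), do: Enum.reject(list, &is_nil/1)
  def sanitize(list, :reject_empty) when is_list(list), do: Enum.reject(list, &empty?/1)

  def sanitize(n, {:clamp, [lo, hi]}) when is_number(n), do: n |> max(lo) |> min(hi)

  def sanitize(nil, {:default_when_nil, default}), do: default

  def sanitize(value, {:default_when_empty, default}),
    do: if(empty?(value), do: default, else: value)

  # Each element runs through the inner ops in declared order.
  def sanitize(list, {:each, ops}) when is_list(list),
    do: Enum.map(list, fn item -> Enum.reduce(ops, item, &sanitize(&2, &1)) end)

  # Anything else (wrong type, non-nil for default_when_nil, ...) passes through.
  def sanitize(value, _op), do: value

  defp empty?(value), do: value in [nil, "", [], %{}]
end
```

For example, "  Hello   World " |> SanitizeSketch.sanitize(:squish) |> SanitizeSketch.sanitize(:downcase) gives "hello world". Putting the value first is what makes that chain read left to right.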

Built-in validate ops (GuardedStruct.Derive.Registry.@validate_ops)

Type guards: :string, :integer, :float, :number, :list, :map, :tuple, :atom, :boolean, :bitstring, :struct, :exception, :function, :pid, :port, :reference, :nil_value, :not_nil_value.

Content / format:

| Op | Constraint |
| --- | --- |
| :not_empty, :not_empty_string | Non-zero length. |
| :not_flatten_empty, :not_flatten_empty_item | List-shape contracts. |
| {:min_len, n} / {:max_len, n} | Bounds; apply to strings, integers, floats, ranges, and lists. |
| :email, :email_r | :email is DNS-checked; :email_r is regex-only and data-layer safe. |
| :url, :tell, :geo_url | URL / phone / geo checks via URL and ExPhoneNumber (optional deps). |
| :uuid, :ipv4, :datetime, :date, :range, :regex | Format checks. |
| :username, :full_name, :location, :queue, :string_boolean | Domain checks (see lib/guarded_struct/helper/extra.ex). |
| :slug | ^[a-z0-9]+(-[a-z0-9]+)*$ (kebab-case URL slug). |
| :hostname | RFC 1123 hostname via Erlang's :inet_parse.domain/1 (case-insensitive, ≤ 253 chars, no underscores, no scheme). |
| :port_number | Integer in 1..65535. |
| :hex_color | #RGB or #RRGGBB form. |
| :semver | SemVer 2.0 via Version.parse/1; rejects leading zeros and other spec-violating shapes. |
| {:enum, "String[a::b::c]"} etc. | Membership in a list evaluated at compile time. |
| {:equal, _} | Equality. |
| {:either, _}, {:custom, _} | Composition (either) and user-supplied predicate (custom). |
| {:optional, _} | Wraps inner ops; nil passes, any other value runs the inner ops. |
| {:each, [ops]} | Apply inner validate ops to every element of a list; the error message includes the failing indices. |
| :record | Erlang record shape. |
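Several of these constraints are small enough to express directly. The sketch below is a hypothetical ValidateSketch module that returns booleans, whereas the real ops report error entries. It covers :slug, :hex_color, :port_number, :semver, and :optional. The hex-color check assumes either letter case is accepted, which the table does not state.

```elixir
defmodule ValidateSketch do
  # Hypothetical boolean predicates mirroring a few rows of the table.
  # The real ops report errors rather than returning true/false.

  def valid?(value, :slug) when is_binary(value),
    do: Regex.match?(~r/^[a-z0-9]+(-[a-z0-9]+)*$/, value)

  # Assumption: hex digits may be upper or lower case.
  def valid?(value, :hex_color) when is_binary(value),
    do: Regex.match?(~r/^#([0-9a-fA-F]{3}|[0-9a-fA-F]{6})$/, value)

  def valid?(value, :port_number), do: is_integer(value) and value in 1..65_535

  # Version.parse/1 returns {:ok, %Version{}} or :error.
  def valid?(value, :semver) when is_binary(value),
    do: match?({:ok, _}, Version.parse(value))

  # nil short-circuits; any other value must pass every inner op.
  def valid?(value, {:optional, ops}),
    do: is_nil(value) or Enum.all?(ops, &valid?(value, &1))

  # Wrong type or unknown op: fail closed.
  def valid?(_value, _op), do: false
end
```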

Op flow

  1. Parse the derives: string at compile time into __derive_ops__: %{sanitize: [...], validate: [...]}.
  2. Pre-evaluate operands at compile time, e.g. enum=String[a::b::c] → {:enum, ["a", "b", "c"]}.
  3. At runtime, apply ops in declared order: all sanitize ops first, then all validate ops.
  4. Return errors as a flat list of %{field: ..., action: ..., message: ...} maps.
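Steps 2 through 4 can be sketched as follows, assuming the op map from step 1 already exists. OpFlowSketch, its tiny op set, and the wording of its error messages are hypothetical, not GuardedStruct internals. The sketch shows the order of operations: operands are expanded once, sanitize runs before validate, and failures accumulate into a flat list of %{field, action, message} maps.

```elixir
defmodule OpFlowSketch do
  # Hypothetical sketch of steps 2-4; names and message text are illustrative.

  # Step 2: expand an operand such as "String[a::b::c]" once, ahead of runtime.
  def pre_evaluate({:enum, "String[" <> rest}),
    do: {:enum, rest |> String.trim_trailing("]") |> String.split("::")}

  def pre_evaluate(op), do: op

  # Steps 3-4: sanitize in declared order, then validate, collecting errors.
  def run(field, value, %{sanitize: sanitize_ops, validate: validate_ops}) do
    sanitized = Enum.reduce(sanitize_ops, value, &sanitize(&2, &1))

    errors =
      for op <- validate_ops, not valid?(sanitized, op) do
        %{field: field, action: op, message: "#{field} failed #{inspect(op)}"}
      end

    {sanitized, errors}
  end

  defp sanitize(v, :trim) when is_binary(v), do: String.trim(v)
  defp sanitize(v, :downcase) when is_binary(v), do: String.downcase(v)
  defp sanitize(v, _op), do: v

  defp valid?(v, :string), do: is_binary(v)
  defp valid?(v, :not_empty), do: v not in [nil, "", [], %{}]
  defp valid?(v, {:max_len, n}) when is_binary(v), do: String.length(v) <= n
  defp valid?(v, {:enum, allowed}), do: v in allowed
  defp valid?(_v, _op), do: false
end
```

Because sanitize runs first, an input of "   " is trimmed to "" before :not_empty sees it, so it fails validation even though the raw input was non-empty.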

Five accepted derive syntaxes

  • String form (canonical, above).
  • @derives decorator (set on the next entity).
  • Keyword form: derive: [sanitize: [:trim], validate: [:string]].
  • Block form inside the entity.
  • Pipe form via GuardedStruct.Sanitize / GuardedStruct.Validate helpers.

All five normalize to the same internal op map.

Combinator patterns

field :allowed_origins, {:array, :string},
  derives: "sanitize(each=[trim, downcase], reject_empty, uniq) validate(list, max_len=20, each=[string, hostname])"

field :frontend_domain, :string,
  derives: "sanitize(trim, downcase) validate(optional=[string, max_len=200, hostname])"

field :priority, :integer,
  derives: "sanitize(default_when_nil=0, clamp=[0, 100])"

field :brand_color, :string,
  derives: "sanitize(trim, squish) validate(string, hex_color)"

field :api_port, :integer, derives: "validate(port_number)"

When combining each=[...] with max_len=N on the same list, declare max_len before each. Ops run in declared order, so the size check runs first and bounds the work each has to do.
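The ordering advice can be illustrated with a hypothetical ListCheckSketch. It checks the list size first, then checks elements, and reports the indices of the elements that fail. In this sketch an oversized list is rejected before any element is inspected. Whether the real op chain short-circuits in the same way is an assumption.

```elixir
defmodule ListCheckSketch do
  # Hypothetical: max_len first, then a per-element check reporting the
  # failing indices, similar in spirit to each=[...] in the table above.
  def check(list, max_len, item_ok?) when is_list(list) and is_function(item_ok?, 1) do
    count = length(list)

    if count > max_len do
      # Size check fails: no element is inspected at all.
      {:error, "list has #{count} items, max is #{max_len}"}
    else
      bad = for {item, i} <- Enum.with_index(list), not item_ok?.(item), do: i
      if bad == [], do: :ok, else: {:error, "invalid items at indices #{Enum.join(bad, ", ")}"}
    end
  end
end
```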

Regex: three ways, ranked by how unusual the pattern is

regex=… in the canonical string form is convenient but has to be scanned out of the surrounding derive grammar. The parser tracks balanced [], (), {} and skips escaped chars, so the vast majority of real-world patterns "just work" unquoted:

# all of these parse cleanly unquoted
derives: "validate(regex=^[a-z0-9-]+$)"
derives: "validate(regex=^[A-Z]{2,5}$)"                # comma inside {n,m} is balanced
derives: "validate(regex=^https?://[a-z.-]+(:[0-9]+)?(/.*)?$)"  # nested ()s balanced
derives: "validate(regex=^(?=.*[A-Z])(?=.*\\d).{8,}$)"          # lookaheads + escapes
derives: "validate(each=[regex=^[a-z0-9.-]+$])"                 # nested in each=

For the edge cases the scanner can't disambiguate (a literal comma, ")", or "]" that sits outside any balanced group in the pattern), quote the pattern:

derives: ~S|validate(regex="^a,b$")|         # literal comma at top level
derives: ~S|validate(regex="^a]b$")|         # literal ] at top level

Code.string_to_quoted treats the contents of "…" as opaque: when the scanner sees an opening ", it skips ahead to the closing quote, so everything between the quotes survives as-is. Use the ~S|…| sigil so you don't have to escape the inner quotes.
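The scanning rules described above can be sketched as a small recursive scanner. RegexScanSketch is hypothetical and is not the library's implementation. It takes the text that follows regex= and returns {operand, rest}. An unquoted operand ends at the first comma, ")" or "]" found outside any balanced (), [] or {} group. A backslash protects the byte after it. A leading quote switches to quoted mode, which stops only at the closing quote.

```elixir
defmodule RegexScanSketch do
  # Hypothetical scanner for the text right after "regex=".
  # Returns {operand, rest}; the terminating delimiter stays in rest.

  def scan(<<?", rest::binary>>), do: quoted(rest, "")
  def scan(input) when is_binary(input), do: walk(input, [], "")

  # A backslash escapes the next byte, whatever it is.
  defp walk(<<?\\, c, rest::binary>>, stack, acc), do: walk(rest, stack, acc <> <<?\\, c>>)

  # Depth 0 (empty stack): a delimiter ends the operand.
  defp walk(<<c, _::binary>> = input, [], acc) when c in [?,, ?), ?]], do: {acc, input}

  # Opening bracket: push the closer we now expect.
  defp walk(<<c, rest::binary>>, stack, acc) when c in [?(, ?[, ?{],
    do: walk(rest, [closer(c) | stack], acc <> <<c>>)

  # Expected closer: pop one nesting level.
  defp walk(<<c, rest::binary>>, [c | stack], acc), do: walk(rest, stack, acc <> <<c>>)

  defp walk(<<c, rest::binary>>, stack, acc), do: walk(rest, stack, acc <> <<c>>)
  defp walk("", _stack, acc), do: {acc, ""}

  # Quoted mode: everything up to the closing quote is kept verbatim.
  defp quoted(<<?\\, c, rest::binary>>, acc), do: quoted(rest, acc <> <<?\\, c>>)
  defp quoted(<<?", rest::binary>>, acc), do: {acc, rest}
  defp quoted(<<c, rest::binary>>, acc), do: quoted(rest, acc <> <<c>>)
  defp quoted("", acc), do: {acc, ""}

  defp closer(?(), do: ?)
  defp closer(?[), do: ?]
  defp closer(?{), do: ?}
end
```

Scanning "^a,b$)" unquoted stops at the comma and returns {"^a", ",b$)"}. That is why such a pattern needs quotes.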

If you'd rather keep the regex out of the derive string altogether, pre-compile the pattern and validate with GuardedStruct.Validate.run/2 from the function you register as validator: {Mod, :fn}. The derive-string parser is bypassed entirely.
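A minimal sketch of such a validator module follows. MyApp.CodeValidator and validate_code/1 are hypothetical names. The one-argument call and the {:ok, value} / {:error, message} return shape are assumptions, not GuardedStruct's documented validator contract. The point it shows is that the pattern lives in ordinary Elixir code, so a literal comma needs no quoting.

```elixir
defmodule MyApp.CodeValidator do
  # Hypothetical module for a validator: {Mod, :fn} hook. The call shape and
  # return values are assumptions; check the validator docs for the contract.

  # The pattern is plain Elixir code, so the literal comma is unproblematic.
  def validate_code(value) when is_binary(value) do
    if Regex.match?(~r/^a,b$/, value),
      do: {:ok, value},
      else: {:error, "does not match ^a,b$"}
  end

  def validate_code(_value), do: {:error, "expected a string"}
end
```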

Direct API

alias GuardedStruct.Derive.SanitizerDerive

SanitizerDerive.sanitize("  Hello  ", :trim)                                          # => "Hello"
"  Hello  " |> SanitizerDerive.sanitize(:trim) |> SanitizerDerive.sanitize(:downcase)  # => "hello"
GuardedStruct.Derive.ValidationDerive.call({:email, "a@b"}, [:email_r], [])           # => {processed, errors}
GuardedStruct.Validate.run("validate(uuid)", "11111111-2222-3333-4444-555555555555")