A bug-for-bug compatible Elixir port of PHP's
filter_var($email, FILTER_VALIDATE_EMAIL).
This library answers exactly one question, the same way PHP does:
iex> ElixirPhpEmailValidator.valid?("a@b.c")
true
iex> ElixirPhpEmailValidator.valid?("a@b")
falseIt is not an RFC 5321/5322 validator, and it is intentionally not
"correct" in any abstract sense. Its only goal is to return the same
boolean that PHP's filter_var/2 returns for the same input — including
PHP's surprising behaviours (a bare IP like user@1.2.3.4 is rejected, but
a bracketed literal user@[1.2.3.4] is accepted; a bare space is illegal
even inside a quoted local part, but a backslash-escaped space is fine;
etc.). See the compatibility guide and the quirks catalog for the full list.
How it works (why it is faithful)
PHP implements FILTER_VALIDATE_EMAIL in C, in
ext/filter/logical_filters.c,
function php_filter_validate_email. The algorithm is:
- Reject the input if it is longer than 320 bytes (octets).
- Match it against a single, anchored PCRE regex with the inline flags
i(caseless) andD(dollar-end-only). There are two regexes: an ASCII one used by default, and a Unicode-aware one (adding\pL/\pNto the local part, plus theu/ PCRE_UTF8+PCRE_UCP flag) used only when theFILTER_FLAG_EMAIL_UNICODEflag is set. - Return the string on match,
falseotherwise.
This module vendors the exact PHP regex literals, byte-for-byte, in
priv/php/regexp1.full (default) and priv/php/regexp0.full (unicode), and
derives both the pattern and the :re options from them at compile time —
so the vendored literal is the single source of truth. The flag translation
is mechanical:
| PHP inline flag | Erlang :re option |
|---|---|
i (caseless) | :caseless |
D (PCRE_DOLLAR_ENDONLY) | :dollar_endonly |
u (PCRE_UTF8 + UCP) | :unicode, :ucp |
Both PHP (PCRE2) and Erlang/OTP's :re are PCRE-family engines, so the same
pattern produces the same matches. The library matches on the raw bytes
of the input (exactly as PHP's pcre2_match does), and — for the unicode
path — treats an input that is not valid UTF-8 as a non-match, which is what
PHP does when pcre2_match reports a UTF-8 error.
Faithfulness is proven, not asserted: the test suite generates golden
verdicts from real php binaries across a PHP version matrix and asserts
this module reproduces them exactly. See mix php.golden and the
COMPATIBILITY.md guide.
Performance and untrusted input
Validation is a single anchored PCRE match. For ordinary addresses it is
effectively instant, but the vendored pattern (PHP's own) contains nested
quantifiers in the domain sub-pattern, so a deliberately crafted input —
even a short one, well under the 320-byte gate — can drive PCRE into heavy
backtracking: a worst-case string costs on the order of ~200 ms inside
:re.run before the engine's internal step limit aborts the match. The
verdict is still correct (such inputs are rejected, exactly as PHP rejects
them); the cost is purely CPU time. Note that the 320-byte gate bounds input
length, not per-call CPU.
Two consequences worth knowing for a hot path that validates untrusted input at scale:
:re.runruns in a NIF that does not yield, so a pathological input pins a scheduler thread for the duration. If you validate attacker-controlled strings at high volume, run validation off the request-serving schedulers (e.g. aTasksupervised on a separate pool) and/or apply a timeout and rate-limiting upstream.- This library intentionally does not pass
:reamatch_limit. A limit low enough to cap the worst case can also reject a legitimate many-label address that PHP accepts — i.e. it would break exact parity, which is this library's entire contract. Bounding CPU is therefore left to the caller, where it can be done without sacrificing faithfulness.
Provenance / copyright
The vendored regex derives from Michael Rushton's email validator and is
redistributed by PHP under its own terms; Rushton's copyright notice is
preserved in NOTICE. source_info/0 reports the exact php-src ref and
checksums this build tracks.
Summary
Types
Options for valid?/2 and validate/2.
Functions
Returns provenance metadata for the vendored PHP regex: which php-src
ref/version this port tracks, the upstream file/function, and the SHA-256
checksums of the two vendored patterns. Read from priv/php/MANIFEST.json and
the vendored bytes at compile time, so it always matches this build.
Returns true when PHP's filter_var(email, FILTER_VALIDATE_EMAIL) would
accept email, and false otherwise.
Mirrors PHP's return convention: {:ok, email} when valid (PHP returns the
string), :error when invalid (PHP returns false).
Types
@type opts() :: [{:unicode, boolean()}]
Options for valid?/2 and validate/2.
Functions
@spec source_info() :: %{ source_file: String.t(), function: String.t(), php_ref: String.t(), upstream_url: String.t(), vendored_at: String.t(), max_length: non_neg_integer(), regexp1_sha256: String.t(), regexp0_sha256: String.t() }
Returns provenance metadata for the vendored PHP regex: which php-src
ref/version this port tracks, the upstream file/function, and the SHA-256
checksums of the two vendored patterns. Read from priv/php/MANIFEST.json and
the vendored bytes at compile time, so it always matches this build.
Example
iex> info = ElixirPhpEmailValidator.source_info()
iex> info.function
"php_filter_validate_email"
iex> byte_size(info.regexp1_sha256)
64
Returns true when PHP's filter_var(email, FILTER_VALIDATE_EMAIL) would
accept email, and false otherwise.
Options
:unicode(boolean, defaultfalse) — whentrue, mirrors PHP'sFILTER_FLAG_EMAIL_UNICODE, allowing Unicode letters/numbers in the local part (the domain still must be ASCII).
Input type
Pass a UTF-8 binary (an Elixir string). The @spec says binary(), so
Dialyzer and Elixir's compile-time type checker can flag a non-binary argument
before it ever runs. At runtime the function stays total — like PHP's
filter_var, which never raises — so any non-binary value simply returns
false. This is the one spot where the port deliberately does not imitate
PHP: PHP coerces scalars (filter_var(123, ...) validates "123"), whereas
this library does not coerce. In particular a charlist such as ~c"a@b.c"
returns false; convert it with to_string/1 first.
Examples
iex> ElixirPhpEmailValidator.valid?("first.last@iana.org")
true
iex> ElixirPhpEmailValidator.valid?("user@1.2.3.4")
false
iex> ElixirPhpEmailValidator.valid?("user@[1.2.3.4]")
true
iex> ElixirPhpEmailValidator.valid?("日本語@example.com")
false
iex> ElixirPhpEmailValidator.valid?("日本語@example.com", unicode: true)
true
iex> ElixirPhpEmailValidator.valid?(~c"a@b.c")
false
Mirrors PHP's return convention: {:ok, email} when valid (PHP returns the
string), :error when invalid (PHP returns false).
Like valid?/2, this expects a binary; see its "Input type" section.
Examples
iex> ElixirPhpEmailValidator.validate("a@b.c")
{:ok, "a@b.c"}
iex> ElixirPhpEmailValidator.validate("a@b")
:error