Legion.Sandbox.ASTChecker (Legion v0.4.0)

View Source

Static safety check for sandboxed code.

Parses a code string into an AST and walks every node, rejecting the first violation found. The check is default-deny for built-in modules: every module/function call must be explicitly listed.

Categories validated:

  • Bare forms - non-prefixed calls/identifiers (if, case, +, fn, is_atom, raise, ...) must be in @allowed_bare_forms. Anything else (def, defstruct, apply, spawn, send, receive, __MODULE__, ...) is rejected.
  • Module.function/arity calls - the module must be a built-in (with the function in that module's allowlist, and the arity within its cap) or a caller-supplied tool. Tail aliases match for __aliases__ nodes only, never for atom-literal dispatch (:"Elixir.Helper".fun(...)), so a tool tail like System cannot unlock the real stdlib module.
  • Arity caps on calendar functions - the higher arities of Date.new, DateTime.shift_zone, Time.utc_now, etc. take a calendar / time-zone database module that is dispatched at runtime. The allowlist caps each such function at its safe arity; sandboxed code must use the lower-arity form or a sigil (~D, ~T, ~U, ~N).
  • Dynamic dispatch - expr.fun / expr.fun(...) is rejected outright. Literal-module forms (Mod.fun(...) and :mod.fun(...)) are handled by the preceding clauses, so anything reaching the dynamic-dispatch clause has a non-literal base. m.key map access goes through this check too — use Map.fetch!(m, :key) or Map.get(m, :key).
  • raise / reraise - the first argument must be a literal alias / atom-literal exception module (in @safe_exceptions), a binary literal, or a string-interpolation. Anything else would force-load an arbitrary module at runtime via mod.exception/1.
  • Struct literals - %Mod{...} is rejected unless Mod is in @safe_struct_modules, @safe_exceptions, or a tool. The runtime force-loads Mod and runs its @on_load.
  • Fake structs - the literal atom :__struct__ and any binary containing the substring "__struct__" are rejected. A map with an __struct__ key is dispatched as a struct by every BEAM protocol; the binary check covers the ~w(... __struct__)a materialisation path.

Built-in modules and their allowed functions are defined in this file. Functions that can break out of the sandbox (atom-table churn, dynamic code evaluation, code definition, process / message manipulation, time-zone DB mutation, ...) are simply not in any allowlist.

Tools (caller-supplied modules)

Caller-supplied tool modules are fully trusted: every function on every arity is allowed, and %Tool{...} struct literals are allowed. The host is responsible for vetting whatever it exposes — if you hand File, System, Code, or any other stdlib module to Legion.Sandbox.execute/4 as a "tool", you have opted out of the sandbox for that module. Expose a thin facade module that wraps only the operations you actually want available.

Tail-alias collisions are not rejected. A tool named MyApp.Date shadows stdlib Date after the host prepends alias MyApp.Date: source-level Date.utc_today() then routes to the tool, and raise ArgumentError, "x" with a tool named MyApp.ArgumentError routes to the tool's exception/1 instead of stdlib. This is not an RCE escalation (the tool's functions are callable directly anyway), but it can produce surprising semantics — prefer non-colliding names for tools.

Known limitations

The check focuses on RCE prevention. It does not protect against:

  • Atom-table growth from parsing the code itself: Code.string_to_quoted/1 creates atoms for every identifier (variable name, function name, module alias, atom literal) it sees in the source. A pathological input with many unique identifiers will inflate the atom table even when every call site is denied.
  • Most denial-of-service vectors (CPU, memory, message queue depth, long-running comprehensions, ...). Wallclock timeouts in Legion.Sandbox.execute/4 are the only mitigation.

Summary

Functions

Validates code_string against the safety rules.

Functions

check(code_string, allowed_modules)

Validates code_string against the safety rules.

allowed_modules are caller-supplied tool modules. They are trusted: any function on them may be called. Both fully-qualified names (MyApp.Helper) and their tail aliases (Helper) are recognised, so code written without aliases will still pass validation before Legion.Sandbox prepends them.

Must be called before alias prepending - alias itself is rejected.

Returns :ok or {:error, reason} on the first violation (or parse error).