JSV (jsv v0.2.0)

This is the main API for the JSV library.

Basic usage

The following snippet describes the general usage of the library in any context.

The rest of the documentation describes how to use JSV in the context of an application.

# 1. Define a schema
schema = %{
  type: :object,
  properties: %{
    name: %{type: :string}
  },
  required: [:name]
}

# 2. Define a resolver
resolver = {JSV.Resolver.BuiltIn, allowed_prefixes: ["https://json-schema.org/"]}

# 3. Build the schema
root = JSV.build!(schema, resolver: resolver)

# 4. Validate the data
case JSV.validate(%{"name" => "Alice"}, root) do
  {:ok, data} ->
    {:ok, data}

  {:error, validation_error} ->
    # Errors can be normalized into JSON validator output so they can be
    # returned to the producer of the invalid data
    {:error, JSON.encode!(JSV.normalize_error(validation_error))}
end

Core concepts

Input schema format

"Raw schemas" are schemas defined in Elixir data structures such as %{"type" => "integer"}.

JSV does not accept JSON strings. You will need to decode the JSON strings before giving them to the build function. There are three different possible formats for a schema:

  1. A boolean. Booleans are valid schemas that accept anything (when true) or reject everything (when false).

  2. A map with binary keys and values such as %{"type" => "integer"}.

  3. A map with atom keys and values such as %{type: :integer}. The :__struct__ property of structs is safely ignored.

    The JSV.Schema struct can be used for autocompletion, but it does not provide any special behaviour over a raw map with atoms. The only difference is that any nil value found in the struct will be ignored. nil values in other maps or structs with atom keys are treated as-is (which is generally invalid).

    Atoms are converted to binaries internally, so it is technically possible to mix atoms with binaries in map keys, but the behaviour for duplicate keys is not defined: %{"type" => "string", :type => "integer"}.
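
The atom-to-binary conversion described above can be pictured with a short sketch. The SchemaKeysSketch module and its normalize/1 function are hypothetical and are not part of JSV. They only illustrate one way a schema written with atoms could be turned into its binary form, assuming booleans and nil are left untouched.

```elixir
defmodule SchemaKeysSketch do
  # Hypothetical helper, NOT the JSV implementation.

  # Booleans and nil are atoms in Elixir, so they are handled first.
  def normalize(bool) when is_boolean(bool), do: bool
  def normalize(nil), do: nil
  def normalize(atom) when is_atom(atom), do: Atom.to_string(atom)

  # Maps (and structs, once :__struct__ is dropped) are converted recursively.
  def normalize(%{} = map) do
    map
    |> Map.delete(:__struct__)
    |> Map.new(fn {key, value} -> {normalize_key(key), normalize(value)} end)
  end

  def normalize(list) when is_list(list), do: Enum.map(list, &normalize/1)
  def normalize(other), do: other

  defp normalize_key(key) when is_atom(key), do: Atom.to_string(key)
  defp normalize_key(key) when is_binary(key), do: key
end
```

With this sketch, %{type: :object, required: [:name]} and %{"type" => "object", "required" => ["name"]} normalize to the same map.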

Meta-schemas: Introduction to vocabularies

JSV was built in compliance with the vocabulary mechanism of JSON schema, to support custom and optional keywords in the schemas.

An example makes this clearer:

  1. The well-known and official schema https://json-schema.org/draft/2020-12/schema defines the following vocabulary:

    {
      "$vocabulary": {
        "https://json-schema.org/draft/2020-12/vocab/core": true,
        "https://json-schema.org/draft/2020-12/vocab/applicator": true,
        "https://json-schema.org/draft/2020-12/vocab/unevaluated": true,
        "https://json-schema.org/draft/2020-12/vocab/validation": true,
        "https://json-schema.org/draft/2020-12/vocab/meta-data": true,
        "https://json-schema.org/draft/2020-12/vocab/format-annotation": true,
        "https://json-schema.org/draft/2020-12/vocab/content": true
      }
    }

    The vocabulary is split into different parts, here one per object property. More information can be found on the official website.

  2. Libraries such as JSV must map this vocabulary to implementations. For instance, in JSV, the https://json-schema.org/draft/2020-12/vocab/validation part that defines the type keyword is implemented with the JSV.Vocabulary.V202012.Validation Elixir module.

  3. Finally, we can declare a schema that would like to use the type keyword. To let the library know what implementation to use for that keyword, the schema declares the https://json-schema.org/draft/2020-12/schema as its meta-schema (using the $schema keyword).

    {
      "$schema": "https://json-schema.org/draft/2020-12/schema",
      "type": "integer"
    }

    This tells the library to pull the vocabulary from the meta-schema and apply it to the schema.

  4. As JSV is compliant, it will use its implementation of https://json-schema.org/draft/2020-12/vocab/validation to validate types.

    This also means that you can use a custom meta schema to skip some parts of the vocabulary, or add your own.
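
The steps above can be sketched as a lookup from vocabulary URIs to implementation modules. This is only an illustration: VocabularySketch and its load/2 function are hypothetical and do not reflect how JSV is implemented. In the JSON Schema specification, a true value in $vocabulary marks a vocabulary as required and false marks it as optional. The sketch therefore fails on unknown required vocabularies and skips unknown optional ones.

```elixir
defmodule VocabularySketch do
  # Hypothetical lookup table. Only the Validation module name comes from
  # the JSV documentation, the table itself is an assumption.
  @known %{
    "https://json-schema.org/draft/2020-12/vocab/validation" =>
      JSV.Vocabulary.V202012.Validation
  }

  # Takes the "$vocabulary" map of a meta-schema and returns the modules
  # implementing it, or an error for an unknown required vocabulary.
  def load(vocabulary, known \\ @known) when is_map(vocabulary) do
    Enum.reduce_while(vocabulary, {:ok, []}, fn {uri, required?}, {:ok, acc} ->
      case Map.fetch(known, uri) do
        {:ok, module} -> {:cont, {:ok, [module | acc]}}
        :error when required? == true -> {:halt, {:error, {:unknown_vocabulary, uri}}}
        :error -> {:cont, {:ok, acc}}
      end
    end)
  end
end
```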

Resolvers overview

In order to build schemas properly, JSV needs to resolve the schema as a first step.

Resolving means fetching any remote resource whose data is needed but not available locally, that is, resources whose URIs are given as $schema, $ref or $dynamicRef. Of course, relative references do not need to be fetched.

Those URIs are generally URLs with the http:// or https:// scheme, but there are countless ways to fetch those URLs.

For security reasons, JSV does not provide a default resolver that would fetch those resources with an HTTP call. It is always required to provide the resolver option.

For convenience reasons, a built-in resolver (JSV.Resolver.BuiltIn) is provided but it still needs to be manually declared by users of the JSV library. That resolver will download given URLs from the web. Refer to the documentation of this module for more information.
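
To give an idea of what an allowed_prefixes option implies, here is a minimal sketch of prefix-based allow-listing. PrefixCheckSketch is a hypothetical module written for this illustration. The actual checks performed by JSV.Resolver.BuiltIn are described in that module's documentation and may differ.

```elixir
defmodule PrefixCheckSketch do
  # Hypothetical helper: a URL is fetched only if it starts with one of
  # the explicitly allowed prefixes.
  def allowed?(url, prefixes) when is_binary(url) and is_list(prefixes) do
    Enum.any?(prefixes, fn prefix -> String.starts_with?(url, prefix) end)
  end
end
```

With ["https://json-schema.org/"] as the prefix list, the official 2020-12 meta-schema URL is accepted while a URL on any other host is rejected.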

Building schemas

In this chapter we will see how to build schemas from raw resources. The examples will mention the JSV.build/2 or JSV.build!/2 functions interchangeably. Everything described here applies to both.

Schemas are built according to their meta-schema vocabulary. JSV will assume that the $schema value is "https://json-schema.org/draft/2020-12/schema" by default if not provided.

Once built, a schema is converted into a JSV.Root, an internal representation of the schema that can be used to perform validation.

In a nutshell it boils down to the following:

root = JSV.build!(schema, resolver: resolver)

Built-in resolver

For the built-in resolver to work, it must be able to decode JSON content fetched from the web. To do so, you will need to provide a valid JSON implementation:

  • From Elixir 1.18, the JSON module is automatically available in the standard library.
  • JSV can use Jason if listed in your dependencies with the "~> 1.0" requirement.
  • JSV also supports Poison with the "~> 6.0 or ~> 5.0" requirement.
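
If you want to check which of these implementations is present in your own project, a small sketch like the following can help. JsonCodecSketch is a hypothetical module, and the lookup order is an assumption made for this example, not the order JSV uses.

```elixir
defmodule JsonCodecSketch do
  # Candidate modules, in an arbitrary order chosen for this sketch.
  @candidates [JSON, Jason, Poison]

  # Returns the first candidate module that can be loaded, or nil.
  def pick do
    Enum.find(@candidates, &Code.ensure_loaded?/1)
  end
end
```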

Custom resolvers

The built-in resolver is here to help for common use cases, but you may need to resolve resources from custom locations instead of just fetching the web URL.

JSV.build/2 accepts any resolver implementation as long as it implements the JSV.Resolver behaviour. See the documentation of that behaviour for more information.

It is also possible to delegate to the built-in resolver. If for instance we support a my-company:// custom scheme, we could implement the behaviour like so:

defmodule MyApp.SchemaResolver do
  @behaviour JSV.Resolver

  alias JSV.Resolver.BuiltIn

  @impl true
  def resolve("my-company://" <> _ = url, _opts) do
    uri = URI.parse(url)
    schema_dir = "priv/schemas"
    # In this example the hostname in the URL is ignored
    schema_path = Path.join(schema_dir, uri.path)
    json_schema = File.read!(schema_path)
    # Returns {:ok, decoded_schema} or {:error, reason}
    JSON.decode(json_schema)
  end

  # The full URL is bound to `url` so it can be passed to the built-in resolver
  def resolve("https://" <> _ = url, _opts) do
    BuiltIn.resolve(url,
      allowed_prefixes: [
        "https://json-schema.org/",
        "https://some-friend-company/",
        "https://some-other-provider/"
      ]
    )
  end
end

It can be used as usual:

JSV.build(raw_schema, resolver: MyApp.SchemaResolver)
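
The mapping from a my-company:// URL to a file path can be tried on its own with the standard library. ResolverPathSketch is a hypothetical helper written for this illustration. The rejection of ".." segments is an extra precaution added here and is not something the resolver above does.

```elixir
defmodule ResolverPathSketch do
  # Maps "my-company://host/some/file.json" to "<schema_dir>/some/file.json".
  # The host part of the URL is ignored, as in the resolver example.
  def path_for(url, schema_dir \\ "priv/schemas") when is_binary(url) do
    case URI.parse(url) do
      %URI{path: path} when is_binary(path) ->
        # Refuse parent-directory segments so the result stays in schema_dir.
        if ".." in Path.split(path) do
          :error
        else
          {:ok, Path.join(schema_dir, path)}
        end

      _no_path ->
        :error
    end
  end
end
```

Path.join/2 drops the leading slash of the URL path, so "my-company://schemas/order.schema.json" maps to "priv/schemas/order.schema.json".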

Enable or disable format validation

By default, the https://json-schema.org/draft/2020-12/schema meta schema does not perform format validation. This is very counterintuitive, but it basically means that the following code will return {:ok, "not a date"}:

schema =
  JSON.decode!("""
  {
    "type": "string",
    "format": "date"
  }
  """)

root = JSV.build!(schema, resolver: ...)

JSV.validate("not a date", root)

To always enable format validation when building a root schema, provide the formats: true option to JSV.build/2:

JSV.build(raw_schema, resolver: ..., formats: true)

This is another reason to wrap JSV.build/2 in a custom builder module (see "Custom build modules" below).

Note that format validation is determined at build time. There is no way to change whether it is performed once the root schema is built.

You can also enable format validation by using the JSON Schema specification semantics, though we strongly advise just using the :formats option and calling it a day.

For format validation to be enabled, a schema should declare the https://json-schema.org/draft/2020-12/vocab/format-assertion vocabulary instead of the https://json-schema.org/draft/2020-12/vocab/format-annotation one that is included by default in the https://json-schema.org/draft/2020-12/schema meta schema.

So, first we would declare a new meta schema:

{
    "$id": "custom://with-formats-on/",
    "$schema": "https://json-schema.org/draft/2020-12/schema",
    "$vocabulary": {
        "https://json-schema.org/draft/2020-12/vocab/core": true,
        "https://json-schema.org/draft/2020-12/vocab/format-assertion": true
    },
    "$dynamicAnchor": "meta",
    "allOf": [
        { "$ref": "https://json-schema.org/draft/2020-12/meta/core" },
        { "$ref": "https://json-schema.org/draft/2020-12/meta/format-assertion" }
    ]
}

This example is taken from the JSON Schema Test Suite codebase and does not include all the vocabularies, only the assertion for the formats and the core vocabulary.

Then we would declare our schema using that vocabulary to perform validation. Of course our resolver must be able to resolve the given URL for the new $schema property.

schema =
  JSON.decode!("""
  {
    "$schema": "custom://with-formats-on/",
    "type": "string",
    "format": "date"
  }
  """)

root = JSV.build!(schema, resolver: ...)

With this new meta-schema, JSV.validate/2 would return an error tuple without needing the formats: true option.

{:error, _} = JSV.validate("hello", root)

In this case, it is also possible to disable the validation for schemas that use a meta-schema where the assertion vocabulary is declared:

JSV.build(raw_schema, resolver: ..., formats: false)
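
The interaction between the :formats option and the meta-schema vocabulary can be summarized as a small decision function. FormatsOptionSketch is a hypothetical module that mirrors the documented option values (nil, true, false or a list of modules). The assumption that nil combined with the format-assertion vocabulary uses the default validator modules is ours.

```elixir
defmodule FormatsOptionSketch do
  # `vocabulary` is :assertion when the meta-schema declares the
  # format-assertion vocabulary, :annotation otherwise.

  # nil: follow the meta-schema vocabulary.
  def effective(nil, :assertion), do: :default_modules
  def effective(nil, _vocabulary), do: :disabled

  # true and false override the vocabulary in both directions.
  def effective(true, _vocabulary), do: :default_modules
  def effective(false, _vocabulary), do: :disabled

  # A list of modules replaces the default validators.
  def effective(modules, _vocabulary) when is_list(modules), do: {:modules, modules}
end
```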

Custom build modules

With that in mind, we suggest defining a custom module to wrap the JSV.build/2 function, so the resolver, formats and vocabularies can be defined only once.

That module could be implemented like this:

defmodule MyApp.SchemaBuilder do
  def build_schema!(raw_schema) do
    JSV.build!(raw_schema, resolver: MyApp.SchemaResolver, formats: true)
  end
end

Compile-time builds

It is strongly encouraged to build schemas at compile time, in order to avoid repeating the build step for no good reason.

For instance, if we have this function that should validate external data:

# Do not do this

def validate_order(order) do
  root =
    "priv/schemas/order.schema.json"
    |> File.read!()
    |> JSON.decode!()
    |> MyApp.SchemaBuilder.build_schema!()

  case JSV.validate(order, root) do
    {:ok, _} -> OrderHandler.handle_order(order)
    {:error, _} = err -> err
  end
end

The schema will be built each time the function is called. Building a schema is actually pretty fast but it is a waste of resources nevertheless.

One could do the following to get a net performance gain:

# Do this instead

@order_schema "priv/schemas/order.schema.json"
              |> File.read!()
              |> JSON.decode!()
              |> MyApp.SchemaBuilder.build_schema!()

defp order_schema, do: @order_schema

def validate_order(order) do
  case JSV.validate(order, order_schema()) do
    {:ok, _} -> OrderHandler.handle_order(order)
    {:error, _} = err -> err
  end
end

You can also define a module where all your schemas are built and exported as functions:

defmodule MyApp.Schemas do
  schemas = [
    order: "tmp/order.schema.json",
    shipping: "tmp/shipping.schema.json"
  ]

  Enum.each(schemas, fn {fun, path} ->
    root =
      path
      |> File.read!()
      |> JSON.decode!()
      |> MyApp.SchemaBuilder.build_schema!()

    def unquote(fun)() do
      unquote(Macro.escape(root))
    end
  end)
end

...and use it elsewhere:

def validate_order(order) do
  case JSV.validate(order, MyApp.Schemas.order()) do
    {:ok, _} -> OrderHandler.handle_order(order)
    {:error, _} = err -> err
  end
end

Validation

To validate a term, call the JSV.validate/3 function like so:

JSV.validate(data, root_schema, opts)

General considerations

JSV supports all keywords of the 2020-12 specification, with the following exceptions and caveats:

  • JSV.validate/3 returns cast data. See the documentation of that function for more information.

  • The contentMediaType, contentEncoding and contentSchema keywords. They are ignored. Future support for custom vocabularies will allow you to validate data with such keywords.

  • The format keyword is largely supported but with many inconsistencies, mostly due to differences between Elixir and JavaScript (JSON Schema is largely based on JavaScript primitives). For most use cases, the differences are negligible.

  • The "integer" type will transform floats into integers when the fractional part is zero (such as 123.0). Floating-point numbers with large integer parts are handled with native Elixir semantics, which may return incorrect results:

    iex(1)> trunc(123456789123456789123456789.0)
    # returns:    123456789123456791337762816
    #                             |
    #                             | Difference starts here

    This is because 123456789123456789123456789.0 is decoded as 1.2345678912345679e26, both in the shell and by the JSON module. A float only keeps 15 to 17 significant decimal digits, so the lower digits of the original number are already lost before truncation and cannot be recovered.

    When dealing with such data it may be better to discard the casted data, or to work with strings instead of floats.
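
The float-to-integer behaviour described above can be reproduced with the standard library alone. IntegerCastSketch is a hypothetical module written for this illustration and is not the JSV implementation.

```elixir
defmodule IntegerCastSketch do
  # Integers are returned as-is.
  def cast(n) when is_integer(n), do: {:ok, n}

  # Floats are accepted only when their fractional part is zero,
  # in which case they are truncated to an integer.
  def cast(f) when is_float(f) do
    if trunc(f) == f, do: {:ok, trunc(f)}, else: :error
  end

  def cast(_other), do: :error
end
```

Casting 123.0 yields {:ok, 123} and casting 1.5 yields :error. Casting 123456789123456789123456789.0 succeeds, but the resulting integer differs from 123456789123456789123456789 because the precision was lost when the float was created.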

Formats

JSV supports multiple formats out of the box with its default implementation, but some are only available under certain conditions that will be specified for each format.

The following listing describes the condition for support and return value type for these default implementations. You can override those implementations by providing your own, as well as providing new formats. This will be described later in this document.

Also, note that by default, JSV format validation returns the original value, that is, the string form of the data. Some format validators can also cast the string to a richer data structure, for instance converting a date string to a Date struct. You can enable returning cast values by passing the cast_formats: true option to JSV.validate/3.

The listing below describes the values returned with that option enabled.

Important: Some formats require the abnf_parsec library to be available. We have encountered numerous problems with this library, which yields false negatives with many inputs. An alternative solution will be implemented in future versions.

date

  • support: Native.
  • input: "2020-04-22"
  • output: ~D[2020-04-22]
  • The format is implemented with the native Date module.
  • The native Date module supports the YYYY-MM-DD format only. 2024, 2024-W50, 2024-12 will not be valid.
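
The limitation to the YYYY-MM-DD form can be checked directly against the Date module. DateFormatSketch is a hypothetical wrapper written for this illustration. It only shows what the native parser accepts, not how JSV calls it.

```elixir
defmodule DateFormatSketch do
  # Returns {:ok, %Date{}} for a full YYYY-MM-DD date, :error otherwise.
  def cast(string) when is_binary(string) do
    case Date.from_iso8601(string) do
      {:ok, date} -> {:ok, date}
      {:error, _reason} -> :error
    end
  end
end
```

Here "2020-04-22" is cast to ~D[2020-04-22], while "2024", "2024-W50" and "2024-12" are all rejected.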

date-time

  • support: Native.
  • input: "2025-01-02T00:11:23.416689Z"
  • output: ~U[2025-01-02 00:11:23.416689Z]
  • The format is implemented with the native DateTime module.
  • The native DateTime module supports the YYYY-MM-DD format only for dates. 2024T..., 2024-W50T..., 2024-12T... will not be valid.
  • Decimal precision is not capped to milliseconds. 2024-12-14T23:10:00.500000001Z will be valid.

duration

  • support: Requires Elixir 1.17
  • input: "P1DT4,5S"
  • output: %Duration{day: 1, second: 4, microsecond: {500000, 1}}
  • The Elixir documentation states: "Only seconds may be specified with a decimal fraction, using either a comma or a full stop: P1DT4,5S."
  • Elixir durations accept negative values.
  • Elixir durations accept out-of-range values, for instance more than 59 minutes.
  • Excessive precision (as in "PT10.0000000000001S") will be valid.
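
Since the Duration module only exists from Elixir 1.17, code that must also run on older versions can guard the call at runtime. The following is a minimal sketch under that assumption. DurationFormatSketch and the :duration_not_available error are hypothetical names.

```elixir
defmodule DurationFormatSketch do
  # Parses an ISO 8601 duration when the Duration module is available.
  def cast(string) when is_binary(string) do
    if Code.ensure_loaded?(Duration) do
      # apply/3 avoids a compile-time warning on Elixir versions
      # that do not ship the Duration module.
      apply(Duration, :from_iso8601, [string])
    else
      {:error, :duration_not_available}
    end
  end
end
```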

email

  • support: Requires {:mail_address, "~> 1.0"}.
  • input: "hello@json-schema.org"
  • output: "hello@json-schema.org" (same value)
  • Support is limited by the implementation of that library.
  • The idn-email format is not supported out-of-the-box.

hostname

  • support: Native
  • input: "some-host"
  • output: "some-host" (same value)
  • Accepts numerical TLDs and single letter TLDs.

ipv4

  • support: Native
  • input: "127.0.0.1"
  • output: {127, 0, 0, 1}

ipv6

  • support: Native
  • input: "::1"
  • output: {0, 0, 0, 0, 0, 0, 0, 1}
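
The tuple outputs shown for ipv4 and ipv6 are the same shapes that Erlang's :inet module produces. As an illustration only, and without claiming that JSV is implemented this way, the hypothetical IpFormatSketch module below parses both formats with the strict :inet parsers.

```elixir
defmodule IpFormatSketch do
  def cast_ipv4(string), do: parse(string, &:inet.parse_ipv4strict_address/1)
  def cast_ipv6(string), do: parse(string, &:inet.parse_ipv6strict_address/1)

  # The :inet functions expect charlists, not binaries.
  defp parse(string, parser) when is_binary(string) do
    case parser.(String.to_charlist(string)) do
      {:ok, address} -> {:ok, address}
      {:error, :einval} -> :error
    end
  end
end
```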

iri

  • support: Requires {:abnf_parsec, "~> 1.0"}.
  • input: "https://héhé.com/héhé"
  • output: %URI{scheme: "https", authority: "héhé.com", userinfo: nil, host: "héhé.com", port: 443, path: "/héhé", query: nil, fragment: nil}

iri-reference

  • support: Requires {:abnf_parsec, "~> 1.0"}.
  • input: "//héhé"
  • output: %URI{scheme: nil, authority: "héhé", userinfo: nil, host: "héhé", port: nil, path: nil, query: nil, fragment: nil}

json-pointer

  • support: Requires {:abnf_parsec, "~> 1.0"}.
  • input: "/foo/bar/baz"
  • output: "/foo/bar/baz" (same value)

regex

  • support: Native
  • input: "[a-zA-Z0-9]"
  • output: ~r/[a-zA-Z0-9]/
  • The format is implemented with the native Regex module.
  • The Regex module does not follow the ECMA-262 specification.

relative-json-pointer

  • support: Requires {:abnf_parsec, "~> 1.0"}.
  • input: "0/foo/bar"
  • output: "0/foo/bar" (same value)

time

  • support: Native
  • input: "20:20:08.378586"
  • output: ~T[20:20:08.378586]
  • The format is implemented with the native Time module.
  • The native Time implementation will completely discard the time offset information. Invalid offsets will be valid.
  • Decimal precision is not capped to milliseconds. 23:10:00.500000001 will be valid.
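
These properties of the native Time module can be observed directly. TimeFormatSketch is a hypothetical wrapper written for this illustration. It only shows what Time.from_iso8601/1 accepts.

```elixir
defmodule TimeFormatSketch do
  # Returns {:ok, %Time{}} when the native parser accepts the string.
  def cast(string) when is_binary(string) do
    case Time.from_iso8601(string) do
      {:ok, time} -> {:ok, time}
      {:error, _reason} -> :error
    end
  end
end
```

For example "23:50:07Z" is cast to ~T[23:50:07], with the "Z" offset discarded, and "23:10:00.500000001" is accepted even though it has more than six fractional digits.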

unknown

  • support: Native
  • input: "anything"
  • output: "anything" (same value)
  • No validation or transformation is done.

uri

  • support: Native, optionally uses {:abnf_parsec, "~> 1.0"}.
  • input: "http://example.com"
  • output: %URI{scheme: "http", authority: "example.com", userinfo: nil, host: "example.com", port: 80, path: nil, query: nil, fragment: nil}
  • Without the optional dependency, the URI module is used and minimal checks on hostname and scheme presence are made.

uri-reference

  • support: Native, optionally uses {:abnf_parsec, "~> 1.0"}.
  • input: "/example-path"
  • output: %URI{scheme: nil, userinfo: nil, host: nil, port: nil, path: "/example-path", query: nil, fragment: nil}
  • Without the optional dependency, the URI module will cast most non url-like strings as a path.
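
The fallback behaviour without abnf_parsec can be approximated with URI.parse/1. UriFormatSketch is a hypothetical module, and its scheme and host checks are our reading of "minimal checks", not the exact JSV logic.

```elixir
defmodule UriFormatSketch do
  # "uri": requires both a scheme and a non-empty host.
  def cast_uri(string) when is_binary(string) do
    case URI.parse(string) do
      %URI{scheme: scheme, host: host} = uri
      when is_binary(scheme) and is_binary(host) and host != "" ->
        {:ok, uri}

      _other ->
        :error
    end
  end

  # "uri-reference": URI.parse/1 is lenient, so most strings end up
  # as a URI struct with only a path.
  def cast_uri_reference(string) when is_binary(string) do
    {:ok, URI.parse(string)}
  end
end
```

Note how lenient the second function is: a string such as "not a url" is returned as a URI whose only populated field is path.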

uri-template

  • support: Requires {:abnf_parsec, "~> 1.0"}.
  • input: "http://example.com/search{?query,lang}"
  • output: "http://example.com/search{?query,lang}" (same value)

uuid

  • support: Native
  • input: "bf22824c-c8a4-11ef-9642-0fdaf117eeb9"
  • output: "bf22824c-c8a4-11ef-9642-0fdaf117eeb9" (same value)

Custom formats

In order to provide custom formats, or to override default implementations for formats, you may provide a list of modules as the value for the :formats option of JSV.build/2. Such modules must implement the JSV.FormatValidator behaviour.

For instance:

defmodule CustomFormats do
  @behaviour JSV.FormatValidator

  @impl true
  def supported_formats do
    ["greeting"]
  end

  @impl true
  def validate_cast("greeting", data) do
    case data do
      "hello " <> name -> {:ok, %Greeting{name: name}}
      _ -> {:error, :invalid_greeting}
    end
  end
end

With this module you can now call the builder with it:

JSV.build!(raw_schema, resolver: ..., formats: [CustomFormats])

Note that this will disable all other formats. If you need to still support the default formats, a helper is available:

JSV.build!(raw_schema,
  resolver: ...,
  formats: [CustomFormats | JSV.default_format_validator_modules()]
)

Format validation modules are checked during the build phase, in order. So you can override any format defined by a module that comes later in the list, including the default modules.
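
The effect of the ordering can be pictured as a first-match lookup over the list of modules. The sketch below is hypothetical: FormatLookupSketch and its nested modules are stand-ins that only expose supported_formats/0, and the lookup function is an assumption about the principle, not the JSV code.

```elixir
defmodule FormatLookupSketch do
  # Stand-in for a custom module that overrides "date".
  defmodule Custom do
    def supported_formats, do: ["greeting", "date"]
  end

  # Stand-in for the default modules.
  defmodule Defaults do
    def supported_formats, do: ["date", "uuid"]
  end

  # Returns the first module in the list that supports the format, or nil.
  def find(format, modules) when is_binary(format) and is_list(modules) do
    Enum.find(modules, fn module -> format in module.supported_formats() end)
  end
end
```

With the list [Custom, Defaults], the "date" format resolves to Custom because it comes first, while "uuid" still resolves to Defaults.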

Development

Contributing

Pull requests are welcome given appropriate tests and documentation.

Roadmap

  • Support for custom vocabularies
  • Declare a JSON codec module directly as a built-in resolver option. This will be implemented if needed, but we do not expect strong demand for it.

Functions

build(raw_schema, opts)

@spec build(
  JSV.Builder.raw_schema(),
  keyword()
) :: {:ok, JSV.Root.t()} | {:error, Exception.t()}

Builds the schema as a JSV.Root schema for validation.

Options

  • :resolver - Required. The JSV.Resolver behaviour implementation module to retrieve schemas identified by a URL.

    Accepts a module or a {module, options} tuple. The options can be any term and will be given to the resolve/2 callback of the module.

  • :default_meta (String.t/0) - The meta schema to use for resolved schemas that do not define a "$schema" property. The default value is "https://json-schema.org/draft/2020-12/schema".

  • :formats - Controls the validation of strings with the "format" keyword.

    • nil - Formats are validated according to the meta-schema vocabulary.
    • true - Enforces validation with the built-in validator modules.
    • false - Disables all format validation.
    • [Module1, Module2,...] – set those modules as validators. Disables the built-in format validator modules. The default validators can be included manually in the list, see default_format_validator_modules/0.

    The default value is nil.

build!(raw_schema, opts)

@spec build!(
  JSV.Builder.raw_schema(),
  keyword()
) :: JSV.Root.t()

Same as build/2 but raises on error.

default_format_validator_modules()

@spec default_format_validator_modules() :: [module()]

Returns the list of format validator modules that are used when a schema is built with format validation enabled and the :formats option to build/2 is true.

default_meta()

@spec default_meta() :: binary()

Returns the default meta schema used when the :default_meta option is not set in build/2.

Currently returns "https://json-schema.org/draft/2020-12/schema".

normalize_error(error)

@spec normalize_error(
  JSV.ValidationError.t()
  | JSV.Validator.context()
  | [JSV.Validator.Error.t()]
) ::
  map()

validate(data, root, opts \\ [])

@spec validate(term(), JSV.Root.t(), keyword()) ::
  {:ok, term()} | {:error, Exception.t()}

Validate the data with the given schema. The schema must be a JSV.Root struct generated with build/2.

Important: this function returns cast data:

  • If the :cast_formats option is enabled, string values may be transformed into other data structures. Refer to the "Formats" section of the JSV documentation for more information.
  • The JSON Schema specification states that 123.0 is a valid integer. This function will return 123 instead. This may return invalid data for floats with very large integer parts. As always when dealing with JSON and floats, use strings.
  • Future versions of the library will allow casting raw data into Elixir structs.

Options

  • :cast_formats (boolean/0) - When enabled, format validators will return cast values, for instance a Date struct instead of the date as a string. It has no effect when the schema was not built with formats enabled. The default value is false.