glazer_json (glazer v0.5.0)

Fast JSON encoding and decoding using the glaze C++ library.

By default, JSON null is represented as the atom null. To change this application-wide, set the null key in the glazer application environment:

{glazer, [{null, nil}]}.

Features

  • Decoding straight to Erlang terms: maps, lists, binaries, integers (including bignums), floats, booleans, and null
  • Encoding Erlang terms straight to JSON, including big integers
  • Incremental/streaming decoding of partial input (e.g. NDJSON over a socket) via stream_decoder/0,1, stream_feed/2, and stream_eof/1
  • Configurable representation of JSON null and JSON object keys
  • minify/1 and prettify/1 helpers
  • read_file/1,2 and write_file/2,3 helpers for decoding/encoding directly to/from a file
  • query/2,3: run a jq filter over a JSON document, returning decoded Erlang terms (requires glazer to be built with libjq available)

Types

decode_opt()

-type decode_opt() ::
          object_as_tuple | use_nil |
          {null_term, atom()} |
          {keys, atom | existing_atom | binary} |
          dedupe_keys.

decode_opts()

-type decode_opts() :: [decode_opt()].

Decode options:

  • object_as_tuple - decode JSON objects as {[{K, V}]} proplists rather than maps
  • use_nil - use the atom nil for JSON null
  • {null_term, Atom} - use Atom for JSON null
  • {keys, atom} - decode object keys as atoms
  • {keys, existing_atom} - decode object keys as atoms when the atom already exists, falling back to a binary otherwise
  • {keys, binary} - decode keys as binaries (default)
  • dedupe_keys - with object_as_tuple, eliminate duplicate object keys from the resulting proplist, keeping the last occurrence's value (and position). Has no effect when objects are decoded as maps (the default) or with {keys, atom | existing_atom}: a JSON object with duplicate keys is always deduped (last value wins) when decoded to a map, since maps cannot represent duplicate keys.
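
To illustrate how these options change the result, here is a minimal sketch that decodes one document under several option lists. The module name decode_opts_demo and the input document are made up for this example.

```erlang
-module(decode_opts_demo).
-export([run/0]).

%% Decode the same (hypothetical) document with different decode options.
run() ->
    Json = <<"{\"id\": 1, \"tag\": null}">>,
    %% Default: objects become maps with binary keys; null is the atom null
    %% (unless the application-wide default has been changed).
    Default = glazer_json:decode(Json),
    %% JSON null becomes the atom nil.
    WithNil = glazer_json:decode(Json, [use_nil]),
    %% JSON null becomes an atom of our choosing.
    WithCustom = glazer_json:decode(Json, [{null_term, undefined}]),
    %% Objects become {[{K, V}]} proplists, with duplicate keys removed.
    AsTuple = glazer_json:decode(Json, [object_as_tuple, dedupe_keys]),
    {Default, WithNil, WithCustom, AsTuple}.
```

{keys, atom} is left out on purpose: atoms are never garbage collected, so creating them from untrusted input can exhaust the atom table.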

encode_opt()

-type encode_opt() :: pretty | uescape | force_utf8 | use_nil | {null_term, atom()}.

encode_opts()

-type encode_opts() :: [encode_opt()].

Encode options:

  • pretty - pretty-print the JSON output
  • uescape - escape non-ASCII characters as \uXXXX sequences
  • force_utf8 - fix invalid UTF-8 sequences before encoding
  • use_nil - encode the atom nil as JSON null
  • {null_term, Atom} - encode Atom as JSON null
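
A short sketch of the encode options in use. The module name encode_opts_demo and the sample term are assumptions made for the example; results are returned rather than printed because the exact output layout is not specified here.

```erlang
-module(encode_opts_demo).
-export([run/0]).

%% Encode one (hypothetical) term with different encode options.
run() ->
    Term = #{<<"name">> => <<"Zoë"/utf8>>, <<"deleted_at">> => undefined},
    %% Treat the atom undefined as JSON null.
    Compact = glazer_json:encode(Term, [{null_term, undefined}]),
    %% Same, but pretty-printed.
    Pretty = glazer_json:encode(Term, [pretty, {null_term, undefined}]),
    %% Same, with non-ASCII characters written as \uXXXX escapes.
    Ascii = glazer_json:encode(Term, [uescape, {null_term, undefined}]),
    {Compact, Pretty, Ascii}.
```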

query_reason()

-type query_reason() ::
          enomem | jq_not_available | jq_decode_error |
          {jq_compile_error, binary()} |
          invalid_input |
          binary().

scan_state()

-type scan_state() :: tuple().

stream_decoder()

-opaque stream_decoder()

Functions

decode(Input)

-spec decode(binary() | iolist()) -> term().

Decode a JSON binary or iolist to an Erlang term. JSON objects are returned as maps (default). Raises {parse_error, Msg} on invalid input.

decode(Input, Opts)

-spec decode(binary() | iolist(), decode_opts()) -> term().

Decode a JSON binary or iolist to an Erlang term with options. Raises {parse_error, Reason} on invalid input.
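
Callers that prefer to keep the raising variant can trap the parse error themselves. The sketch below assumes only the {parse_error, Reason} shape documented above; the module and function names are hypothetical, and the exception class is matched with a wildcard because it is not stated here.

```erlang
-module(decode_guard_demo).
-export([safe_decode/1]).

%% Wrap decode/2 so that invalid JSON becomes an error tuple.
safe_decode(Input) ->
    try glazer_json:decode(Input, [use_nil]) of
        Term -> {ok, Term}
    catch
        %% Match any exception class; only the reason shape is documented.
        _:{parse_error, Reason} -> {error, Reason}
    end.
```

try_decode/1,2 offers the same behaviour without a try block.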

encode(Data)

-spec encode(term()) -> binary().

Encode an Erlang term to a JSON binary.

encode(Data, Opts)

-spec encode(term(), encode_opts()) -> binary().

Encode an Erlang term to a JSON binary with options.

minify(Input)

-spec minify(binary() | iolist()) -> binary().

Minify a JSON binary or iolist, removing all unnecessary whitespace.

prettify(Input)

-spec prettify(binary() | iolist()) -> binary().

Pretty-print a JSON binary or iolist with two-space indentation.

query(Input, Filter)

-spec query(binary() | iolist(), binary() | iolist()) -> {ok, [term()]} | {error, query_reason()}.

Run a jq Filter program against a JSON binary or iolist Input, returning one Erlang term per value produced by the filter (in the order they are emitted by jq).

Requires glazer to have been built against libjq; if libjq was not available at build time, this returns {error, jq_not_available}.

A runtime error raised by the filter itself (e.g. via jq's error/0,1) is returned as {error, Msg} where Msg is the binary message produced by jq.

1> glazer_json:query(<<"{\"a\":[1,2,3]}">>, <<".a[]">>).
{ok,[1,2,3]}

2> glazer_json:query(<<"{\"a\":1}">>, <<".b">>).
{ok,[null]}

3> glazer_json:query(<<"not json">>, <<".">>).
{error, invalid_input}

query(Input, Filter, DecodeOpts)

-spec query(binary() | iolist(), binary() | iolist(), decode_opts()) ->
               {ok, [term()]} | {error, query_reason()}.

Like query/2, but decodes each result term using DecodeOpts (see decode/2).
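
A sketch of handling the possible outcomes of query/3, including a build without libjq. The module name, the users document shape, and the no_jq and bad_filter wrapper atoms are all invented for this example.

```erlang
-module(query_demo).
-export([names/1]).

%% Extract every user's name from a (hypothetical) document of the form
%% {"users": [{"name": ...}, ...]}.
names(Json) ->
    case glazer_json:query(Json, <<".users[].name">>, [use_nil]) of
        {ok, Names} ->
            {ok, Names};
        {error, jq_not_available} ->
            %% glazer was built without libjq.
            {error, no_jq};
        {error, {jq_compile_error, Msg}} ->
            {error, {bad_filter, Msg}};
        {error, Reason} ->
            {error, Reason}
    end.
```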

read_file(Filename)

-spec read_file(file:name_all()) -> term().

Read Filename and decode its contents as JSON.

Raises {parse_error, Reason} if the file's contents aren't valid JSON, or a binary "Filename: Reason" message (see file:format_error/1) if the file can't be read.

Example

1> glazer_json:read_file("data.json").
#{<<"a">> => 1}

read_file(Filename, Opts)

-spec read_file(file:name_all(), decode_opts()) -> term().

Read Filename and decode its contents as JSON, with decode options (see decode/2).

scan(Bin)

-spec scan(binary() | iolist()) -> {complete, non_neg_integer()} | {incomplete, scan_state()}.

Locate the end of the next complete top-level JSON value in Bin, without decoding it.

Returns:

  • {complete, EndOffset} - a complete value spans binary:part(Bin, 0, EndOffset); the rest of Bin (if any) is left over for the next call
  • {incomplete, ScanState} - Bin doesn't yet contain a complete value; feed more data via scan/2 once it's available, passing the entire unconsumed remainder (this Bin, with new bytes appended) plus ScanState

This is the low-level primitive behind stream_feed/2; most callers should use the stream_* API instead.

Example

Slicing off complete values from a buffer of concatenated JSON:

1> Buf0 = <<"{\"a\":1} {\"b\":2}">>,
2> {complete, End1} = glazer_json:scan(Buf0).
{complete, 7}
3> <<Val1:End1/binary, Buf1/binary>> = Buf0,
4> Val1.
<<"{\"a\":1}">>
5> Buf1.
<<" {\"b\":2}">>
6> {complete, End2} = glazer_json:scan(Buf1).
{complete, 8}

Resuming a scan once more bytes arrive:

1> {incomplete, S0} = glazer_json:scan(<<"{\"a\":">>).
{incomplete, {6,1,false,false,true,false}}
2> glazer_json:scan(<<"{\"a\":1}">>, S0).
{complete, 7}

scan(Bin, ScanState)

-spec scan(binary() | iolist(), scan_state()) ->
              {complete, non_neg_integer()} | {incomplete, scan_state()}.

Resume scanning Bin (the unconsumed remainder plus newly-appended bytes) from ScanState.
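
As a rough sketch of how the scanner can be used on its own, the loop below slices a buffer into raw (undecoded) values using scan/1. The module name scan_demo is hypothetical. For simplicity it discards ScanState and rescans the remainder from the start on the next call; a caller that appends data incrementally would pass the state to scan/2 instead.

```erlang
-module(scan_demo).
-export([split/1]).

%% Return {Values, Rest}: the raw binary of each complete top-level value,
%% in order, plus the unconsumed remainder of the buffer.
split(Buf) when is_binary(Buf) ->
    split(Buf, []).

split(<<>>, Acc) ->
    {lists:reverse(Acc), <<>>};
split(Buf, Acc) ->
    case glazer_json:scan(Buf) of
        {complete, End} when End > 0 ->
            %% Going by the scan/1 example, a slice may carry leading whitespace.
            <<Val:End/binary, Rest/binary>> = Buf,
            split(Rest, [Val | Acc]);
        _ ->
            %% Incomplete (or no progress): stop and hand back what is left.
            {lists:reverse(Acc), Buf}
    end.
```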

stream_decoder()

-spec stream_decoder() -> stream_decoder().

Create a new incremental decoder for feeding JSON in chunks (e.g. from a socket or file), useful when a complete document isn't available up front or when a stream contains a sequence of concatenated/whitespace-separated JSON values (e.g. newline-delimited JSON).

Decoding itself is not incremental: each complete top-level value is still decoded in a single pass via decode/2 using the library's fast whole-buffer decoder. Only the boundary detection (finding where one value ends and the next begins) is incremental, via a small byte-scanner that tracks nesting/string state across chunks.

Example

1> D0 = glazer_json:stream_decoder(),
2> {Vals1, D1} = glazer_json:stream_feed(D0, <<"{\"a\":1} {\"b\":">>),
3> Vals1.
[#{<<"a">> => 1}]
4> {Vals2, _D2} = glazer_json:stream_feed(D1, <<"2}">>),
5> Vals2.
[#{<<"b">> => 2}]

stream_decoder(Opts)

-spec stream_decoder(decode_opts()) -> stream_decoder().

Create a new incremental decoder, passing Opts through to every decode/2 call.
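
A minimal sketch of a decoder created with options and driven over a list of chunks, for example chunks already read from a file. The module name ndjson_demo and the function decode_chunks/1 are assumptions made for the example.

```erlang
-module(ndjson_demo).
-export([decode_chunks/1]).

%% Chunks is a list of binaries in arrival order.
%% Returns {ok, Values} or {error, Reason} if the stream ends mid-value.
decode_chunks(Chunks) ->
    %% Every value is decoded with use_nil.
    D0 = glazer_json:stream_decoder([use_nil]),
    {RevVals, D} =
        lists:foldl(
          fun(Chunk, {Acc, Dec0}) ->
                  {Vals, Dec1} = glazer_json:stream_feed(Dec0, Chunk),
                  %% Keep the accumulator reversed to avoid repeated appends.
                  {lists:reverse(Vals, Acc), Dec1}
          end, {[], D0}, Chunks),
    %% Flush a possible trailing value once the input is exhausted.
    case glazer_json:stream_eof(D) of
        {ok, Trailing} -> {ok, lists:reverse(RevVals, Trailing)};
        {error, Reason} -> {error, Reason}
    end.
```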

stream_eof/1

-spec stream_eof(stream_decoder()) -> {ok, [term()]} | {error, term()}.

Signal end-of-stream: decode any remaining buffered bytes as a final value. This matters for a trailing bare scalar (e.g. a lone number, or true/null), which the scanner cannot otherwise distinguish from a value whose remaining bytes have yet to arrive.

Returns {ok, [Term]} with zero or one trailing value, or {error, Reason} if the remaining bytes don't form a complete value.

Example

1> D0 = glazer_json:stream_decoder(),
2> {Vals1, D1} = glazer_json:stream_feed(D0, <<"123">>),
3> Vals1.
[]
4> glazer_json:stream_eof(D1).
{ok, [123]}

A stream that ends mid-value (e.g. a dropped connection) yields an error instead of silently dropping the partial data:

1> D0 = glazer_json:stream_decoder(),
2> {Vals1, D1} = glazer_json:stream_feed(D0, <<"{\"a\":1, \"b\":">>),
3> Vals1.
[]
4> glazer_json:stream_eof(D1).
{error, _Reason}

stream_feed/2

-spec stream_feed(stream_decoder(), binary() | iolist()) -> {[term()], stream_decoder()}.

Feed a chunk of bytes into the decoder, returning any complete JSON values found so far (in order) along with the updated decoder.

Raises the same exceptions as decode/2 (e.g. {parse_error, Reason}) if a value that the scanner deemed complete fails to decode.

Example

Call stream_feed/2 for each chunk received from the source while more data may still arrive, and stream_eof/1 once the source is exhausted to flush any trailing value:

loop(Socket, D0) ->
  case gen_tcp:recv(Socket, 0) of
    {ok, Chunk} ->
      {Vals, D1} = glazer_json:stream_feed(D0, Chunk),
      handle_values(Vals),
      loop(Socket, D1);
    {error, closed} ->
      case glazer_json:stream_eof(D0) of
        {ok, Trailing}  -> handle_values(Trailing);
        {error, Reason} -> handle_truncated_stream(Reason)
      end
  end.

The same decoder fits naturally into a gen_server driving an active-mode socket: keep the stream_decoder() in the process state, feed it from handle_info({tcp, ...}), and flush it on {tcp_closed, ...}:

-module(json_conn).
-behaviour(gen_server).
-export([start_link/1]).
-export([init/1, handle_call/3, handle_cast/2, handle_info/2]).

-record(state, {socket, decoder}).

start_link(Socket) ->
  gen_server:start_link(?MODULE, Socket, []).

init(Socket) ->
  inet:setopts(Socket, [{active, once}]),
  {ok, #state{socket = Socket, decoder = glazer_json:stream_decoder()}}.

handle_info({tcp, Socket, Data}, #state{socket = Socket, decoder = D0} = State) ->
  {Vals, D1} = glazer_json:stream_feed(D0, Data),
  lists:foreach(fun handle_value/1, Vals),
  inet:setopts(Socket, [{active, once}]),
  {noreply, State#state{decoder = D1}};

handle_info({tcp_closed, Socket}, #state{socket = Socket, decoder = D0} = State) ->
  case glazer_json:stream_eof(D0) of
    {ok, Trailing}  -> lists:foreach(fun handle_value/1, Trailing);
    {error, Reason} -> handle_truncated_stream(Reason)
  end,
  {stop, normal, State};

handle_info({tcp_error, Socket, Reason}, #state{socket = Socket} = State) ->
  {stop, Reason, State}.

handle_call(_Request, _From, State) -> {reply, ok, State}.
handle_cast(_Request, State)        -> {noreply, State}.

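handle_value(Val) ->
  io:format("received: ~p~n", [Val]).

%% Called when the peer closes the connection in the middle of a value.
handle_truncated_stream(Reason) ->
  io:format("stream ended with incomplete JSON: ~p~n", [Reason]).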

try_decode(Input)

-spec try_decode(binary() | iolist()) -> {ok, term()} | {error, binary()}.

Decode a JSON binary or iolist, returning {ok, Term} or {error, Reason} instead of raising.

try_decode(Input, Opts)

-spec try_decode(binary() | iolist(), decode_opts()) -> {ok, term()} | {error, binary()}.

Decode a JSON binary or iolist with options, returning {ok, Term} or {error, Reason} instead of raising.
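
A sketch of the non-raising variant used to validate input at a boundary. The module name, the function parse_config/1, and the not_an_object and invalid_json atoms are hypothetical.

```erlang
-module(try_decode_demo).
-export([parse_config/1]).

%% Accept only JSON objects; report anything else as an error tuple.
parse_config(Bin) ->
    case glazer_json:try_decode(Bin, [{keys, existing_atom}]) of
        {ok, Config} when is_map(Config) ->
            {ok, Config};
        {ok, _Other} ->
            %% Valid JSON, but not an object.
            {error, not_an_object};
        {error, Msg} ->
            %% Msg is a binary describing the parse failure.
            {error, {invalid_json, Msg}}
    end.
```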

write_file(Filename, Data)

-spec write_file(file:name_all(), term()) -> ok.

Encode Data to JSON and write it to Filename, overwriting any existing file.

Raises a binary "Filename: Reason" message (see file:format_error/1) if the file can't be written.

Example

1> glazer_json:write_file("data.json", #{<<"a">> => 1}).
ok

write_file(Filename, Data, Opts)

-spec write_file(file:name_all(), term(), encode_opts()) -> ok.

Encode Data to JSON with encode options (see encode/2) and write it to Filename, overwriting any existing file.
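
A small round-trip sketch combining write_file/3 and read_file/2. The module name and the roundtrip/2 helper are made up for the example; the filename is whatever the caller supplies.

```erlang
-module(file_roundtrip_demo).
-export([roundtrip/2]).

%% Write Data as pretty-printed JSON, then read it back with default options.
%% Both calls raise if the file cannot be written or read.
roundtrip(Filename, Data) ->
    ok = glazer_json:write_file(Filename, Data, [pretty]),
    glazer_json:read_file(Filename, []).
```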