glazer (glazer v0.3.2)

Fast JSON encoding and decoding using the glaze C++ library.

By default JSON null is represented as the atom null. To change it application-wide, set the null env key in your config:

{glazer, [{null, nil}]}.

See also https://github.com/stephenberry/glaze

Types

csv_decode_opt()

-type csv_decode_opt() :: {delimiter, char()} | headers | {keys, atom | existing_atom | binary}.

csv_decode_opts()

-type csv_decode_opts() :: [csv_decode_opt()].

CSV decode options:

  • {delimiter, Char} - field delimiter (default $,)
  • headers - treat the first row as column names and decode each subsequent row as a map keyed by those names, instead of returning every row as a list of fields
  • {keys, atom} - with headers, decode column names as atoms
  • {keys, existing_atom} - with headers, decode column names as existing atoms, falling back to binaries for unknown atoms
  • {keys, binary} - with headers, decode column names as binaries (default)
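
As a sketch of how these options combine (the module name csv_opts_demo and the sample data are illustrative and not part of glazer), the following decodes a semicolon-delimited input with a header row and atom keys:

```erlang
-module(csv_opts_demo).
-export([people/0]).

%% Decode a semicolon-delimited CSV whose first row holds column names.
%% With `headers`, each data row is expected to come back as a map; with
%% `{keys, atom}` the column names become atoms instead of binaries.
people() ->
    Csv = <<"name;age\nada;36\ngrace;45\n">>,
    Rows = glazer:csv_decode(Csv, [{delimiter, $;}, headers, {keys, atom}]),
    %% Field values stay binaries; only the keys are converted.
    [{Name, binary_to_integer(Age)} || #{name := Name, age := Age} <- Rows].
```

Because atoms are never garbage collected, {keys, existing_atom} is the safer choice when the column names come from untrusted input.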

csv_encode_opt()

-type csv_encode_opt() :: {delimiter, char()} | headers | {line_ending, lf | crlf}.

csv_encode_opts()

-type csv_encode_opts() :: [csv_encode_opt()].

CSV encode options:

  • {delimiter, Char} - field delimiter (default $,)
  • headers - input is a list of maps; the first map's keys become the header row, and every map (including the first) is encoded as a row in that column order (missing keys produce empty fields)
  • {line_ending, lf | crlf} - line terminator (default crlf, per RFC 4180)
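
A minimal sketch of the encode options used together, assuming a list of maps as input (csv_encode_demo and the sample rows are hypothetical). The column order follows the first map's key iteration order, so no exact output is shown here:

```erlang
-module(csv_encode_demo).
-export([report/0]).

%% Encode maps as tab-separated rows with a header line and LF line endings.
report() ->
    Rows = [#{<<"id">> => 1, <<"name">> => <<"ada">>},
            #{<<"id">> => 2}],   % no <<"name">> key: expected to yield an empty field
    glazer:csv_encode(Rows, [headers, {delimiter, $\t}, {line_ending, lf}]).
```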

csv_stream_decoder()

-opaque csv_stream_decoder()

decode_opt()

-type decode_opt() ::
          object_as_tuple | use_nil |
          {null_term, atom()} |
          {keys, atom | existing_atom | binary} |
          dedupe_keys.

decode_opts()

-type decode_opts() :: [decode_opt()].

Decode options:

  • object_as_tuple - decode JSON objects as {[{K, V}]} proplists rather than maps
  • use_nil - use the atom nil for JSON null
  • {null_term, Atom} - use Atom for JSON null
  • {keys, atom} - decode object keys as atoms
  • {keys, existing_atom} - decode keys as existing atoms, fall back to binary
  • {keys, binary} - decode keys as binaries (default)
  • dedupe_keys - with object_as_tuple, eliminate duplicate object keys from the resulting proplist, keeping the last occurrence's value (and position). Has no effect when objects are decoded as maps (the default) or with {keys, atom | existing_atom}: a JSON object with duplicate keys is always deduped (last value wins) when decoded to a map, since maps cannot represent duplicate keys.
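
The following sketch decodes the same document three ways to show how the options change the shape of the result (json_decode_opts_demo is an illustrative module name, not part of glazer):

```erlang
-module(json_decode_opts_demo).
-export([run/0]).

run() ->
    Json = <<"{\"id\": 1, \"tag\": null}">>,
    %% Default: a map with binary keys.
    AsMap = glazer:json_decode(Json),
    %% Atom keys, and `undefined` in place of JSON null.
    AsAtoms = glazer:json_decode(Json, [{keys, atom}, {null_term, undefined}]),
    %% A proplist wrapped in a 1-tuple instead of a map.
    AsTuple = glazer:json_decode(Json, [object_as_tuple]),
    {AsMap, AsAtoms, AsTuple}.
```

For untrusted input, {keys, existing_atom} avoids growing the atom table, which {keys, atom} can exhaust because atoms are never garbage collected.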

encode_opt()

-type encode_opt() :: pretty | uescape | force_utf8 | use_nil | {null_term, atom()}.

encode_opts()

-type encode_opts() :: [encode_opt()].

Encode options:

  • pretty - pretty-print the JSON output
  • uescape - escape non-ASCII characters as \uXXXX sequences
  • force_utf8 - fix invalid UTF-8 sequences before encoding
  • use_nil - encode the atom nil as JSON null
  • {null_term, Atom} - encode Atom as JSON null
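
A short sketch of the encode options, assuming a term that uses undefined for missing values and contains a non-ASCII string (json_encode_opts_demo and the sample term are illustrative):

```erlang
-module(json_encode_opts_demo).
-export([run/0]).

run() ->
    Term = #{<<"name">> => <<"Zoë"/utf8>>, <<"deleted_at">> => undefined},
    %% Compact output, with `undefined` written as JSON null.
    Compact = glazer:json_encode(Term, [{null_term, undefined}]),
    %% Indented output with non-ASCII characters written as \uXXXX escapes.
    Escaped = glazer:json_encode(Term, [pretty, uescape, {null_term, undefined}]),
    {Compact, Escaped}.
```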

json_query_reason()

-type json_query_reason() ::
          enomem | jq_not_available | jq_decode_error |
          {jq_compile_error, binary()} |
          invalid_input |
          binary().

json_stream_decoder()

-opaque json_stream_decoder()

scan_state()

-type scan_state() :: tuple().

yaml_decode_opt()

-type yaml_decode_opt() ::
          use_nil | {null_term, atom()} | {keys, atom | existing_atom | binary} | yaml_1_1_bools.

yaml_decode_opts()

-type yaml_decode_opts() :: [yaml_decode_opt()].

YAML decode options:

  • use_nil - use the atom nil for YAML null/~/empty values
  • {null_term, Atom} - use Atom for YAML null/~/empty values
  • {keys, atom} - decode mapping keys as atoms
  • {keys, existing_atom} - decode mapping keys as existing atoms, fall back to binary
  • {keys, binary} - decode mapping keys as binaries (default)
  • yaml_1_1_bools - additionally treat yes/no/on/off (and case variants) as booleans, per the YAML 1.1 core schema. By default (YAML 1.2 core schema) only true/false are recognized as booleans.
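
To illustrate the difference between the two boolean schemas, this sketch decodes the same input with and without yaml_1_1_bools (yaml_bools_demo is a hypothetical module name):

```erlang
-module(yaml_bools_demo).
-export([run/0]).

run() ->
    Yaml = <<"enabled: yes\nretries: 3\n">>,
    %% Under the default YAML 1.2 core schema, `yes` is not a boolean.
    Default = glazer:yaml_decode(Yaml),
    %% With yaml_1_1_bools, `yes` is expected to decode as `true`.
    Legacy = glazer:yaml_decode(Yaml, [yaml_1_1_bools, {keys, atom}]),
    {Default, Legacy}.
```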

yaml_encode_opt()

-type yaml_encode_opt() :: use_nil | {null_term, atom()}.

yaml_encode_opts()

-type yaml_encode_opts() :: [yaml_encode_opt()].

YAML encode options:

  • use_nil - treat the atom nil as YAML null
  • {null_term, Atom} - treat Atom as YAML null

Functions

csv_decode(Input)

-spec csv_decode(binary() | iolist()) -> [[binary()]] | [#{binary() => binary()}].

Decode a CSV binary or iolist to a list of rows.

By default each row is a list of binary fields. With the headers option, the first row is used as column names and each subsequent row is decoded as a map. Raises unterminated_quoted_field or duplicate_header on invalid input.

csv_decode(Input, Opts)

-spec csv_decode(binary() | iolist(), csv_decode_opts()) -> [[binary()]] | [map()].

Decode a CSV binary or iolist to a list of rows, with options. Raises the atom unterminated_quoted_field or duplicate_header on invalid input.

csv_encode(Data)

-spec csv_encode([[term()]] | [map()]) -> binary().

Encode a list of rows to a CSV binary.

Each row is a list of fields (binaries, atoms, integers, or floats). Fields containing the delimiter, a double quote, or a line break are quoted per RFC 4180, with embedded quotes doubled.
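
The quoting rule can be illustrated in plain Erlang. This is a sketch of the RFC 4180 rule for a single binary field, not glazer's implementation; csv_quote_sketch and quote_field are names made up for the example:

```erlang
-module(csv_quote_sketch).
-export([quote_field/1, quote_field/2]).

%% Illustrative only: quote one binary field the way RFC 4180 describes.
quote_field(Field) ->
    quote_field(Field, $,).

quote_field(Field, Delim) when is_binary(Field) ->
    case binary:match(Field, [<<Delim>>, <<"\"">>, <<"\n">>, <<"\r">>]) of
        nomatch ->
            %% Nothing special in the field: emit it unchanged.
            Field;
        _ ->
            %% Double every embedded quote, then wrap the field in quotes.
            Doubled = binary:replace(Field, <<"\"">>, <<"\"\"">>, [global]),
            <<"\"", Doubled/binary, "\"">>
    end.
```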

csv_encode(Data, Opts)

-spec csv_encode([[term()]] | [map()], csv_encode_opts()) -> binary().

Encode a list of rows to a CSV binary, with options.

With the headers option, Data is a list of maps: the first map's keys become the header row (in iteration order), and each map is encoded as a row in that column order.

csv_stream_decoder()

-spec csv_stream_decoder() -> csv_stream_decoder().

Create a new incremental decoder for feeding CSV in chunks (e.g. from a socket or file), useful when the whole input isn't available up front.

Each complete row is decoded as soon as its terminating line break is seen, via csv_decode/2 on that single row. Only the row boundary detection is incremental: a small byte-scanner tracks, across chunks, whether the cursor is inside a quoted field, so that a \n or \r\n inside a quoted field does not end a row.

With the headers option, the first complete row is captured as the header and used to decode every subsequent row as a map; no row is emitted for the header itself.

Example

1> D0 = glazer:csv_stream_decoder(),
2> {Rows1, D1} = glazer:csv_stream_feed(D0, <<"a,b\n1,2\n3,">>),
3> Rows1.
[[<<"a">>,<<"b">>],[<<"1">>,<<"2">>]]
4> {Rows2, D2} = glazer:csv_stream_feed(D1, <<"4\n">>),
5> Rows2.
[[<<"3">>,<<"4">>]]
6> glazer:csv_stream_eof(D2).
{ok, []}
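
The idea of carrying quote state across chunks can be sketched in a few lines of plain Erlang. This is an illustration of the technique only, not glazer's scanner; csv_boundary_sketch and row_ends are hypothetical names:

```erlang
-module(csv_boundary_sketch).
-export([row_ends/2]).

%% Illustrative only. Returns the offsets (just past each "\n") at which a
%% row ends in Chunk, plus the quote state to carry into the next chunk.
%% A doubled quote ("") toggles the state twice, so it leaves the scanner
%% inside the quoted field, as required.
-spec row_ends(binary(), boolean()) -> {[non_neg_integer()], boolean()}.
row_ends(Chunk, InQuotes) ->
    row_ends(Chunk, 0, InQuotes, []).

row_ends(<<>>, _Pos, InQuotes, Acc) ->
    {lists:reverse(Acc), InQuotes};
row_ends(<<$", Rest/binary>>, Pos, InQuotes, Acc) ->
    row_ends(Rest, Pos + 1, not InQuotes, Acc);
row_ends(<<$\n, Rest/binary>>, Pos, false, Acc) ->
    %% A line break outside quotes terminates the current row.
    row_ends(Rest, Pos + 1, false, [Pos + 1 | Acc]);
row_ends(<<_, Rest/binary>>, Pos, InQuotes, Acc) ->
    row_ends(Rest, Pos + 1, InQuotes, Acc).
```

The boolean returned for one chunk is passed in with the next one, so a quoted field split across two chunks is still treated as a single field.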

csv_stream_decoder(Opts)

-spec csv_stream_decoder(csv_decode_opts()) -> csv_stream_decoder().

Create a new incremental CSV decoder, passing Opts through to every csv_decode/2 call.

csv_stream_eof/1

-spec csv_stream_eof(csv_stream_decoder()) -> {ok, [[binary()]] | [map()]} | {error, term()}.

Signal end-of-stream: decode any remaining buffered bytes as a final row (useful when the input doesn't end with a trailing line break).

Returns {ok, Rows} with zero or one trailing row, or {error, Reason} if the remaining bytes don't form a valid row.

Example

1> D0 = glazer:csv_stream_decoder(),
2> {Rows1, D1} = glazer:csv_stream_feed(D0, <<"a,b\n1,2">>),
3> Rows1.
[[<<"a">>,<<"b">>]]
4> glazer:csv_stream_eof(D1).
{ok, [[<<"1">>,<<"2">>]]}

csv_stream_feed/2

-spec csv_stream_feed(csv_stream_decoder(), binary() | iolist()) ->
                         {[[binary()]] | [map()], csv_stream_decoder()}.

Feed a chunk of bytes into the decoder, returning any complete CSV rows found so far (in order) along with the updated decoder.

Raises the same exceptions as csv_decode/2 if a row that the scanner deemed complete fails to decode.

Example

loop(Socket, D0) ->
  case gen_tcp:recv(Socket, 0) of
    {ok, Chunk} ->
      {Rows, D1} = glazer:csv_stream_feed(D0, Chunk),
      handle_rows(Rows),
      loop(Socket, D1);
    {error, closed} ->
      case glazer:csv_stream_eof(D0) of
        {ok, Trailing}  -> handle_rows(Trailing);
        {error, Reason} -> handle_truncated_stream(Reason)
      end
  end.

csv_try_decode(Input)

-spec csv_try_decode(binary() | iolist()) -> {ok, [[binary()]]} | {error, atom()}.

Decode a CSV binary or iolist, returning {ok, Rows} or {error, Reason} instead of raising, where Reason is unterminated_quoted_field or duplicate_header.

csv_try_decode(Input, Opts)

-spec csv_try_decode(binary() | iolist(), csv_decode_opts()) ->
                        {ok, [[binary()]] | [map()]} | {error, atom()}.

Decode a CSV binary or iolist with options, returning {ok, Rows} or {error, Reason} instead of raising, where Reason is unterminated_quoted_field or duplicate_header.
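
A sketch of handling both documented error atoms without a try/catch (csv_try_demo, load/1 and the message texts are illustrative):

```erlang
-module(csv_try_demo).
-export([load/1]).

%% Decode CSV with a header row, mapping the documented error atoms
%% to human-readable messages.
load(Bin) ->
    case glazer:csv_try_decode(Bin, [headers]) of
        {ok, Rows} ->
            {ok, Rows};
        {error, unterminated_quoted_field} ->
            {error, <<"a quoted field is never closed">>};
        {error, duplicate_header} ->
            {error, <<"two columns share the same name">>}
    end.
```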

decode_integer(NumberString)

-spec decode_integer(binary() | iolist()) -> integer().

Decode a JSON number string to an integer. Raises invalid_number_format on invalid input.

encode_integer(Int)

-spec encode_integer(integer()) -> binary().

Encode an integer to its JSON string representation. Raises badarg if Int is not an integer.

json_decode(Input)

-spec json_decode(binary() | iolist()) -> term().

Decode a JSON binary or iolist to an Erlang term. JSON objects are returned as maps (default). Raises {parse_error, Msg} on invalid input.

json_decode(Input, Opts)

-spec json_decode(binary() | iolist(), decode_opts()) -> term().

Decode a JSON binary or iolist to an Erlang term with options. Raises {parse_error, Msg} on invalid input.

json_encode(Data)

-spec json_encode(term()) -> binary().

Encode an Erlang term to a JSON binary.

json_encode(Data, Opts)

-spec json_encode(term(), encode_opts()) -> binary().

Encode an Erlang term to a JSON binary with options.

json_minify(Input)

-spec json_minify(binary() | iolist()) -> binary().

Minify a JSON binary or iolist, removing all unnecessary whitespace.

json_prettify(Input)

-spec json_prettify(binary() | iolist()) -> binary().

Pretty-print a JSON binary or iolist with two-space indentation.

json_query(Input, Filter)

-spec json_query(binary() | iolist(), binary() | iolist()) ->
                    {ok, [term()]} | {error, json_query_reason()}.

Run a jq Filter program against a JSON binary or iolist Input, returning one Erlang term per value produced by the filter (in the order they are emitted by jq).

Requires glazer to have been built against libjq; if libjq was not available at build time, this returns {error, jq_not_available}.

A runtime error raised by the filter itself (e.g. via jq's error/0 or error/1) is returned as {error, Msg} where Msg is the binary message produced by jq.

1> glazer:json_query(<<"{\"a\":[1,2,3]}">>, <<".a[]">>).
{ok,[1,2,3]}

2> glazer:json_query(<<"{\"a\":1}">>, <<".b">>).
{ok,[null]}

3> glazer:json_query(<<"not json">>, <<".">>).
{error, invalid_input}

json_query(Input, Filter, DecodeOpts)

-spec json_query(binary() | iolist(), binary() | iolist(), decode_opts()) ->
                    {ok, [term()]} | {error, json_query_reason()}.

Like json_query/2, but decodes each result term using DecodeOpts (see json_decode/2).

json_scan(Bin)

-spec json_scan(binary() | iolist()) -> {complete, non_neg_integer()} | {incomplete, scan_state()}.

Locate the end of the next complete top-level JSON value in Bin, without decoding it.

Returns:

  • {complete, EndOffset} - a complete value spans binary:part(Bin, 0, EndOffset); the rest of Bin (if any) is left over for the next call
  • {incomplete, ScanState} - Bin doesn't yet contain a complete value; feed more data via json_scan/2 once it's available, passing the entire unconsumed remainder (this Bin, with new bytes appended) plus ScanState

This is the low-level primitive behind json_stream_feed/2; most callers should use the json_stream_* API instead.

Example

Slicing off complete values from a buffer of concatenated JSON:

1> Buf0 = <<"{\"a\":1} {\"b\":2}">>,
2> {complete, End1} = glazer:json_scan(Buf0).
{complete, 7}
3> <<Val1:End1/binary, Buf1/binary>> = Buf0,
4> Val1.
<<"{\"a\":1}">>
5> Buf1.
<<" {\"b\":2}">>
6> {complete, End2} = glazer:json_scan(Buf1).
{complete, 8}

Resuming a scan once more bytes arrive:

1> {incomplete, S0} = glazer:json_scan(<<"{\"a\":">>).
{incomplete, {6,1,false,false,true,false}}
2> glazer:json_scan(<<"{\"a\":1}">>, S0).
{complete, 7}
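
The slicing shown above can be wrapped in a small loop. This is a sketch rather than a recommended pattern: json_split_demo and split/1 are made-up names, and for simplicity it discards ScanState, so a later call rescans the leftover bytes from the start:

```erlang
-module(json_split_demo).
-export([split/1]).

%% Split a buffer of concatenated JSON values into undecoded slices.
%% Returns {Slices, Rest}, where Rest holds the bytes of a value that
%% is not complete yet (possibly empty).
split(Buf) ->
    split(Buf, []).

split(<<>>, Acc) ->
    {lists:reverse(Acc), <<>>};
split(Buf, Acc) ->
    case glazer:json_scan(Buf) of
        {complete, End} when End > 0 ->
            <<Val:End/binary, Rest/binary>> = Buf,
            split(Rest, [Val | Acc]);
        _ ->
            {lists:reverse(Acc), Buf}
    end.
```

Each slice keeps any leading whitespace that preceded its value, as in the shell session above.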

json_scan(Bin, ScanState)

-spec json_scan(binary() | iolist(), scan_state()) ->
                   {complete, non_neg_integer()} | {incomplete, scan_state()}.

Resume scanning Bin (the unconsumed remainder plus newly-appended bytes) from ScanState.

json_stream_decoder()

-spec json_stream_decoder() -> json_stream_decoder().

Create a new incremental decoder for feeding JSON in chunks (e.g. from a socket or file), useful when a complete document isn't available up front or when a stream contains a sequence of concatenated/whitespace-separated JSON values (e.g. newline-delimited JSON).

Decoding itself is not incremental — each complete top-level value is still decoded in a single pass via json_decode/2 using the library's fast whole-buffer decoder. Only the boundary detection (finding where one value ends and the next begins) is incremental, via a small byte-scanner that tracks nesting/string state across chunks.

Example

1> D0 = glazer:json_stream_decoder(),
2> {Vals1, D1} = glazer:json_stream_feed(D0, <<"{\"a\":1} {\"b\":">>),
3> Vals1.
[#{<<"a">> => 1}]
4> {Vals2, _D2} = glazer:json_stream_feed(D1, <<"2}">>),
5> Vals2.
[#{<<"b">> => 2}]

json_stream_decoder(Opts)

-spec json_stream_decoder(decode_opts()) -> json_stream_decoder().

Create a new incremental decoder, passing Opts through to every json_decode/2 call.

json_stream_eof/1

-spec json_stream_eof(json_stream_decoder()) -> {ok, [term()]} | {error, term()}.

Signal end-of-stream: decode any remaining buffered bytes as a final value. This is needed for a trailing bare scalar (e.g. a lone number or true/null), which the scanner cannot otherwise distinguish from a value whose remaining bytes have not arrived yet.

Returns {ok, [Term]} with zero or one trailing value, or {error, Reason} if the remaining bytes don't form a complete value.

Example

1> D0 = glazer:json_stream_decoder(),
2> {Vals1, D1} = glazer:json_stream_feed(D0, <<"123">>),
3> Vals1.
[]
4> glazer:json_stream_eof(D1).
{ok, [123]}

A stream that ends mid-value (e.g. a dropped connection) yields an error instead of silently dropping the partial data:

1> D0 = glazer:json_stream_decoder(),
2> {Vals1, D1} = glazer:json_stream_feed(D0, <<"{\"a\":1, \"b\":">>),
3> Vals1.
[]
4> glazer:json_stream_eof(D1).
{error, _Reason}
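
The same feed/eof pairing applies when reading from a file. The following is a sketch under the assumption that the file holds concatenated or newline-delimited JSON values; ndjson_file_demo, read_all/1 and the 64 KiB chunk size are choices made for the example:

```erlang
-module(ndjson_file_demo).
-export([read_all/1]).

%% Read a file in 64 KiB chunks and return every decoded value in order.
read_all(Path) ->
    {ok, Fd} = file:open(Path, [read, binary, raw]),
    try
        loop(Fd, glazer:json_stream_decoder(), [])
    after
        file:close(Fd)
    end.

loop(Fd, D0, Acc) ->
    case file:read(Fd, 65536) of
        {ok, Chunk} ->
            {Vals, D1} = glazer:json_stream_feed(D0, Chunk),
            loop(Fd, D1, [Vals | Acc]);
        eof ->
            %% Flush a trailing value (e.g. a bare scalar with no newline).
            case glazer:json_stream_eof(D0) of
                {ok, Trailing} ->
                    {ok, lists:append(lists:reverse([Trailing | Acc]))};
                {error, Reason} ->
                    {error, {truncated, Reason}}
            end;
        {error, Reason} ->
            {error, Reason}
    end.
```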

json_stream_feed/2

-spec json_stream_feed(json_stream_decoder(), binary() | iolist()) -> {[term()], json_stream_decoder()}.

Feed a chunk of bytes into the decoder, returning any complete JSON values found so far (in order) along with the updated decoder.

Raises the same exceptions as json_decode/2 (e.g. {parse_error, Msg}) if a value that the scanner deemed complete fails to decode.

Example

Call json_stream_feed/2 for each chunk received from the source while more data may still arrive, and json_stream_eof/1 once the source is exhausted to flush any trailing value:

loop(Socket, D0) ->
  case gen_tcp:recv(Socket, 0) of
    {ok, Chunk} ->
      {Vals, D1} = glazer:json_stream_feed(D0, Chunk),
      handle_values(Vals),
      loop(Socket, D1);
    {error, closed} ->
      case glazer:json_stream_eof(D0) of
        {ok, Trailing}  -> handle_values(Trailing);
        {error, Reason} -> handle_truncated_stream(Reason)
      end
  end.

The same decoder fits naturally into a gen_server driving an active-mode socket: keep the json_stream_decoder() in the process state, feed it from handle_info({tcp, ...}), and flush it on {tcp_closed, ...}:

-module(json_conn).
-behaviour(gen_server).
-export([start_link/1]).
-export([init/1, handle_call/3, handle_cast/2, handle_info/2]).

-record(state, {socket, decoder}).

start_link(Socket) ->
  gen_server:start_link(?MODULE, Socket, []).

init(Socket) ->
  inet:setopts(Socket, [{active, once}]),
  {ok, #state{socket = Socket, decoder = glazer:json_stream_decoder()}}.

handle_info({tcp, Socket, Data}, #state{socket = Socket, decoder = D0} = State) ->
  {Vals, D1} = glazer:json_stream_feed(D0, Data),
  lists:foreach(fun handle_value/1, Vals),
  inet:setopts(Socket, [{active, once}]),
  {noreply, State#state{decoder = D1}};

handle_info({tcp_closed, Socket}, #state{socket = Socket, decoder = D0} = State) ->
  case glazer:json_stream_eof(D0) of
    {ok, Trailing}  -> lists:foreach(fun handle_value/1, Trailing);
    {error, Reason} -> handle_truncated_stream(Reason)
  end,
  {stop, normal, State};

handle_info({tcp_error, Socket, Reason}, #state{socket = Socket} = State) ->
  {stop, Reason, State}.

handle_call(_Request, _From, State) -> {reply, ok, State}.
handle_cast(_Request, State)        -> {noreply, State}.

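handle_value(Val) ->
  io:format("received: ~p~n", [Val]).

%% Placeholder: report a stream that ended in the middle of a value.
handle_truncated_stream(Reason) ->
  io:format("truncated stream: ~p~n", [Reason]).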

json_try_decode(Input)

-spec json_try_decode(binary() | iolist()) -> {ok, term()} | {error, binary()}.

Decode a JSON binary or iolist, returning {ok, Term} or {error, Reason} instead of raising.

json_try_decode(Input, Opts)

-spec json_try_decode(binary() | iolist(), decode_opts()) -> {ok, term()} | {error, binary()}.

Decode a JSON binary or iolist with options, returning {ok, Term} or {error, Reason} instead of raising.
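
A sketch of validating untrusted input with the non-raising variant (json_try_demo, parse_config/1 and the error tags are illustrative):

```erlang
-module(json_try_demo).
-export([parse_config/1]).

%% Accept only a top-level JSON object; report anything else as an error.
parse_config(Bin) ->
    case glazer:json_try_decode(Bin, [{keys, existing_atom}]) of
        {ok, Config} when is_map(Config) -> {ok, Config};
        {ok, Other}                      -> {error, {not_an_object, Other}};
        {error, Msg}                     -> {error, {invalid_json, Msg}}
    end.
```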

try_decode_integer(NumberString)

-spec try_decode_integer(binary() | iolist()) -> {ok, integer()} | {error, invalid_number_format}.

Decode a JSON number string to an integer, returning {ok, Int} or {error, invalid_number_format} instead of raising.

yaml_decode(Input)

-spec yaml_decode(binary() | iolist()) -> term().

Decode a YAML binary or iolist to an Erlang term. YAML mappings are returned as maps (default). Raises {parse_error, Msg} on invalid input.

yaml_decode(Input, Opts)

-spec yaml_decode(binary() | iolist(), yaml_decode_opts()) -> term().

Decode a YAML binary or iolist to an Erlang term with options. Raises {parse_error, Msg} on invalid input.

yaml_encode(Data)

-spec yaml_encode(term()) -> binary().

Encode an Erlang term to a YAML binary in block style (2-space indentation, sequences at the same indentation as the mapping key that owns them).
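
A small round-trip sketch, assuming a map with binary keys and a list value (yaml_roundtrip_demo is a hypothetical module name). The exact text produced is not shown, since key order depends on map iteration:

```erlang
-module(yaml_roundtrip_demo).
-export([run/0]).

run() ->
    Term = #{<<"name">> => <<"api">>, <<"ports">> => [80, 443]},
    Yaml = glazer:yaml_encode(Term),
    %% Decoding the encoder's own output should give back an equal term.
    {Yaml, glazer:yaml_decode(Yaml) =:= Term}.
```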

yaml_encode(Data, Opts)

-spec yaml_encode(term(), yaml_encode_opts()) -> binary().

Encode an Erlang term to a YAML binary in block style with options.

yaml_try_decode(Input)

-spec yaml_try_decode(binary() | iolist()) -> {ok, term()} | {error, binary()}.

Decode a YAML binary or iolist, returning {ok, Term} or {error, Msg} instead of raising.

yaml_try_decode(Input, Opts)

-spec yaml_try_decode(binary() | iolist(), yaml_decode_opts()) -> {ok, term()} | {error, binary()}.

Decode a YAML binary or iolist with options, returning {ok, Term} or {error, Msg} instead of raising.