glazer (glazer v0.3.0)

Fast JSON encoding and decoding using the glaze C++ library.

By default JSON null is represented as the atom null. To change it application-wide, set the null env key in your config:

{glazer, [{null, nil}]}.
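For context, here is a minimal sketch of where that tuple might sit in a release's sys.config. It assumes the standard OTP layout, in which the file holds a single list of {Application, Env} tuples; the surrounding file is an assumption and is not something this page documents.

```erlang
%% sys.config (sketch): one list of per-application environments.
[
  {glazer, [
    {null, nil}   %% represent JSON null as the atom nil application-wide
  ]}
].
```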

See also https://github.com/stephenberry/glaze

Summary

Functions

  • csv_decode/1 - Decode a CSV binary or iolist to a list of rows.
  • csv_decode/2 - Decode a CSV binary or iolist to a list of rows, with options. Raises the atom unterminated_quoted_field or duplicate_header on invalid input.
  • csv_encode/1 - Encode a list of rows to a CSV binary.
  • csv_encode/2 - Encode a list of rows to a CSV binary, with options.
  • csv_stream_decoder/0 - Create a new incremental decoder for feeding CSV in chunks (e.g. from a socket or file), useful when the whole input isn't available up front.
  • csv_stream_decoder/1 - Create a new incremental CSV decoder, passing Opts through to every csv_decode/2 call.
  • csv_stream_eof/1 - Signal end-of-stream: decode any remaining buffered bytes as a final row (useful when the input doesn't end with a trailing line break).
  • csv_stream_feed/2 - Feed a chunk of bytes into the decoder, returning any complete CSV rows found so far (in order) along with the updated decoder.
  • csv_try_decode/1 - Decode a CSV binary or iolist, returning {ok, Rows} or {error, Reason} instead of raising, where Reason is unterminated_quoted_field or duplicate_header.
  • csv_try_decode/2 - Decode a CSV binary or iolist with options, returning {ok, Rows} or {error, Reason} instead of raising, where Reason is unterminated_quoted_field or duplicate_header.
  • decode_integer/1 - Decode a JSON number string to an integer. Raises invalid_number_format on invalid input.
  • encode_integer/1 - Encode an integer to its JSON string representation. Raises badarg if Int is not an integer.
  • json_decode/1 - Decode a JSON binary or iolist to an Erlang term. JSON objects are returned as maps (default). Raises {parse_error, Msg} on invalid input.
  • json_decode/2 - Decode a JSON binary or iolist to an Erlang term with options. Raises {parse_error, Reason} on invalid input.
  • json_encode/1 - Encode an Erlang term to a JSON binary.
  • json_encode/2 - Encode an Erlang term to a JSON binary with options.
  • json_minify/1 - Minify a JSON binary or iolist, removing all unnecessary whitespace.
  • json_prettify/1 - Pretty-print a JSON binary or iolist with two-space indentation.
  • json_scan/1 - Locate the end of the next complete top-level JSON value in Bin, without decoding it.
  • json_scan/2 - Resume scanning Bin (the unconsumed remainder plus newly-appended bytes) from ScanState.
  • json_stream_decoder/0 - Create a new incremental decoder for feeding JSON in chunks (e.g. from a socket or file), useful when a complete document isn't available up front or when a stream contains a sequence of concatenated/whitespace-separated JSON values (e.g. newline-delimited JSON).
  • json_stream_decoder/1 - Create a new incremental decoder, passing Opts through to every json_decode/2 call.
  • json_stream_eof/1 - Signal end-of-stream: decode any remaining buffered bytes as a final value (useful for a trailing bare scalar, e.g. a lone number or true/null, which the scanner can't otherwise distinguish from a value that's still being written to mid-chunk).
  • json_stream_feed/2 - Feed a chunk of bytes into the decoder, returning any complete JSON values found so far (in order) along with the updated decoder.
  • json_try_decode/1 - Decode a JSON binary or iolist, returning {ok, Term} or {error, Reason} instead of raising.
  • json_try_decode/2 - Decode a JSON binary or iolist with options, returning {ok, Term} or {error, Reason} instead of raising.
  • try_decode_integer/1 - Decode a JSON number string to an integer, returning {ok, Int} or {error, invalid_number_format} instead of raising.
  • yaml_decode/1 - Decode a YAML binary or iolist to an Erlang term. YAML mappings are returned as maps (default). Raises {parse_error, Reason} on invalid input.
  • yaml_decode/2 - Decode a YAML binary or iolist to an Erlang term with options. Raises {parse_error, Msg} on invalid input.
  • yaml_encode/1 - Encode an Erlang term to a YAML binary in block style (2-space indentation, sequences at the same indentation as the mapping key that owns them).
  • yaml_encode/2 - Encode an Erlang term to a YAML binary in block style with options.
  • yaml_try_decode/1 - Decode a YAML binary or iolist, returning {ok, Term} or {error, Msg} instead of raising.
  • yaml_try_decode/2 - Decode a YAML binary or iolist with options, returning {ok, Term} or {error, Msg} instead of raising.

Types

csv_decode_opt()

-type csv_decode_opt() :: {delimiter, char()} | headers | {keys, atom | existing_atom | binary}.

csv_decode_opts()

-type csv_decode_opts() :: [csv_decode_opt()].

CSV decode options:

  • {delimiter, Char} - field delimiter (default $,)
  • headers - treat the first row as column names and decode each subsequent row as a map keyed by those names, instead of returning every row as a list of fields
  • {keys, atom} - with headers, decode column names as atoms
  • {keys, existing_atom} - with headers, decode column names as existing atoms, falling back to binaries for unknown atoms
  • {keys, binary} - with headers, decode column names as binaries (default)
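As an illustration of how these options combine, the sketch below decodes a semicolon-delimited input whose first row names the columns. The module name, function name, and sample data are hypothetical; only glazer:csv_decode/2 and the option names come from this page.

```erlang
-module(csv_decode_opts_example).
-export([decode_people/0]).

%% Decode semicolon-delimited CSV whose first row names the columns.
%% The sample input is invented for illustration.
decode_people() ->
    Input = <<"name;age\nann;30\nbob;41\n">>,
    %% headers: the first row supplies the map keys.
    %% {keys, binary} is the documented default, spelled out for clarity.
    glazer:csv_decode(Input, [{delimiter, $;}, headers, {keys, binary}]).
```

According to the option descriptions above, each data row should come back as a map keyed by the header names.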

csv_encode_opt()

-type csv_encode_opt() :: {delimiter, char()} | headers | {line_ending, lf | crlf}.

csv_encode_opts()

-type csv_encode_opts() :: [csv_encode_opt()].

CSV encode options:

  • {delimiter, Char} - field delimiter (default $,)
  • headers - input is a list of maps; the first map's keys become the header row, and subsequent maps are encoded as rows in that column order (missing keys produce empty fields)
  • {line_ending, lf | crlf} - line terminator (default crlf, per RFC 4180)
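A small sketch of the encode options in use, assuming binary map keys are accepted as column names (this page does not state which key types are allowed). The module name and sample rows are hypothetical.

```erlang
-module(csv_encode_opts_example).
-export([encode_people/0]).

%% Encode a list of maps as CSV with a header row and LF line endings.
encode_people() ->
    Rows = [#{<<"name">> => <<"ann">>, <<"age">> => 30},
            %% No <<"age">> key here: documented to produce an empty field.
            #{<<"name">> => <<"bob">>}],
    glazer:csv_encode(Rows, [headers, {line_ending, lf}]).
```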

csv_stream_decoder()

-opaque csv_stream_decoder()

decode_opt()

-type decode_opt() ::
          object_as_tuple | use_nil |
          {null_term, atom()} |
          {keys, atom | existing_atom | binary} |
          dedupe_keys.

decode_opts()

-type decode_opts() :: [decode_opt()].

Decode options:

  • object_as_tuple - decode JSON objects as {[{K, V}]} proplists rather than maps
  • use_nil - use the atom nil for JSON null
  • {null_term, Atom} - use Atom for JSON null
  • {keys, atom} - decode object keys as atoms
  • {keys, existing_atom} - decode keys as existing atoms, fall back to binary
  • {keys, binary} - decode keys as binaries (default)
  • dedupe_keys - with object_as_tuple, eliminate duplicate object keys from the resulting proplist, keeping the last occurrence's value (and position). Has no effect when objects are decoded as maps (the default) or with {keys, atom | existing_atom}: a JSON object with duplicate keys is always deduped (last value wins) when decoded to a map, since maps cannot represent duplicate keys.
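To make the interaction between maps, object_as_tuple, and dedupe_keys more concrete, here is a hedged sketch that decodes the same document two ways. The module name and the sample JSON are made up; the option names are the ones listed above.

```erlang
-module(json_decode_opts_example).
-export([decode_both/0]).

%% Decode one JSON object (with a duplicate "id" key) as a map and as a
%% {[{K, V}]} proplist, so the two representations can be compared.
decode_both() ->
    Json = <<"{\"id\": 1, \"tag\": null, \"id\": 2}">>,
    %% Map decoding: duplicates collapse (last value wins); null becomes nil.
    AsMap = glazer:json_decode(Json, [use_nil]),
    %% Proplist decoding with explicit de-duplication of keys.
    AsTuple = glazer:json_decode(Json, [object_as_tuple, dedupe_keys]),
    {AsMap, AsTuple}.
```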

encode_opt()

-type encode_opt() :: pretty | uescape | force_utf8 | use_nil | {null_term, atom()}.

encode_opts()

-type encode_opts() :: [encode_opt()].

Encode options:

  • pretty - pretty-print the JSON output
  • uescape - escape non-ASCII characters as \uXXXX sequences
  • force_utf8 - fix invalid UTF-8 sequences before encoding
  • use_nil - encode the atom nil as JSON null
  • {null_term, Atom} - encode Atom as JSON null
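The following sketch encodes one term with a few of these options so the outputs can be compared side by side. The module name and the sample term are hypothetical, and the exact output layout is not shown because this page does not specify it.

```erlang
-module(json_encode_opts_example).
-export([encode_variants/0]).

%% Encode the same map compactly, with \uXXXX escapes, and pretty-printed.
encode_variants() ->
    Term = #{<<"name">> => <<"José"/utf8>>, <<"nick">> => nil},
    Compact = glazer:json_encode(Term, [use_nil]),          % nil -> null
    Escaped = glazer:json_encode(Term, [use_nil, uescape]), % non-ASCII escaped
    Pretty  = glazer:json_encode(Term, [use_nil, pretty]),
    {Compact, Escaped, Pretty}.
```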

json_stream_decoder()

-opaque json_stream_decoder()

scan_state()

-type scan_state() :: tuple().

yaml_decode_opt()

-type yaml_decode_opt() ::
          use_nil | {null_term, atom()} | {keys, atom | existing_atom | binary} | yaml_1_1_bools.

yaml_decode_opts()

-type yaml_decode_opts() :: [yaml_decode_opt()].

YAML decode options:

  • use_nil - use the atom nil for YAML null/~/empty values
  • {null_term, Atom} - use Atom for YAML null/~/empty values
  • {keys, atom} - decode mapping keys as atoms
  • {keys, existing_atom} - decode mapping keys as existing atoms, fall back to binary
  • {keys, binary} - decode mapping keys as binaries (default)
  • yaml_1_1_bools - additionally treat yes/no/on/off (and case variants) as booleans, per the YAML 1.1 core schema. By default (YAML 1.2 core schema) only true/false are recognized as booleans.
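A brief sketch of the yaml_1_1_bools difference, using an invented two-line document. The module and function names are hypothetical.

```erlang
-module(yaml_bools_example).
-export([decode_flags/0]).

%% Decode the same YAML under the default (1.2) and the 1.1 boolean rules.
decode_flags() ->
    Yaml = <<"enabled: yes\nverbose: true\n">>,
    Default = glazer:yaml_decode(Yaml),                    % only true/false are booleans
    Yaml11  = glazer:yaml_decode(Yaml, [yaml_1_1_bools]),  % yes/no/on/off too
    {Default, Yaml11}.
```

Per the option description, the value of enabled should be the boolean true only in the second result.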

yaml_encode_opt()

-type yaml_encode_opt() :: use_nil | {null_term, atom()}.

yaml_encode_opts()

-type yaml_encode_opts() :: [yaml_encode_opt()].

YAML encode options:

  • use_nil - treat the atom nil as YAML null
  • {null_term, Atom} - treat Atom as YAML null

Functions

csv_decode(Input)

-spec csv_decode(binary() | iolist()) -> [[binary()]] | [#{binary() => binary()}].

Decode a CSV binary or iolist to a list of rows.

By default each row is a list of binary fields. With the headers option, the first row is used as column names and each subsequent row is decoded as a map. Raises unterminated_quoted_field or duplicate_header on invalid input.

csv_decode(Input, Opts)

-spec csv_decode(binary() | iolist(), csv_decode_opts()) -> [[binary()]] | [map()].

Decode a CSV binary or iolist to a list of rows, with options. Raises the atom unterminated_quoted_field or duplicate_header on invalid input.

csv_encode(Data)

-spec csv_encode([[term()]] | [map()]) -> binary().

Encode a list of rows to a CSV binary.

Each row is a list of fields (binaries, atoms, integers, or floats). Fields containing the delimiter, a double quote, or a line break are quoted per RFC 4180, with embedded quotes doubled.

csv_encode(Data, Opts)

-spec csv_encode([[term()]] | [map()], csv_encode_opts()) -> binary().

Encode a list of rows to a CSV binary, with options.

With the headers option, Data is a list of maps: the first map's keys become the header row (in iteration order), and each map is encoded as a row in that column order.

csv_stream_decoder()

-spec csv_stream_decoder() -> csv_stream_decoder().

Create a new incremental decoder for feeding CSV in chunks (e.g. from a socket or file), useful when the whole input isn't available up front.

Each complete row is decoded as soon as its terminating line break is seen, via csv_decode/2 on that single row. Only the row boundary detection is incremental: a small byte-scanner tracks whether the cursor is inside a quoted field across chunks, so that a \n or \r\n inside a quoted field doesn't end a row.

With the headers option, the first complete row is captured as the header and used to decode every subsequent row as a map; no row is emitted for the header itself.

Example

1> D0 = glazer:csv_stream_decoder(),
2> {Rows1, D1} = glazer:csv_stream_feed(D0, <<"a,b\n1,2\n3,">>),
3> Rows1.
[[<<"a">>,<<"b">>],[<<"1">>,<<"2">>]]
4> {Rows2, D2} = glazer:csv_stream_feed(D1, <<"4\n">>),
5> Rows2.
[[<<"3">>,<<"4">>]]
6> glazer:csv_stream_eof(D2).
{ok, []}
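The same flow can be wrapped in a helper that folds over a list of chunks and flushes at the end. This is a sketch only: the module name and decode_chunks/1 are hypothetical, and it assumes the headers option behaves for streamed rows as described above.

```erlang
-module(csv_stream_headers_example).
-export([decode_chunks/1]).

%% Feed every chunk through one decoder, then flush any trailing row.
%% Returns {ok, Rows} in input order, or {error, Reason} from the flush.
decode_chunks(Chunks) ->
    D0 = glazer:csv_stream_decoder([headers]),
    {RevRows, D} =
        lists:foldl(
          fun(Chunk, {Acc, Dec0}) ->
                  {Rows, Dec1} = glazer:csv_stream_feed(Dec0, Chunk),
                  %% Prepend Rows in reverse so Acc stays newest-first.
                  {lists:reverse(Rows, Acc), Dec1}
          end,
          {[], D0},
          Chunks),
    case glazer:csv_stream_eof(D) of
        {ok, Trailing}   -> {ok, lists:reverse(RevRows, Trailing)};
        {error, _} = Err -> Err
    end.
```

A call such as csv_stream_headers_example:decode_chunks([<<"a,b\n1,">>, <<"2\n3,4">>]) exercises both a row split across chunks and a final row without a trailing line break.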

csv_stream_decoder(Opts)

-spec csv_stream_decoder(csv_decode_opts()) -> csv_stream_decoder().

Create a new incremental CSV decoder, passing Opts through to every csv_decode/2 call.

csv_stream_eof/1

-spec csv_stream_eof(csv_stream_decoder()) -> {ok, [[binary()]] | [map()]} | {error, term()}.

Signal end-of-stream: decode any remaining buffered bytes as a final row (useful when the input doesn't end with a trailing line break).

Returns {ok, Rows} with zero or one trailing row, or {error, Reason} if the remaining bytes don't form a valid row.

Example

1> D0 = glazer:csv_stream_decoder(),
2> {Rows1, D1} = glazer:csv_stream_feed(D0, <<"a,b\n1,2">>),
3> Rows1.
[[<<"a">>,<<"b">>]]
4> glazer:csv_stream_eof(D1).
{ok, [[<<"1">>,<<"2">>]]}

csv_stream_feed/2

-spec csv_stream_feed(csv_stream_decoder(), binary() | iolist()) ->
                         {[[binary()]] | [map()], csv_stream_decoder()}.

Feed a chunk of bytes into the decoder, returning any complete CSV rows found so far (in order) along with the updated decoder.

Raises the same exceptions as csv_decode/2 if a row that the scanner deemed complete fails to decode.

Example

loop(Socket, D0) ->
  case gen_tcp:recv(Socket, 0) of
    {ok, Chunk} ->
      {Rows, D1} = glazer:csv_stream_feed(D0, Chunk),
      handle_rows(Rows),
      loop(Socket, D1);
    {error, closed} ->
      case glazer:csv_stream_eof(D0) of
        {ok, Trailing}  -> handle_rows(Trailing);
        {error, Reason} -> handle_truncated_stream(Reason)
      end
  end.

csv_try_decode(Input)

-spec csv_try_decode(binary() | iolist()) -> {ok, [[binary()]]} | {error, atom()}.

Decode a CSV binary or iolist, returning {ok, Rows} or {error, Reason} instead of raising, where Reason is unterminated_quoted_field or duplicate_header.

csv_try_decode(Input, Opts)

-spec csv_try_decode(binary() | iolist(), csv_decode_opts()) ->
                        {ok, [[binary()]] | [map()]} | {error, atom()}.

Decode a CSV binary or iolist with options, returning {ok, Rows} or {error, Reason} instead of raising, where Reason is unterminated_quoted_field or duplicate_header.
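Because the two error reasons are plain atoms, they can be matched directly. The sketch below is one way a caller might translate them into messages; the module name, load/1, and the message strings are hypothetical.

```erlang
-module(csv_try_example).
-export([load/1]).

%% Decode CSV with a header row and map each documented error reason
%% to a human-readable message.
load(Input) ->
    case glazer:csv_try_decode(Input, [headers]) of
        {ok, Rows} ->
            {ok, Rows};
        {error, unterminated_quoted_field} ->
            {error, "a quoted field is never closed"};
        {error, duplicate_header} ->
            {error, "two columns share the same name"}
    end.
```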

decode_integer(NumberString)

-spec decode_integer(binary() | iolist()) -> integer().

Decode a JSON number string to an integer. Raises invalid_number_format on invalid input.

encode_integer(Int)

-spec encode_integer(integer()) -> binary().

Encode an integer to its JSON string representation. Raises badarg if Int is not an integer.
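A short sketch of the integer helpers, using the non-raising try_decode_integer/1 (documented further down) to supply a fallback. The module and function names are hypothetical.

```erlang
-module(integer_example).
-export([roundtrip/1, parse_or_default/2]).

%% Encode an integer and check that it decodes back to the same value.
roundtrip(Int) when is_integer(Int) ->
    Bin = glazer:encode_integer(Int),
    case glazer:try_decode_integer(Bin) of
        {ok, Int} -> {ok, Bin};
        Other     -> {unexpected, Other}
    end.

%% Parse a number string, falling back to Default on invalid input.
parse_or_default(NumberString, Default) ->
    case glazer:try_decode_integer(NumberString) of
        {ok, Int}                      -> Int;
        {error, invalid_number_format} -> Default
    end.
```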

json_decode(Input)

-spec json_decode(binary() | iolist()) -> term().

Decode a JSON binary or iolist to an Erlang term. JSON objects are returned as maps (default). Raises {parse_error, Msg} on invalid input.

json_decode(Input, Opts)

-spec json_decode(binary() | iolist(), decode_opts()) -> term().

Decode a JSON binary or iolist to an Erlang term with options. Raises {parse_error, Reason} on invalid input.
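When the raising variant is preferred, the exception can be caught at the call site. The sketch below matches on the documented {parse_error, ...} term and deliberately leaves the exception class as a wildcard, since this page does not say which class is used. The module name and safe_decode/1 are hypothetical.

```erlang
-module(json_safe_decode_example).
-export([safe_decode/1]).

%% Wrap json_decode/1 so that parse failures become {error, Msg}.
safe_decode(Input) ->
    try glazer:json_decode(Input) of
        Term -> {ok, Term}
    catch
        %% Class is left unbound on purpose: it is not documented here.
        _Class:{parse_error, Msg} -> {error, Msg}
    end.
```

This is roughly what json_try_decode/1 offers out of the box, so the wrapper is mainly useful when other exceptions should still propagate.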

json_encode(Data)

-spec json_encode(term()) -> binary().

Encode an Erlang term to a JSON binary.

json_encode(Data, Opts)

-spec json_encode(term(), encode_opts()) -> binary().

Encode an Erlang term to a JSON binary with options.

json_minify(Input)

-spec json_minify(binary() | iolist()) -> binary().

Minify a JSON binary or iolist, removing all unnecessary whitespace.

json_prettify(Input)

-spec json_prettify(binary() | iolist()) -> binary().

Pretty-print a JSON binary or iolist with two-space indentation.
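One possible use of json_minify/1 is comparing two JSON texts while ignoring layout. The sketch below assumes both inputs are valid JSON; the module and function names are hypothetical, and key order is still significant because nothing is decoded.

```erlang
-module(json_compare_example).
-export([same_modulo_whitespace/2]).

%% True when two JSON texts differ only in insignificant whitespace.
same_modulo_whitespace(JsonA, JsonB) ->
    glazer:json_minify(JsonA) =:= glazer:json_minify(JsonB).
```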

json_scan(Bin)

-spec json_scan(binary() | iolist()) -> {complete, non_neg_integer()} | {incomplete, scan_state()}.

Locate the end of the next complete top-level JSON value in Bin, without decoding it.

Returns:

  • {complete, EndOffset} - a complete value spans binary:part(Bin, 0, EndOffset); the rest of Bin (if any) is left over for the next call
  • {incomplete, ScanState} - Bin doesn't yet contain a complete value; feed more data via json_scan/2 once it's available, passing the entire unconsumed remainder (this Bin, with new bytes appended) plus ScanState

This is the low-level primitive behind json_stream_feed/2; most callers should use the stream_* API instead.

Example

Slicing off complete values from a buffer of concatenated JSON:

1> Buf0 = <<"{\"a\":1} {\"b\":2}">>,
2> {complete, End1} = glazer:json_scan(Buf0).
{complete, 7}
3> <<Val1:End1/binary, Buf1/binary>> = Buf0,
4> Val1.
<<"{\"a\":1}">>
5> Buf1.
<<" {\"b\":2}">>
6> {complete, End2} = glazer:json_scan(Buf1).
{complete, 8}

Resuming a scan once more bytes arrive:

1> {incomplete, S0} = glazer:json_scan(<<"{\"a\":">>).
{incomplete, {6,1,false,false,true,false}}
2> glazer:json_scan(<<"{\"a\":1}">>, S0).
{complete, 7}
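The slicing steps in the first example can be folded into a loop. The sketch below is an assumption-laden illustration rather than the library's own approach: the module name and split/1 are hypothetical, and the End > 0 guard is a defensive choice because the result for an empty buffer is not documented here.

```erlang
-module(json_split_example).
-export([split/1]).

%% Split a buffer into the raw binaries of its complete top-level values.
%% Returns {Values, Rest}, where Rest is whatever could not be completed.
split(Input) ->
    split(iolist_to_binary(Input), []).

split(Buf, Acc) ->
    case glazer:json_scan(Buf) of
        {complete, End} when End > 0 ->
            %% Slice off the value; it may carry leading whitespace.
            <<Val:End/binary, Rest/binary>> = Buf,
            split(Rest, [Val | Acc]);
        _IncompleteOrEmpty ->
            {lists:reverse(Acc), Buf}
    end.
```

Each returned binary can then be passed to json_decode/1. For simplicity the sketch rescans the remainder from scratch instead of resuming with json_scan/2.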

json_scan(Bin, ScanState)

-spec json_scan(binary() | iolist(), scan_state()) ->
                   {complete, non_neg_integer()} | {incomplete, scan_state()}.

Resume scanning Bin (the unconsumed remainder plus newly-appended bytes) from ScanState.

json_stream_decoder()

-spec json_stream_decoder() -> json_stream_decoder().

Create a new incremental decoder for feeding JSON in chunks (e.g. from a socket or file), useful when a complete document isn't available up front or when a stream contains a sequence of concatenated/whitespace-separated JSON values (e.g. newline-delimited JSON).

Decoding itself is not incremental — each complete top-level value is still decoded in a single pass via json_decode/2 using the library's fast whole-buffer decoder. Only the boundary detection (finding where one value ends and the next begins) is incremental, via a small byte-scanner that tracks nesting/string state across chunks.

Example

1> D0 = glazer:json_stream_decoder(),
2> {Vals1, D1} = glazer:json_stream_feed(D0, <<"{\"a\":1} {\"b\":">>),
3> Vals1.
[#{<<"a">> => 1}]
4> {Vals2, _D2} = glazer:json_stream_feed(D1, <<"2}">>),
5> Vals2.
[#{<<"b">> => 2}]
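For the file case mentioned above, a reader loop might look like the following sketch. The module name, read_file/1, and the 64 KiB chunk size are all assumptions chosen for illustration; only the glazer:json_stream_* calls come from this page.

```erlang
-module(ndjson_file_example).
-export([read_file/1]).

%% Read a file of concatenated or newline-delimited JSON values in chunks.
%% Returns {ok, Values} in file order, or {error, Reason}.
read_file(Path) ->
    {ok, Fd} = file:open(Path, [read, raw, binary]),
    try
        loop(Fd, glazer:json_stream_decoder(), [])
    after
        file:close(Fd)
    end.

loop(Fd, D0, Acc) ->
    case file:read(Fd, 65536) of
        {ok, Chunk} ->
            {Vals, D1} = glazer:json_stream_feed(D0, Chunk),
            %% Keep Acc newest-first; it is reversed once at the end.
            loop(Fd, D1, lists:reverse(Vals, Acc));
        eof ->
            case glazer:json_stream_eof(D0) of
                {ok, Trailing}   -> {ok, lists:reverse(Acc, Trailing)};
                {error, _} = Err -> Err
            end;
        {error, _} = Err ->
            Err
    end.
```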

json_stream_decoder(Opts)

-spec json_stream_decoder(decode_opts()) -> json_stream_decoder().

Create a new incremental decoder, passing Opts through to every json_decode/2 call.

json_stream_eof/1

-spec json_stream_eof(json_stream_decoder()) -> {ok, [term()]} | {error, term()}.

Signal end-of-stream: decode any remaining buffered bytes as a final value (useful for a trailing bare scalar, e.g. a lone number or true/null, which the scanner can't otherwise distinguish from a value that's still being written to mid-chunk).

Returns {ok, [Term]} with zero or one trailing value, or {error, Reason} if the remaining bytes don't form a complete value.

Example

1> D0 = glazer:json_stream_decoder(),
2> {Vals1, D1} = glazer:json_stream_feed(D0, <<"123">>),
3> Vals1.
[]
4> glazer:json_stream_eof(D1).
{ok, [123]}

A stream that ends mid-value (e.g. a dropped connection) yields an error instead of silently dropping the partial data:

1> D0 = glazer:json_stream_decoder(),
2> {Vals1, D1} = glazer:json_stream_feed(D0, <<"{\"a\":1, \"b\":">>),
3> Vals1.
[]
4> glazer:json_stream_eof(D1).
{error, _Reason}

json_stream_feed/2

-spec json_stream_feed(json_stream_decoder(), binary() | iolist()) -> {[term()], json_stream_decoder()}.

Feed a chunk of bytes into the decoder, returning any complete JSON values found so far (in order) along with the updated decoder.

Raises the same exceptions as json_decode/2 (e.g. {parse_error, Reason}) if a value that the scanner deemed complete fails to decode.

Example

Call json_stream_feed/2 for each chunk received from the source while more data may still arrive, and json_stream_eof/1 once the source is exhausted to flush any trailing value:

loop(Socket, D0) ->
  case gen_tcp:recv(Socket, 0) of
    {ok, Chunk} ->
      {Vals, D1} = glazer:json_stream_feed(D0, Chunk),
      handle_values(Vals),
      loop(Socket, D1);
    {error, closed} ->
      case glazer:json_stream_eof(D0) of
        {ok, Trailing}  -> handle_values(Trailing);
        {error, Reason} -> handle_truncated_stream(Reason)
      end
  end.

The same decoder fits naturally into a gen_server driving an active-mode socket: keep the json_stream_decoder() in the process state, feed it from handle_info({tcp, ...}), and flush it on {tcp_closed, ...}:

-module(json_conn).
-behaviour(gen_server).
-export([start_link/1]).
-export([init/1, handle_call/3, handle_cast/2, handle_info/2]).

-record(state, {socket, decoder}).

start_link(Socket) ->
  gen_server:start_link(?MODULE, Socket, []).

init(Socket) ->
  inet:setopts(Socket, [{active, once}]),
  {ok, #state{socket = Socket, decoder = glazer:json_stream_decoder()}}.

handle_info({tcp, Socket, Data}, #state{socket = Socket, decoder = D0} = State) ->
  {Vals, D1} = glazer:json_stream_feed(D0, Data),
  lists:foreach(fun handle_value/1, Vals),
  inet:setopts(Socket, [{active, once}]),
  {noreply, State#state{decoder = D1}};

handle_info({tcp_closed, Socket}, #state{socket = Socket, decoder = D0} = State) ->
  case glazer:json_stream_eof(D0) of
    {ok, Trailing}  -> lists:foreach(fun handle_value/1, Trailing);
    {error, Reason} -> handle_truncated_stream(Reason)
  end,
  {stop, normal, State};

handle_info({tcp_error, Socket, Reason}, #state{socket = Socket} = State) ->
  {stop, Reason, State}.

handle_call(_Request, _From, State) -> {reply, ok, State}.
handle_cast(_Request, State)        -> {noreply, State}.

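handle_value(Val) ->
  io:format("received: ~p~n", [Val]).

%% Called when the peer closes the connection in the middle of a value.
handle_truncated_stream(Reason) ->
  io:format("stream truncated: ~p~n", [Reason]).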

json_try_decode(Input)

-spec json_try_decode(binary() | iolist()) -> {ok, term()} | {error, binary()}.

Decode a JSON binary or iolist, returning {ok, Term} or {error, Reason} instead of raising.

json_try_decode(Input, Opts)

-spec json_try_decode(binary() | iolist(), decode_opts()) -> {ok, term()} | {error, binary()}.

Decode a JSON binary or iolist with options, returning {ok, Term} or {error, Reason} instead of raising.

try_decode_integer(NumberString)

-spec try_decode_integer(binary() | iolist()) -> {ok, integer()} | {error, invalid_number_format}.

Decode a JSON number string to an integer, returning {ok, Int} or {error, invalid_number_format} instead of raising.

yaml_decode(Input)

-spec yaml_decode(binary() | iolist()) -> term().

Decode a YAML binary or iolist to an Erlang term. YAML mappings are returned as maps (default). Raises {parse_error, Reason} on invalid input.

yaml_decode(Input, Opts)

-spec yaml_decode(binary() | iolist(), yaml_decode_opts()) -> term().

Decode a YAML binary or iolist to an Erlang term with options. Raises {parse_error, Msg} on invalid input.

yaml_encode(Data)

-spec yaml_encode(term()) -> binary().

Encode an Erlang term to a YAML binary in block style (2-space indentation, sequences at the same indentation as the mapping key that owns them).

yaml_encode(Data, Opts)

-spec yaml_encode(term(), yaml_encode_opts()) -> binary().

Encode an Erlang term to a YAML binary in block style with options.

yaml_try_decode(Input)

-spec yaml_try_decode(binary() | iolist()) -> {ok, term()} | {error, binary()}.

Decode a YAML binary or iolist, returning {ok, Term} or {error, Msg} instead of raising.

yaml_try_decode(Input, Opts)

-spec yaml_try_decode(binary() | iolist(), yaml_decode_opts()) -> {ok, term()} | {error, binary()}.

Decode a YAML binary or iolist with options, returning {ok, Term} or {error, Msg} instead of raising.
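As a closing illustration, the sketch below encodes a term to YAML and decodes it again with the non-raising variant. The module name, roundtrip/1, and the sample term are hypothetical, and the sketch makes no claim that every term survives the round trip unchanged.

```erlang
-module(yaml_roundtrip_example).
-export([roundtrip/1]).

%% Encode Term to block-style YAML, then decode the result.
%% Returns {ok, Yaml, Decoded} or {error, Msg}.
roundtrip(Term) ->
    Yaml = glazer:yaml_encode(Term),
    case glazer:yaml_try_decode(Yaml) of
        {ok, Decoded} -> {ok, Yaml, Decoded};
        {error, Msg}  -> {error, Msg}
    end.
```

For example, yaml_roundtrip_example:roundtrip(#{<<"name">> => <<"app">>, <<"ports">> => [80, 443]}) exercises both a mapping and a nested sequence.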