glazer_csv (glazer v0.5.5)
Fast CSV encoding and decoding using the glaze C++ library.
By default nulls (e.g. produced by on_failure => null) are represented
as the atom null. To change it application-wide, set the null env key
in your config:
{glazer, [{null, nil}]}.
Features
- RFC 4180 CSV encoding/decoding via decode/1,2 and encode/1,2, with optional header-row support
- Per-column field type conversion ({fields, Specs}), including integers, floats, booleans, datetimes, atoms, and strings (binaries)
- Incremental/streaming CSV decoding via stream_decoder/0,1, stream_feed/2, stream_eof/1
- Configurable representation of CSV null values
- read_file/1,2 and write_file/2,3 helpers for decoding/encoding directly to/from a file
See also [https://github.com/stephenberry/glaze]
Summary
Types
csv_result()
The result of a successful CSV decode: a map with two keys.
decode_error()
Error reasons returned by try_decode/1,2 or raised by decode/1,2
decode_opt()
A single CSV decode option. See decode_opts/0 for the full reference
table of all available options and their effects.
decode_opts()
CSV decode options
encode_opt()
A single CSV encode option. See encode_opts/0 for descriptions of all
available options.
encode_opts()
CSV encode options
field_on_failure()
Controls what happens when a non-empty field fails to convert to the
requested field_type() (default binary)
field_spec()
A single element of the {fields, Specs} CSV decode option: either a
field_type() directly, or a map for more control
field_type()
A single column's target type for the {fields, Specs} CSV decode option
headers_type()
How the header row should be represented when using {headers, Type}
scan_state()
Resumable state of the incremental row-boundary scanner used inside a
stream_decoder/0. Carries the current byte offset and a flag
indicating whether the scanner is currently inside a quoted field.
Exposed so the state can be serialised or inspected; normal usage does
not require direct access to this type.
stream_decoder()
Opaque handle for incremental CSV decoding. Created by
stream_decoder/0,1 and threaded through successive stream_feed/2
calls; call stream_eof/1 to flush any remaining buffered bytes at the
end of the input.
Functions
decode/1
Decode a CSV binary or iolist.
decode/2
Decode a CSV binary or iolist with options (see decode_opts/0).
Returns a csv_result/0.
Raises a decode_error/0 reason on invalid input.
encode/1
Encode a list of rows to a CSV binary.
encode/2
Encode a list of rows to a CSV binary, with options.
read_file/1
Read Filename and decode its contents as CSV.
read_file/2
Read Filename and decode its contents as CSV, with decode options
(see decode/2).
stream_decoder/0
Create a new incremental decoder for feeding CSV in chunks (e.g. from a socket or file), useful when the whole input isn't available up front.
stream_decoder/1
Create a new incremental CSV decoder, passing Opts through to every
internal decode/2 call.
stream_eof/1
Signal end-of-stream: decode any remaining buffered bytes as a final row (useful when the input doesn't end with a trailing line break).
stream_feed/2
Feed a chunk of bytes into the decoder, returning any complete CSV rows found so far (in order) along with the updated decoder.
try_decode/1
Decode a CSV binary or iolist, returning {ok, Result} or
{error, Reason} instead of raising.
Result is a csv_result/0; Reason is a decode_error/0.
try_decode/2
Decode a CSV binary or iolist with options (see decode_opts/0),
returning {ok, Result} or {error, Reason} instead of raising.
Result is a csv_result/0; Reason is a decode_error/0.
write_file/2
Encode Data to CSV and write it to Filename, overwriting any existing
file.
write_file/3
Encode Data to CSV with encode options (see encode/2) and write it to
Filename, overwriting any existing file.
Types
-type csv_result() :: #{headers := nil | [binary() | atom()], data := [[term()]] | [tuple()] | [map()]}.
The result of a successful CSV decode: a map with two keys.
- headers - nil when the headers option was not given; otherwise a list of column names (binaries by default, atoms with {headers, atom} or {headers, existing_atom})
- data - list of data rows; each row is a list of field values by default, a tuple of field values with {return, tuple}, or a map keyed by the column names when both headers and {return, map} are given
-type decode_error() :: unterminated_quoted_field | duplicate_header | {invalid_field_value, Row :: pos_integer(), Column :: pos_integer()}.
Error reasons returned by try_decode/1,2 or raised by decode/1,2:
- unterminated_quoted_field - input ended inside a "..." field with no closing quote
- duplicate_header - two columns share the same name and {return, map} was requested (map keys must be unique)
- {invalid_field_value, Row, Column} - a field at the given 1-based row/column position failed to convert to the type requested by {fields, Specs} with on_failure => raise
-type decode_opt() :: {delimiter, char()} | headers | {headers, [atom() | binary()] | headers_type()} | {fields, [field_spec()]} | {null_term, atom()} | {return, list | map | tuple} | {skip, non_neg_integer() | {pos_integer(), pos_integer()}} | {limit, pos_integer()}.
A single CSV decode option. See decode_opts/0 for the full reference
table of all available options and their effects.
-type decode_opts() :: [decode_opt()].
CSV decode options:
| Option | Description |
|---|---|
| {delimiter, Char} | Field delimiter character (default $,) |
| headers | Treat the first row as column names (shorthand for {headers, binary}) |
| {headers, [Name, ...]} | Use the given list of atoms or binaries as column names; the first data row is not consumed as a header |
| {headers, binary} | First row → binary column names (same as bare headers) |
| {headers, string} | Alias for {headers, binary} |
| {headers, atom} | First row → atom column names (via binary_to_atom/2-equivalent) |
| {headers, existing_atom} | First row → existing-atom column names (fall back to binary for unknown atoms) |
| {headers, charlist} | First row → column names as lists of Unicode codepoints |
| {return, list} | Data rows are lists of field values (default) |
| {return, tuple} | Data rows are tuples of field values |
| {return, map} | Data rows are maps keyed by column names; requires headers or {headers, ...}. Raises duplicate_header on duplicate column names |
| {fields, Specs} | Per-column type conversion, applied positionally; see field_spec/0 |
| {skip, N} | Skip the first N data rows (after any header row) |
| {skip, {From, To}} | Process only data rows From..To (1-based inclusive); equivalent to {skip, From-1} plus {limit, To-From+1} |
| {limit, N} | Process at most N data rows (after skipping) |
| {null_term, Atom} | Atom to use for on_failure => null; overrides the library-wide null env var |
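The equivalence stated for {skip, {From, To}} can be written out as a small helper. This is only an illustrative sketch: the module name skip_range and the function expand/1 are hypothetical and are not part of glazer_csv.

```erlang
-module(skip_range).
-export([expand/1]).

%% Hypothetical helper (not part of glazer_csv): rewrite the 1-based,
%% inclusive range form {skip, {From, To}} into the {skip, N} plus
%% {limit, N} pair that the option table describes as equivalent.
-spec expand({skip, {pos_integer(), pos_integer()}}) ->
          [{skip, non_neg_integer()} | {limit, pos_integer()}].
expand({skip, {From, To}}) when is_integer(From), is_integer(To),
                                From >= 1, To >= From ->
    [{skip, From - 1}, {limit, To - From + 1}].
```

For instance, skip_range:expand({skip, {2, 3}}) evaluates to [{skip, 1}, {limit, 2}], which selects data rows 2 and 3.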
-type encode_opt() :: {delimiter, char()} | headers | {headers, [atom() | binary()]} | {line_ending, lf | crlf}.
A single CSV encode option. See encode_opts/0 for descriptions of all
available options.
-type encode_opts() :: [encode_opt()].
CSV encode options:
- {delimiter, Char} - field delimiter (default $,)
- headers - input is a list of maps; the first map's keys become the header row, and subsequent maps are encoded as rows in that column order (missing keys produce empty fields)
- {headers, [Name, ...]} - input is a list of maps; uses the given list of atoms or binaries (matching the maps' key type) as the column order and header row, instead of deriving it from the first map's keys (missing keys produce empty fields)
- {line_ending, lf | crlf} - line terminator (default crlf, per RFC 4180)
-type field_on_failure() :: binary | raise | default | null.
Controls what happens when a non-empty field fails to convert to the
requested field_type() (default binary):
- binary - leave the field as the original binary (default)
- raise - raise {invalid_field_value, Row, Column} (1-based row and column), or return it as {error, Reason} from try_decode/2
- default - use the spec's default value (falls back to binary if no default is given)
- null - use the configured null term: {null_term, Atom} if given, otherwise the library-wide null term (see the null application env var, Null term configuration)
-type field_spec() :: field_type() | #{type := field_type(), default => term(), on_failure => field_on_failure()}.
A single element of the {fields, Specs} CSV decode option: either a
field_type() directly, or a map for more control:
- type - the field_type() to convert the field to
- default - used in place of the converted value whenever the raw CSV field is empty
- on_failure - see field_on_failure/0 (default binary)
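As an illustration of the map form, the sketch below combines type, default, and on_failure in one spec. The module name field_spec_example and the function decode_people/1 are hypothetical; only the options themselves come from this page, and no output is shown because it depends on the input supplied.

```erlang
-module(field_spec_example).
-export([decode_people/1]).

%% Hypothetical wrapper around glazer_csv:decode/2.
%% Column 1 is left as a binary. Column 2 is requested as an integer:
%% an empty field should take the default (0), and a non-empty field that
%% does not convert should become the null term chosen with
%% {null_term, undefined}.
decode_people(Csv) ->
    Specs = [binary,
             #{type => integer, default => 0, on_failure => null}],
    glazer_csv:decode(Csv, [headers, {fields, Specs}, {null_term, undefined}]).
```

Specs are applied positionally, so the first element describes the first column and the second element the second column.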
-type field_type() :: integer | {float, non_neg_integer()} | boolean | {datetime, binary()} | binary | charlist | existing_atom | {atom, ExistingAtoms :: [atom()]}.
A single column's target type for the {fields, Specs} CSV decode option:
- integer - parse as an integer
- {float, Precision} - parse as a float, rounded to Precision decimal digits
- boolean - parse "true"/"false" (any case) as true/false
- {datetime, InputFormat} - parse using a strptime-like format string (%Y %m %d %H %M %S %f %z and literals; %z accepts Z, +HHMM, or +HH:MM), converting the result to Unix epoch seconds (UTC)
- binary - leave as a binary (default)
- charlist - convert to a list of Unicode code points
- existing_atom - convert to an existing atom, falling back to a binary if no such atom exists
- {atom, ExistingAtoms} - convert to an atom only if the field's text matches (and exists as) one of ExistingAtoms, falling back to a binary otherwise
-type headers_type() :: atom | existing_atom | binary | string | charlist.
How the header row should be represented when using {headers, Type}:
- atom - column names are converted to atoms (via binary_to_atom/2-equivalent)
- existing_atom - column names are converted to existing atoms (binaries if not found)
- binary - column names are kept as binaries (default)
- string - alias for binary
- charlist - column names are converted to lists of Unicode codepoints
-type scan_state() :: {non_neg_integer(), boolean()}.
Resumable state of the incremental row-boundary scanner used inside a
stream_decoder/0. Carries the current byte offset and a flag
indicating whether the scanner is currently inside a quoted field.
Exposed so the state can be serialised or inspected; normal usage does
not require direct access to this type.
-opaque stream_decoder()
Opaque handle for incremental CSV decoding. Created by
stream_decoder/0,1 and threaded through successive stream_feed/2
calls; call stream_eof/1 to flush any remaining buffered bytes at the
end of the input.
Functions
-spec decode(binary() | iolist()) -> csv_result().
Decode a CSV binary or iolist.
Returns a csv_result/0 map #{headers => nil, data => Rows} where
Rows is a list of rows, each row a list of binary fields. With the
headers option the first row is captured as column names in headers
instead of appearing in data.
Raises a decode_error/0 reason on invalid input.
Examples
1> glazer_csv:decode(<<"a,b\n1,2\n3,4\n">>).
#{headers => nil, data => [[<<"a">>,<<"b">>],[<<"1">>,<<"2">>],[<<"3">>,<<"4">>]]}
2> glazer_csv:decode(<<>>).
#{headers => nil, data => []}
3> glazer_csv:decode(<<"\"hello, world\",42\n">>).
#{headers => nil, data => [[<<"hello, world">>,<<"42">>]]}
-spec decode(binary() | iolist(), decode_opts()) -> csv_result().
Decode a CSV binary or iolist with options (see decode_opts/0).
Returns a csv_result/0.
Raises a decode_error/0 reason on invalid input.
Examples
%% First row as binary column names
1> glazer_csv:decode(<<"name,age\nAlice,30\nBob,25\n">>, [headers]).
#{headers => [<<"name">>,<<"age">>],
data => [[<<"Alice">>,<<"30">>],[<<"Bob">>,<<"25">>]]}
%% Explicit column names — no header row expected in the data
2> glazer_csv:decode(<<"Alice,30\n">>, [{headers, [name, age]}, {return, map}]).
#{headers => [name,age], data => [#{age => <<"30">>, name => <<"Alice">>}]}
%% Per-column type conversion
3> glazer_csv:decode(<<"Alice,30\n">>, [{fields, [binary, integer]}]).
#{headers => nil, data => [[<<"Alice">>,30]]}
%% Semicolon delimiter, skip the first data row, limit to 2 rows
4> glazer_csv:decode(<<"h1;h2\nr1a;r1b\nr2a;r2b\nr3a;r3b\nr4a;r4b\n">>,
                     [{delimiter, $;}, headers, {skip, 1}, {limit, 2}]).
#{headers => [<<"h1">>,<<"h2">>],
data => [[<<"r2a">>,<<"r2b">>],[<<"r3a">>,<<"r3b">>]]}
%% Rows as maps with atom keys
5> glazer_csv:decode(<<"a,b\n1,2\n">>, [{headers, existing_atom}, {return, map}]).
#{headers => [a,b], data => [#{a => <<"1">>, b => <<"2">>}]}
%% Rows as tuples
6> glazer_csv:decode(<<"a,b\n1,2\n">>, [{return, tuple}]).
#{headers => nil, data => [{<<"a">>,<<"b">>},{<<"1">>,<<"2">>}]}
Encode a list of rows to a CSV binary.
Each row is a list of fields (binaries, atoms, integers, or floats). Fields containing the delimiter, a double quote, or a line break are quoted per RFC 4180, with embedded quotes doubled.
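The quoting rule can be sketched in plain Erlang for a single binary field. This illustrates the RFC 4180 rule only and is not glazer's implementation (the library encodes natively); the module csv_quote and function quote_field/2 are hypothetical names.

```erlang
-module(csv_quote).
-export([quote_field/2]).

%% Hypothetical illustration of RFC 4180 quoting (not glazer's code).
%% A field is wrapped in double quotes if it contains the delimiter,
%% a double quote, CR, or LF; embedded double quotes are doubled.
%% Any other field is returned unchanged.
-spec quote_field(binary(), char()) -> binary().
quote_field(Field, Delim) when is_binary(Field), is_integer(Delim) ->
    Special = [<<Delim/utf8>>, <<"\"">>, <<"\r">>, <<"\n">>],
    case binary:match(Field, Special) of
        nomatch ->
            Field;
        _Found ->
            Escaped = binary:replace(Field, <<"\"">>, <<"\"\"">>, [global]),
            <<$", Escaped/binary, $">>
    end.
```

Joining the quoted fields of a row with the delimiter and appending the line ending gives one encoded row.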
Examples
1> glazer_csv:encode([[<<"a">>, <<"b">>], [1, 2]]).
<<"a,b\r\n1,2\r\n">>
2> glazer_csv:encode([[<<"hello, world">>, <<"say \"hi\"">>]]).
<<"\"hello, world\",\"say \"\"hi\"\"\"\r\n">>
3> glazer_csv:encode([]).
<<>>
-spec encode([[term()]] | [map()], encode_opts()) -> binary().
Encode a list of rows to a CSV binary, with options.
With the headers option, Data is a list of maps: the first map's keys
become the header row (in iteration order), and each map is encoded as a
row in that column order.
Examples
%% Maps to CSV with a header row
1> glazer_csv:encode([#{<<"name">> => <<"Alice">>, <<"age">> => 30}], [headers]).
<<"age,name\r\n30,Alice\r\n">>
%% Maps to CSV with an explicit column order
2> glazer_csv:encode([#{<<"name">> => <<"Alice">>, <<"age">> => 30}],
                     [{headers, [<<"name">>, <<"age">>]}]).
<<"name,age\r\nAlice,30\r\n">>
%% Semicolon delimiter with LF line endings
3> glazer_csv:encode([[<<"a">>, <<"b">>], [1, 2]],
                     [{delimiter, $;}, {line_ending, lf}]).
<<"a;b\n1;2\n">>
-spec read_file(file:name_all()) -> csv_result().
Read Filename and decode its contents as CSV.
Raises Reason::decode_error() if the file's contents aren't valid CSV, or
a binary "Filename: Reason" message (see file:format_error/1) if the
file can't be read.
Examples
%% File contains: name,age\nAlice,30\n
1> glazer_csv:read_file("data.csv").
#{headers => nil, data => [[<<"name">>,<<"age">>],[<<"Alice">>,<<"30">>]]}
2> glazer_csv:read_file("missing.csv").
** exception error: <<"missing.csv: no such file or directory">>
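Callers that prefer tagged tuples over exceptions could wrap the call as sketched below. The module read_csv_safe and function read/1 are hypothetical. The shell output above shows the file failure raised with class error; that decode failures use the same class is an assumption made here, not something this page states.

```erlang
-module(read_csv_safe).
-export([read/1]).

%% Hypothetical wrapper (not part of glazer_csv) that turns the
%% exceptions of glazer_csv:read_file/1 into tagged tuples.
%% Assumption: decode errors are raised with class `error`, like the
%% binary "Filename: Reason" message shown for a missing file.
read(Filename) ->
    try glazer_csv:read_file(Filename) of
        Result -> {ok, Result}
    catch
        error:Msg when is_binary(Msg) ->
            {error, {file, Msg}};
        error:unterminated_quoted_field ->
            {error, {csv, unterminated_quoted_field}};
        error:duplicate_header ->
            {error, {csv, duplicate_header}};
        error:{invalid_field_value, _Row, _Column} = Reason ->
            {error, {csv, Reason}}
    end.
```

Any other exception propagates unchanged, so unrelated failures are not hidden.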
-spec read_file(file:name_all(), decode_opts()) -> csv_result().
Read Filename and decode its contents as CSV, with decode options
(see decode/2).
Examples
%% File contains: name,age\nAlice,30\nBob,25\n
1> glazer_csv:read_file("data.csv", [headers, {return, map}]).
#{headers => [<<"name">>,<<"age">>],
data => [#{<<"age">> => <<"30">>, <<"name">> => <<"Alice">>},
#{<<"age">> => <<"25">>, <<"name">> => <<"Bob">>}]}
2> glazer_csv:read_file("data.csv", [headers, {fields, [binary, integer]}]).
#{headers => [<<"name">>,<<"age">>], data => [[<<"Alice">>,30],[<<"Bob">>,25]]}
-spec stream_decoder() -> stream_decoder().
Create a new incremental decoder for feeding CSV in chunks (e.g. from a socket or file), useful when the whole input isn't available up front.
Each complete row is decoded as soon as its terminating line break is seen,
via decode/2 on that single row. Only the row
boundary detection is incremental — a small byte-scanner tracks whether
the cursor is inside a quoted field across chunks, so that \n/\r\n
inside quoted fields doesn't end a row.
When the headers option is supplied (see stream_decoder/1), the first complete
row is captured as the header and no row is emitted for it. stream_decoder/0
itself uses the default options; a decoder passes its options to every
internal decode/2 call.
Examples
1> D0 = glazer_csv:stream_decoder(),
{Rows1, D1} = glazer_csv:stream_feed(D0, <<"a,b\n1,2\n3,">>),
Rows1.
[[<<"a">>,<<"b">>],[<<"1">>,<<"2">>]]
2> {Rows2, D2} = glazer_csv:stream_feed(D1, <<"4\n">>),
Rows2.
[[<<"3">>,<<"4">>]]
3> glazer_csv:stream_eof(D2).
{ok, []}
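The row-boundary scanning described above can be illustrated in plain Erlang. This sketches the general technique and is not glazer's scanner; the module row_scan and function scan/2 are hypothetical, and the state tuple merely mirrors the {Offset, InQuote} shape of scan_state().

```erlang
-module(row_scan).
-export([scan/2]).

%% Hypothetical illustration (not glazer's code).
%% Scan Bin starting at the byte offset stored in the state and return
%% the offsets of every LF that lies outside a quoted field, together
%% with the state to resume from once more bytes are appended to Bin.
%% Each double quote toggles InQuote, so a doubled quote ("") inside a
%% quoted field toggles twice and leaves the flag unchanged.
-spec scan(binary(), {non_neg_integer(), boolean()}) ->
          {[non_neg_integer()], {non_neg_integer(), boolean()}}.
scan(Bin, {Offset, InQuote}) ->
    scan(Bin, Offset, InQuote, []).

scan(Bin, Pos, InQuote, Acc) when Pos >= byte_size(Bin) ->
    {lists:reverse(Acc), {Pos, InQuote}};
scan(Bin, Pos, InQuote, Acc) ->
    case binary:at(Bin, Pos) of
        $" -> scan(Bin, Pos + 1, not InQuote, Acc);
        $\n when not InQuote -> scan(Bin, Pos + 1, InQuote, [Pos | Acc]);
        _ -> scan(Bin, Pos + 1, InQuote, Acc)
    end.
```

Because the state records where scanning stopped and whether a quote is open, a caller can append the next chunk to its buffer and continue without rescanning earlier bytes.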
-spec stream_decoder(decode_opts()) -> stream_decoder().
Create a new incremental CSV decoder, passing Opts through to every
internal decode/2 call.
All options from decode/2 are accepted except {skip, ...} and
{limit, ...}, which are ignored in streaming mode (the caller controls
which rows to process by consuming the output of stream_feed/2).
When {headers, [Name, ...]} is given, the explicit header names are
pre-populated and no header row is consumed from the stream.
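Since {limit, ...} is ignored in streaming mode, a caller that wants at most N rows can stop consuming on its own side. The following is a minimal sketch under that reading; the module stream_limit and function take/3 are hypothetical and not part of glazer_csv.

```erlang
-module(stream_limit).
-export([take/3]).

%% Hypothetical caller-side limit (not part of glazer_csv).
%% Feed Chunks into Decoder until at least N rows are collected or the
%% chunks run out, then return at most N rows in input order.
%% stream_eof/1 is consulted only when every chunk has been consumed.
take(Decoder, Chunks, N) when is_integer(N), N >= 0 ->
    take(Decoder, Chunks, N, []).

take(_Decoder, _Chunks, N, Acc) when length(Acc) >= N ->
    lists:sublist(Acc, N);
take(Decoder, [], N, Acc) ->
    case glazer_csv:stream_eof(Decoder) of
        {ok, Trailing} -> lists:sublist(Acc ++ Trailing, N);
        {error, Reason} -> erlang:error(Reason)
    end;
take(Decoder0, [Chunk | Rest], N, Acc) ->
    {Rows, Decoder1} = glazer_csv:stream_feed(Decoder0, Chunk),
    take(Decoder1, Rest, N, Acc ++ Rows).
```

A call such as stream_limit:take(glazer_csv:stream_decoder([headers]), Chunks, 10) would stop feeding chunks once ten data rows are available.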
Examples
%% Headers option: first row captured, data rows returned as field lists
1> D0 = glazer_csv:stream_decoder([headers]),
{Rows, D1} = glazer_csv:stream_feed(D0, <<"name,age\nAlice,30\n">>),
Rows.
[[<<"Alice">>,<<"30">>]]
%% Explicit headers + map output
2> D2 = glazer_csv:stream_decoder([{headers, [name, age]}, {return, map}]),
   {Rows2, _} = glazer_csv:stream_feed(D2, <<"Alice,30\n">>),
   Rows2.
[#{age => <<"30">>, name => <<"Alice">>}]
%% Semicolon delimiter
3> D3 = glazer_csv:stream_decoder([{delimiter, $;}]),
   {Rows3, _} = glazer_csv:stream_feed(D3, <<"a;b\n1;2\n">>),
   Rows3.
[[<<"a">>,<<"b">>],[<<"1">>,<<"2">>]]
-spec stream_eof(stream_decoder()) -> {ok, [[term()]] | [tuple()] | [map()]} | {error, term()}.
Signal end-of-stream: decode any remaining buffered bytes as a final row (useful when the input doesn't end with a trailing line break).
Returns {ok, Rows} with zero or one trailing row, or {error, Reason} if
the remaining bytes don't form a valid row.
Examples
%% Input without a trailing newline
1> D0 = glazer_csv:stream_decoder(),
{Rows1, D1} = glazer_csv:stream_feed(D0, <<"a,b\n1,2">>),
Rows1.
[[<<"a">>,<<"b">>]]
2> glazer_csv:stream_eof(D1).
{ok, [[<<"1">>,<<"2">>]]}
%% Input ending with a newline: nothing left at EOF
3> D2 = glazer_csv:stream_decoder(),
   {_, D3} = glazer_csv:stream_feed(D2, <<"a,b\n">>),
   glazer_csv:stream_eof(D3).
{ok, []}
%% Unterminated quoted field surfaces here
4> D4 = glazer_csv:stream_decoder(),
   {[], D5} = glazer_csv:stream_feed(D4, <<"\"unterminated">>),
   glazer_csv:stream_eof(D5).
{error, unterminated_quoted_field}
-spec stream_feed(stream_decoder(), binary() | iolist()) -> {[[term()]] | [tuple()] | [map()], stream_decoder()}.
Feed a chunk of bytes into the decoder, returning any complete CSV rows found so far (in order) along with the updated decoder.
Raises the same exceptions as decode/2 if a row that
the scanner deemed complete fails to decode.
Examples
%% Rows split across two feed calls
1> D0 = glazer_csv:stream_decoder(),
{Rows1, D1} = glazer_csv:stream_feed(D0, <<"a,b\n1,">>),
Rows1.
[[<<"a">>,<<"b">>]]
2> {Rows2, D2} = glazer_csv:stream_feed(D1, <<"2\n">>),
Rows2.
[[<<"1">>,<<"2">>]]
3> glazer_csv:stream_eof(D2).
{ok, []}
%% Typical socket-reading loop
loop(Socket, D0) ->
    case gen_tcp:recv(Socket, 0) of
        {ok, Chunk} ->
            {Rows, D1} = glazer_csv:stream_feed(D0, Chunk),
            handle_rows(Rows),
            loop(Socket, D1);
        {error, closed} ->
            case glazer_csv:stream_eof(D0) of
                {ok, Trailing} -> handle_rows(Trailing);
                {error, Reason} -> handle_truncated_stream(Reason)
            end
    end.
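The same pattern applies when reading a large file in fixed-size chunks. The sketch below is one possible arrangement, not a helper shipped with the library; the module stream_file, the function fold_rows/4, and its parameters are hypothetical.

```erlang
-module(stream_file).
-export([fold_rows/4]).

%% Hypothetical helper (not part of glazer_csv).
%% Read Filename in ChunkSize-byte pieces, feed each piece to a streaming
%% decoder with default options, and fold Fun(Row, Acc) over every row.
%% Returns {ok, Acc} or {error, Reason}.
fold_rows(Filename, ChunkSize, Fun, Acc0) ->
    {ok, Fd} = file:open(Filename, [read, binary, raw]),
    try
        loop(Fd, ChunkSize, glazer_csv:stream_decoder(), Fun, Acc0)
    after
        file:close(Fd)
    end.

loop(Fd, ChunkSize, Decoder0, Fun, Acc0) ->
    case file:read(Fd, ChunkSize) of
        {ok, Chunk} ->
            {Rows, Decoder1} = glazer_csv:stream_feed(Decoder0, Chunk),
            loop(Fd, ChunkSize, Decoder1, Fun, lists:foldl(Fun, Acc0, Rows));
        eof ->
            %% Flush a final row that lacks a trailing line break.
            case glazer_csv:stream_eof(Decoder0) of
                {ok, Trailing} -> {ok, lists:foldl(Fun, Acc0, Trailing)};
                {error, Reason} -> {error, Reason}
            end;
        {error, Reason} ->
            {error, Reason}
    end.
```

Counting rows, for example, would be stream_file:fold_rows("data.csv", 65536, fun(_Row, N) -> N + 1 end, 0).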
-spec try_decode(binary() | iolist()) -> {ok, csv_result()} | {error, decode_error()}.
Decode a CSV binary or iolist, returning {ok, Result} or
{error, Reason} instead of raising.
Result is a csv_result/0; Reason is a decode_error/0.
Examples
1> glazer_csv:try_decode(<<"a,b\n1,2\n">>).
{ok, #{headers => nil, data => [[<<"a">>,<<"b">>],[<<"1">>,<<"2">>]]}}
2> glazer_csv:try_decode(<<"\"unterminated">>).
{error, unterminated_quoted_field}
-spec try_decode(binary() | iolist(), decode_opts()) -> {ok, csv_result()} | {error, decode_error()}.
Decode a CSV binary or iolist with options (see decode_opts/0),
returning {ok, Result} or {error, Reason} instead of raising.
Result is a csv_result/0; Reason is a decode_error/0.
Examples
1> glazer_csv:try_decode(<<"name,age\nAlice,30\n">>, [headers]).
{ok, #{headers => [<<"name">>,<<"age">>], data => [[<<"Alice">>,<<"30">>]]}}
2> glazer_csv:try_decode(<<"x">>,
                         [{fields, [#{type => integer, on_failure => raise}]}]).
{error, {invalid_field_value, 1, 1}}
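Because Reason is always one of the three decode_error/0 shapes, callers can match on it exhaustively. The sketch below turns a reason into a message; the module csv_errors, the function describe/1, and the message wording are hypothetical.

```erlang
-module(csv_errors).
-export([describe/1]).

%% Hypothetical helper (not part of glazer_csv): turn a decode_error()
%% reason into a human-readable string.
-spec describe(term()) -> string().
describe(unterminated_quoted_field) ->
    "input ended inside a quoted field";
describe(duplicate_header) ->
    "two columns share the same header name";
describe({invalid_field_value, Row, Column}) ->
    lists:flatten(
      io_lib:format("row ~b, column ~b could not be converted",
                    [Row, Column])).
```

A caller would typically match {error, Reason} from try_decode/2 and pass Reason to csv_errors:describe/1 before logging it.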
-spec write_file(file:name_all(), [[term()]] | [map()]) -> ok.
Encode Data to CSV and write it to Filename, overwriting any existing
file.
Raises a binary "Filename: Reason" message (see file:format_error/1)
if the file can't be written.
Examples
1> glazer_csv:write_file("out.csv", [[<<"name">>,<<"age">>],[<<"Alice">>,30]]).
ok
2> glazer_csv:write_file("/read-only/out.csv", []).
** exception error: <<"/read-only/out.csv: permission denied">>
-spec write_file(file:name_all(), [[term()]] | [map()], encode_opts()) -> ok.
Encode Data to CSV with encode options (see encode/2) and write it to
Filename, overwriting any existing file.
Examples
%% Write maps as CSV with a header row and LF line endings
1> glazer_csv:write_file("out.csv",
                         [#{<<"name">> => <<"Alice">>, <<"score">> => 99}],
                         [headers, {line_ending, lf}]).
ok
%% Write with a semicolon delimiter
2> glazer_csv:write_file("out.csv",
                         [[<<"a">>, <<"b">>], [1, 2]],
                         [{delimiter, $;}]).
ok