glazer_csv (glazer v0.5.0)
Fast CSV encoding and decoding using the glaze C++ library.
By default nulls (e.g. produced by on_failure => null) are represented
as the atom null. To change it application-wide, set the null env key
in your config:
{glazer, [{null, nil}]}.
Features
- RFC 4180 CSV encoding/decoding via decode/1,2 and encode/1,2, with optional header-row support
- Per-column field type conversion ({fields, Specs}), including integers, floats, booleans, datetimes, atoms, and strings (binaries)
- Incremental/streaming CSV decoding via stream_decoder/0,1, stream_feed/2, stream_eof/1
- Configurable representation of CSV null values
- read_file/1,2 and write_file/2,3 helpers for decoding/encoding directly to/from a file
See also [https://github.com/stephenberry/glaze]
Types
-type decode_error() :: unterminated_quoted_field | duplicate_header | {invalid_field_value, Row :: pos_integer(), Column :: pos_integer()}.
-type decode_opt() :: {delimiter, char()} | headers | {keys, atom | existing_atom | binary} | {fields, [field_spec()]} | {null_term, atom()}.
-type decode_opts() :: [decode_opt()].
CSV decode options:
- {delimiter, Char} - field delimiter (default $,)
- headers - treat the first row as column names and decode each subsequent row as a map keyed by those names, instead of returning every row as a list of fields
- {keys, atom} - with headers, decode column names as atoms
- {keys, existing_atom} - with headers, decode column names as existing atoms, falling back to binaries for unknown atoms
- {keys, binary} - with headers, decode column names as binaries (default)
- {fields, Specs} - convert each column's field from a binary, positionally (the Nth spec applies to the Nth column, regardless of headers). Columns beyond the end of Specs, or given type binary, are left as binaries. See field_spec/0 and field_type/0 for the available types and the default/on_failure options
- {null_term, Atom} - use Atom as the value produced by on_failure => null, overriding the library-wide null term for this call (default: the library-wide null term, see the null application env var)
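The options above can be combined in a single call. The following is a minimal sketch, assuming the glazer application is available on the code path; the module name and the sample data are hypothetical and chosen only for illustration.

```erlang
-module(decode_opts_example).
-export([run/0]).

%% Decode a small CSV that has a header row, using atom keys and
%% positional field specs. The sample data is made up.
run() ->
    Csv = <<"id,name,active\r\n1,alice,true\r\n2,bob,false\r\n">>,
    Opts = [headers,
            {keys, atom},
            %% Positional specs: column 1 -> integer, column 2 -> binary,
            %% column 3 -> boolean.
            {fields, [integer, binary, boolean]}],
    glazer_csv:decode(Csv, Opts).
```

Going by the option descriptions, each row should come back as a map keyed by id, name, and active. Since atoms are never garbage collected, {keys, existing_atom} is likely the safer choice when the header row comes from untrusted input.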
-type encode_opt() :: {delimiter, char()} | headers | {line_ending, lf | crlf}.
-type encode_opts() :: [encode_opt()].
CSV encode options:
- {delimiter, Char} - field delimiter (default $,)
- headers - input is a list of maps; the first map's keys become the header row, and subsequent maps are encoded as rows in that column order (missing keys produce empty fields)
- {line_ending, lf | crlf} - line terminator (default crlf, per RFC 4180)
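As a rough illustration of the encode options, the sketch below writes semicolon-separated rows with LF line endings. The module name and the rows are hypothetical; only encode/2 and the two options shown come from this page.

```erlang
-module(encode_opts_example).
-export([run/0]).

%% Encode rows using ';' as the delimiter and LF line endings.
%% The second row contains the delimiter inside a field, so that field
%% is expected to be quoted in the output.
run() ->
    Rows = [[<<"name">>, <<"note">>],
            [<<"alice">>, <<"likes; semicolons">>],
            [<<"bob">>, 42]],
    glazer_csv:encode(Rows, [{delimiter, $;}, {line_ending, lf}]).
```

A non-comma delimiter is common for locales where the comma is the decimal separator; the consumer then needs the same {delimiter, Char} on decode.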
-type field_on_failure() :: binary | raise | default | null.
Controls what happens when a non-empty field fails to convert to the
requested field_type() (default binary):
- binary - leave the field as the original binary (default)
- raise - raise {invalid_field_value, Row, Column}, or return it as {error, Reason} from try_decode/2; Row and Column are 1-based
- default - use the spec's default value (falls back to binary if no default is given)
- null - use the configured null term: {null_term, Atom} if given, otherwise the library-wide null term (see the null application env var)
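To make the difference between the failure modes concrete, here is a small sketch that decodes the same input twice, once with on_failure => null and once with on_failure => raise. The module name, the sample data, and the choice of undefined as the null term are assumptions made for this example.

```erlang
-module(on_failure_example).
-export([run/0]).

%% The value "x" in row 1, column 2 cannot be converted to an integer.
run() ->
    Csv = <<"1,x\r\n2,7\r\n">>,
    %% Lenient: substitute the per-call null term for the bad value.
    Lenient = [{fields, [integer, #{type => integer, on_failure => null}]},
               {null_term, undefined}],
    %% Strict: fail the whole decode on the first bad value.
    Strict = [{fields, [integer, #{type => integer, on_failure => raise}]}],
    {glazer_csv:try_decode(Csv, Lenient),
     glazer_csv:try_decode(Csv, Strict)}.
```

Based on the descriptions above, the first element should be an {ok, Rows} tuple with undefined in place of x, and the second should be {error, {invalid_field_value, 1, 2}}.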
-type field_spec() :: field_type() | #{type := field_type(), default => term(), on_failure => field_on_failure()}.
A single element of the {fields, Specs} CSV decode option: either a
field_type() directly, or a map for more control:
- type - the field_type() to convert the field to
- default - used in place of the converted value whenever the raw CSV field is empty
- on_failure - see field_on_failure/0 (default binary)
-type field_type() :: integer | {float, non_neg_integer()} | boolean | {datetime, binary()} | binary | charlist | existing_atom | {atom, ExistingAtoms :: [atom()]}.
A single column's target type for the {fields, Specs} CSV decode option:
- integer - parse as an integer
- {float, Precision} - parse as a float, rounded to Precision decimal digits
- boolean - parse "true"/"false" (any case) as true/false
- {datetime, InputFormat} - parse using a strptime-like format string (%Y %m %d %H %M %S %f %z and literals; %z accepts Z, +HHMM, or +HH:MM), converting the result to Unix epoch seconds (UTC)
- binary - leave as a binary (default)
- charlist - convert to a list of Unicode code points
- existing_atom - convert to an existing atom, falling back to a binary if no such atom exists
- {atom, ExistingAtoms} - convert to an atom only if the field's text matches (and exists as) one of ExistingAtoms, falling back to a binary otherwise
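The compound field types take an argument, which is easiest to see in use. The sketch below applies a datetime format, a float precision, and an atom whitelist to one row; the module name, the sample row, and the format string are illustrative assumptions built only from the directives listed above.

```erlang
-module(field_types_example).
-export([run/0]).

%% One row with four columns: an ISO-8601-style timestamp, a float,
%% a boolean, and a status word restricted to a known set of atoms.
run() ->
    Csv = <<"2024-01-02T03:04:05Z,3.14159,TRUE,ok\r\n">>,
    Fields = [{datetime, <<"%Y-%m-%dT%H:%M:%S%z">>},  % -> Unix epoch seconds
              {float, 2},                             % rounded to 2 decimals
              boolean,                                % case-insensitive
              {atom, [ok, error]}],                   % other text stays binary
    glazer_csv:decode(Csv, [{fields, Fields}]).
```

The {atom, ExistingAtoms} form keeps atom creation bounded, which matters when the column values are not under your control.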
-type scan_state() :: {non_neg_integer(), boolean()}.
-opaque stream_decoder()
Functions
Decode a CSV binary or iolist to a list of rows.
By default each row is a list of binary fields. With the headers option,
the first row is used as column names and each subsequent row is decoded
as a map. Raises unterminated_quoted_field or duplicate_header on
invalid input.
-spec decode(binary() | iolist(), decode_opts()) -> [[binary()]] | [map()].
Decode a CSV binary or iolist to a list of rows, with options.
Raises Reason::decode_error() on invalid input.
Encode a list of rows to a CSV binary.
Each row is a list of fields (binaries, atoms, integers, or floats). Fields containing the delimiter, a double quote, or a line break are quoted per RFC 4180, with embedded quotes doubled.
-spec encode([[term()]] | [map()], encode_opts()) -> binary().
Encode a list of rows to a CSV binary, with options.
With the headers option, Data is a list of maps: the first map's keys
become the header row (in iteration order), and each map is encoded as a
row in that column order.
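A simple way to try the headers option on the encode side is a round trip: decode with headers, then encode the resulting maps with headers. This is a sketch under the assumption that maps produced by decode/2 (binary keys by default) are accepted as-is by encode/2; the module name and data are hypothetical.

```erlang
-module(encode_headers_example).
-export([run/0]).

%% Decode rows as maps, then encode them back with a header row.
run() ->
    Csv = <<"id,name\r\n1,alice\r\n2,bob\r\n">>,
    Maps = glazer_csv:decode(Csv, [headers]),
    glazer_csv:encode(Maps, [headers]).
```

Because the header row follows the first map's key iteration order, the output column order may differ from the order in the original input.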
-spec read_file(file:name_all()) -> [[binary()]] | [map()].
Read Filename and decode its contents as CSV.
Raises Reason::decode_error() if the file's contents aren't valid CSV, or
a binary "Filename: Reason" message (see file:format_error/1) if the
file can't be read.
Example
1> glazer_csv:read_file("data.csv").
[[<<"a">>,<<"b">>],[<<"1">>,<<"2">>]]
-spec read_file(file:name_all(), decode_opts()) -> [[binary()]] | [map()].
Read Filename and decode its contents as CSV, with decode options
(see decode/2).
-spec stream_decoder() -> stream_decoder().
Create a new incremental decoder for feeding CSV in chunks (e.g. from a socket or file), useful when the whole input isn't available up front.
Each complete row is decoded as soon as its terminating line break is seen,
via decode/2 on that single row. Only the row boundary detection is
incremental: a small byte scanner tracks whether the cursor is inside a
quoted field across chunks, so that \n or \r\n inside a quoted field
doesn't end a row.
With the headers option, the first complete row is captured as the header
and used to decode every subsequent row as a map; no row is emitted for the
header itself.
Example
1> D0 = glazer_csv:stream_decoder(),
2> {Rows1, D1} = glazer_csv:stream_feed(D0, <<"a,b\n1,2\n3,">>),
3> Rows1.
[[<<"a">>,<<"b">>],[<<"1">>,<<"2">>]]
4> {Rows2, D2} = glazer_csv:stream_feed(D1, <<"4\n">>),
5> Rows2.
[[<<"3">>,<<"4">>]]
6> glazer_csv:stream_eof(D2).
{ok, []}
-spec stream_decoder(decode_opts()) -> stream_decoder().
Create a new incremental CSV decoder, passing Opts through to every
decode/2 call.
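As an illustration of passing options to the streaming decoder, the sketch below feeds a header row and two data rows in chunks that split a field in the middle. The module name and chunk boundaries are hypothetical.

```erlang
-module(stream_headers_example).
-export([run/0]).

%% Stream a CSV with a header row in two chunks, then flush the tail.
run() ->
    D0 = glazer_csv:stream_decoder([headers]),
    %% The first chunk holds the complete header row plus a partial data row.
    {Rows1, D1} = glazer_csv:stream_feed(D0, <<"id,name\n1,al">>),
    %% The second chunk completes row 1 and starts row 2 (no trailing newline).
    {Rows2, D2} = glazer_csv:stream_feed(D1, <<"ice\n2,bob">>),
    %% stream_eof/1 decodes whatever is still buffered as the final row.
    {ok, Rows3} = glazer_csv:stream_eof(D2),
    Rows1 ++ Rows2 ++ Rows3.
```

Per the description of the headers option for stream_decoder/0, the header row itself should not appear in the output, so the result should contain two maps.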
-spec stream_eof(stream_decoder()) -> {ok, [[binary()]] | [map()]} | {error, term()}.
Signal end-of-stream: decode any remaining buffered bytes as a final row (useful when the input doesn't end with a trailing line break).
Returns {ok, Rows} with zero or one trailing row, or {error, Reason} if
the remaining bytes don't form a valid row.
Example
1> D0 = glazer_csv:stream_decoder(),
2> {Rows1, D1} = glazer_csv:stream_feed(D0, <<"a,b\n1,2">>),
3> Rows1.
[[<<"a">>,<<"b">>]]
4> glazer_csv:stream_eof(D1).
{ok, [[<<"1">>,<<"2">>]]}
-spec stream_feed(stream_decoder(), binary() | iolist()) -> {[[binary()]] | [map()], stream_decoder()}.
Feed a chunk of bytes into the decoder, returning any complete CSV rows found so far (in order) along with the updated decoder.
Raises the same exceptions as decode/2 if a row that
the scanner deemed complete fails to decode.
Example
loop(Socket, D0) ->
    case gen_tcp:recv(Socket, 0) of
        {ok, Chunk} ->
            {Rows, D1} = glazer_csv:stream_feed(D0, Chunk),
            handle_rows(Rows),
            loop(Socket, D1);
        {error, closed} ->
            case glazer_csv:stream_eof(D0) of
                {ok, Trailing} -> handle_rows(Trailing);
                {error, Reason} -> handle_truncated_stream(Reason)
            end
    end.
-spec try_decode(binary() | iolist()) -> {ok, [[binary()]]} | {error, decode_error()}.
Decode a CSV binary or iolist, returning {ok, Rows} or
{error, Reason} instead of raising, where Reason is a
decode_error().
-spec try_decode(binary() | iolist(), decode_opts()) -> {ok, [[binary()]] | [map()]} | {error, decode_error()}.
Decode a CSV binary or iolist with options, returning {ok, Rows} or
{error, Reason} instead of raising, where Reason is a
decode_error().
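One possible way to consume the tagged result is to match on each decode_error() variant and turn it into a message. The sketch below does that; the module name, the parse/1 helper, and the message texts are hypothetical, while the error terms are those listed in decode_error().

```erlang
-module(try_decode_example).
-export([parse/1]).

%% Decode CSV with a header row and map every documented error term
%% to a human-readable binary. Returns {ok, RowCount} on success.
parse(Csv) ->
    case glazer_csv:try_decode(Csv, [headers]) of
        {ok, Rows} ->
            {ok, length(Rows)};
        {error, unterminated_quoted_field} ->
            {error, <<"a quoted field was never closed">>};
        {error, duplicate_header} ->
            {error, <<"duplicate column name in header row">>};
        {error, {invalid_field_value, Row, Column}} ->
            Msg = io_lib:format("bad value at row ~p, column ~p", [Row, Column]),
            {error, iolist_to_binary(Msg)}
    end.
```

Matching each variant explicitly means an error term added in a later version would surface as a case_clause crash rather than being silently ignored.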
-spec write_file(file:name_all(), [[term()]] | [map()]) -> ok.
Encode Data to CSV and write it to Filename, overwriting any existing
file.
Raises a binary "Filename: Reason" message (see file:format_error/1)
if the file can't be written.
Example
1> glazer_csv:write_file("data.csv", [[<<"a">>,<<"b">>],[1,2]]).
ok
-spec write_file(file:name_all(), [[term()]] | [map()], encode_opts()) -> ok.
Encode Data to CSV with encode options (see encode/2) and write it to
Filename, overwriting any existing file.
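The file helpers raise instead of returning tagged tuples, so callers that prefer tuples can wrap them. The following is a sketch of such a wrapper plus a write/read round trip; the module name, function names, and file path are hypothetical, and the catch-all clause is used because the exception class is not stated on this page.

```erlang
-module(file_roundtrip_example).
-export([run/1, safe_read/1]).

%% Write two rows to Filename, then read them back.
run(Filename) ->
    ok = glazer_csv:write_file(Filename, [[<<"a">>, <<"b">>], [1, 2]]),
    safe_read(Filename).

%% Convert a raised decode_error() or "Filename: Reason" binary into
%% an {error, Reason} tuple.
safe_read(Filename) ->
    try glazer_csv:read_file(Filename) of
        Rows -> {ok, Rows}
    catch
        _Class:Reason -> {error, Reason}
    end.
```

Note that the integers written by write_file/2 come back as binaries unless a {fields, Specs} option is passed to read_file/2.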