packkit/gzip
RFC 1952 gzip codec.
gzip wraps a DEFLATE stream in a member header that may carry an original filename, free-text comment, and modification timestamp. A trailing CRC-32 and ISIZE pair lets readers verify the decoded payload.
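As a rough illustration of the trailer check described above, the following sketch uses Python's standard gzip, zlib, and struct modules rather than packkit itself. The helper name trailer_matches is hypothetical, and the sketch assumes a single-member stream with no trailing bytes.

```python
import gzip
import struct
import zlib


def trailer_matches(stream: bytes) -> bool:
    """Check a single-member gzip stream's CRC-32 / ISIZE trailer."""
    payload = gzip.decompress(stream)
    # The last 8 bytes are CRC-32 then ISIZE, both little-endian uint32.
    crc32, isize = struct.unpack("<II", stream[-8:])
    # ISIZE stores the uncompressed length modulo 2**32.
    return crc32 == zlib.crc32(payload) and isize == len(payload) % 2**32


stream = gzip.compress(b"hello gzip", mtime=0)
print(trailer_matches(stream))
```

The CRC-32 covers the uncompressed payload, not the compressed bytes, which is why the check can only run after decoding.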
Types
Incremental decoder state. Buffers the input chunks and runs the
one-shot decoder at finish time. Streaming is presented as
“feed-and-finalize” rather than “produce a partial output per
push” because the underlying DEFLATE decoder is eager: it needs the
complete input before it produces any output. The API still lets
callers wire incremental pipelines from sources that hand them data
in chunks. See packkit/stream for the codec-neutral version of this
surface.
buffered_bytes is tracked so push can enforce max_input_bytes
incrementally: a hostile or buggy producer that streams ever-larger
chunks can no longer overrun the limit silently by accumulating in
the decoder before finish runs.
pub opaque type Decoder
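The feed-and-finalize shape with an incrementally enforced input cap can be sketched as follows. This is a Python stand-in built on the standard gzip module, not packkit's implementation: the class and exception names are hypothetical, and unlike the Gleam API (where push returns an updated Decoder) this sketch mutates its state in place.

```python
import gzip


class LimitExceeded(Exception):
    """Raised when buffered input would exceed the configured cap."""


class FeedAndFinalizeDecoder:
    """Hypothetical stand-in for the Decoder described above."""

    def __init__(self, max_input_bytes: int) -> None:
        self.max_input_bytes = max_input_bytes
        self.chunks = []
        self.buffered_bytes = 0

    def push(self, chunk: bytes) -> None:
        # Enforce the cap before buffering, so an oversized stream is
        # rejected at push time rather than at finish time.
        if self.buffered_bytes + len(chunk) > self.max_input_bytes:
            raise LimitExceeded(self.buffered_bytes + len(chunk))
        self.chunks.append(chunk)
        self.buffered_bytes += len(chunk)

    def finish(self) -> bytes:
        # No output exists until now: the one-shot decoder sees all input.
        return gzip.decompress(b"".join(self.chunks))


stream = gzip.compress(b"payload")
decoder = FeedAndFinalizeDecoder(max_input_bytes=1024)
for i in range(0, len(stream), 4):
    decoder.push(stream[i:i + 4])
print(decoder.finish())
```

Tracking the running byte count means the cap is checked in constant time per push, without re-measuring the buffered chunks.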
Why a checked header constructor rejected an argument.
pub type HeaderError {
HeaderNameContainsNul
HeaderCommentContainsNul
HeaderModifiedAtOutOfRange(value: Int)
HeaderExtraSubfieldTooLong(actual: Int)
HeaderExtraTotalTooLong(actual: Int)
HeaderExtraSubfieldIdOutOfRange(id_1: Int, id_2: Int)
}
Constructors
- HeaderNameContainsNul
- HeaderCommentContainsNul
- HeaderModifiedAtOutOfRange(value: Int)
  modified_at_unix must fit in gzip’s 32-bit MTIME field (0..2^32-1). Surfaced here rather than silently wrapping at encode time.
- HeaderExtraSubfieldTooLong(actual: Int)
  A single FEXTRA subfield must fit gzip’s 16-bit LEN field; the entire FEXTRA region must also fit gzip’s 16-bit XLEN field. Either overflow surfaces here at with_extra_checked time rather than silently truncating the data inside encode.
- HeaderExtraTotalTooLong(actual: Int)
- HeaderExtraSubfieldIdOutOfRange(id_1: Int, id_2: Int)
  FEXTRA subfield IDs are two bytes; values outside 0..255 cannot be packed into a single byte each.
One FEXTRA subfield (RFC 1952 §2.3.1.1). id_1 and id_2 are
the two ASCII bytes that name the subfield (per the spec they
SHOULD be a recognised registry entry but the format does not
enforce that); data is the subfield body (up to 65 535 bytes by
the LEN field alone; the 4-byte subfield prefix also counts towards
the 16-bit region total).
Each subfield is encoded as <id_1, id_2, LEN(LE 16), data>,
and the full FEXTRA region begins with the 16-bit little-endian
total length of all subfields concatenated.
pub type Subfield {
Subfield(id_1: Int, id_2: Int, data: BitArray)
}
Constructors
- Subfield(id_1: Int, id_2: Int, data: BitArray)
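The FEXTRA layout and the three range checks described above can be sketched in Python with the standard struct module. The function name pack_fextra and the tuple representation of a subfield are assumptions made for illustration; this is not packkit's encoder.

```python
import struct


def pack_fextra(subfields):
    """Pack (id_1, id_2, data) tuples into an FEXTRA region, XLEN included."""
    body = b""
    for id_1, id_2, data in subfields:
        if not (0 <= id_1 <= 255 and 0 <= id_2 <= 255):
            raise ValueError(f"subfield id out of range: {id_1}, {id_2}")
        if len(data) > 0xFFFF:
            raise ValueError(f"subfield too long: {len(data)}")
        # <id_1, id_2, LEN (16-bit little-endian), data>
        body += struct.pack("<BBH", id_1, id_2, len(data)) + data
    if len(body) > 0xFFFF:
        raise ValueError(f"FEXTRA total too long: {len(body)}")
    # XLEN: 16-bit little-endian length of all subfields concatenated.
    return struct.pack("<H", len(body)) + body


region = pack_fextra([(ord("A"), ord("p"), b"\x01\x02")])
print(region.hex())  # 0600417002000102
```

Here XLEN is 6: the 4-byte subfield prefix plus 2 bytes of data. A subfield whose data passes the per-subfield check can therefore still fail the total check once its prefix is counted.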
Values
pub fn comment(header: Header) -> option.Option(String)
Read the optional comment field.
pub fn decode(
bytes bytes: BitArray,
) -> Result(Decoded, error.CodecError)
Decode a gzip byte stream using default limits and return the rich [Decoded] record (header + payload).
decode is asymmetric with [encode]: encode takes payload bytes
and emits a stream, while decode returns both the payload and the
header. The asymmetry is intentional — gzip is the only codec in
the package that carries meaningful per-stream metadata (filename,
comment, mtime), and surfacing it on the decode side is what makes
decode |> .header useful. When you only care about the payload
and want the shape every other codec uses (BitArray -> Result(BitArray, _)), use [decode_payload].
pub fn decode_payload(
bytes bytes: BitArray,
) -> Result(BitArray, error.CodecError)
Decode a gzip byte stream and return only the payload bytes.
Parallels every other codec’s decode/1, which returns
Result(BitArray, _) — use this when you don’t need the gzip
header (mtime / filename / comment).
Law: decode_payload(b) == decode(b) |> result.map(fn(d) { d.payload }).
pub fn decode_payload_with_limits(
bytes bytes: BitArray,
limits limits: limit.Limits,
) -> Result(BitArray, error.CodecError)
Like [decode_payload] but accepts an explicit Limits value.
pub fn decode_with_limits(
bytes bytes: BitArray,
limits limits: limit.Limits,
) -> Result(Decoded, error.CodecError)
Decode a gzip byte stream using explicit limits.
Handles multi-member streams (RFC 1952 §2.2 — concatenated gzip
files such as those produced by cat a.gz b.gz). The returned
Decoded carries the header from the FIRST member and the
concatenated payload of every member that decoded successfully;
the headers of later members are not exposed by any gzip API yet.
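Multi-member behaviour can be illustrated with Python's standard gzip module, which likewise joins the payloads of concatenated members. This sketch shows the format-level behaviour only and does not exercise packkit.

```python
import gzip

first = gzip.compress(b"alpha ", mtime=1)
second = gzip.compress(b"beta", mtime=2)

# Equivalent to `cat a.gz b.gz`: members are simply appended.
combined = first + second

# One decode call yields the payloads of both members, joined in order.
print(gzip.decompress(combined))

# MTIME occupies bytes 4..8 of each member header, so these bytes of the
# combined stream hold the first member's timestamp.
print(int.from_bytes(combined[4:8], "little"))
```

Reporting only the first member's header loses the second member's mtime of 2, which is the trade-off the paragraph above describes.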
pub fn default_header() -> Header
Default gzip header with no optional fields populated.
pub fn encode(
bytes bytes: BitArray,
) -> Result(BitArray, error.CodecError)
Encode bytes as a gzip stream using the default header (no
FNAME / FCOMMENT / FEXTRA / mtime). Use this when you just want
“compress these bytes” — the symmetric counterpart of
decode_payload, mirroring every other codec’s encode/1 shape.
Use [encode_with_header] when you need to attach a filename,
comment, or mtime to the stream.
pub fn encode_with_header(
bytes bytes: BitArray,
header header: Header,
) -> Result(BitArray, error.CodecError)
Encode bytes as a gzip stream using header. The DEFLATE body
uses the dynamic-Huffman encoder, which on typical text and
structured-data payloads shrinks ~10–30 % more than the
fixed-Huffman variant. For pathologically skewed inputs the encoder
transparently falls back to fixed Huffman inside
deflate.encode_dynamic, so the emitted DEFLATE data always uses a
valid RFC 1951 BTYPE=01 or BTYPE=10 block.
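To see which block type a DEFLATE body starts with, the BTYPE bits can be read from its first byte. The sketch below uses Python's zlib to produce a raw DEFLATE stream for inspection; the helper name first_block_type is hypothetical, and zlib's own choice between fixed and dynamic Huffman says nothing about what packkit would pick for the same input.

```python
import zlib


def first_block_type(deflate_stream: bytes) -> int:
    """Return BTYPE of the first block in a raw DEFLATE stream.

    Bit 0 of the first byte is BFINAL; bits 1-2 are BTYPE
    (0 = stored, 1 = fixed Huffman, 2 = dynamic Huffman).
    """
    return (deflate_stream[0] >> 1) & 0b11


# wbits=-15 asks zlib for a raw DEFLATE stream with no zlib/gzip wrapper.
compressor = zlib.compressobj(wbits=-15)
raw = compressor.compress(b"hello hello hello") + compressor.flush()
print(first_block_type(raw))
```

In a gzip member the DEFLATE body starts after the header fields, so the same check applies at that offset rather than at byte 0 of the whole stream.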
pub fn extra(header: Header) -> List(Subfield)
Read the FEXTRA subfields. Empty when the gzip header carries no FEXTRA region.
pub fn finish(
decoder: Decoder,
) -> Result(BitArray, error.CodecError)
Finalize the decoder and return the full decoded payload.
Returns a bare BitArray (not List(BitArray)) so the gzip
streaming surface matches packkit/stream exactly.
pub fn modified_at_unix(header: Header) -> option.Option(Int)
Read the optional mtime field.
pub fn new_decoder() -> Decoder
Create a new incremental decoder state using the default limits.
pub fn new_decoder_with_limits(limits: limit.Limits) -> Decoder
Create a new incremental decoder state with explicit limits.
pub fn push(
decoder: Decoder,
chunk: BitArray,
) -> Result(Decoder, error.CodecError)
Append a chunk of input bytes to the decoder, enforcing
max_input_bytes incrementally. Returns the updated decoder; no
output is produced until [finish] runs (the underlying DEFLATE
decoder is eager).
The shape mirrors [packkit/stream] so callers don’t have to remember
which streaming module returns which tuple — previously this push
returned (Decoder, List(BitArray)) and the equivalent
stream.push returned a bare Decoder.
pub fn with_comment(
header: Header,
comment comment: String,
) -> Header
Attach an optional comment. Panics if comment contains the NUL
byte gzip uses as the FCOMMENT terminator — see
[with_comment_checked] when the value comes from untrusted input.
Like [with_name], the stored value is exactly what the caller passed; earlier revisions silently stripped NULs.
pub fn with_comment_checked(
header: Header,
comment comment: String,
) -> Result(Header, HeaderError)
Attach an optional comment after validating that it does not contain the NUL byte gzip uses as the FCOMMENT terminator.
pub fn with_extra(
header: Header,
subfields subfields: List(Subfield),
) -> Header
Attach a list of FEXTRA subfields. Out-of-range IDs or overlong bodies panic; use [with_extra_checked] when the caller has not pre-validated the values.
pub fn with_extra_checked(
header: Header,
subfields subfields: List(Subfield),
) -> Result(Header, HeaderError)
Attach a list of FEXTRA subfields after validating that every
subfield ID byte fits the 8-bit slot, every subfield body fits
gzip’s 16-bit LEN, and the concatenated total fits the 16-bit
XLEN. Returns a typed HeaderError on any of those overflows.
pub fn with_modified_at(
header: Header,
unix_seconds unix_seconds: Int,
) -> Header
Attach an optional Unix mtime. Out-of-range values panic at
construction time so a Header value cannot quietly carry a
timestamp gzip’s 32-bit MTIME field cannot represent. Use
[with_modified_at_checked] when the input is untrusted.
pub fn with_modified_at_checked(
header: Header,
unix_seconds unix_seconds: Int,
) -> Result(Header, HeaderError)
Attach an optional Unix mtime after validating it fits gzip’s 32-bit MTIME field.
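The range check amounts to rejecting any value that a 4-byte unsigned field cannot hold. A minimal Python sketch, with the hypothetical name pack_mtime standing in for whatever the encoder does internally:

```python
import struct

MTIME_MAX = 2**32 - 1


def pack_mtime(unix_seconds: int) -> bytes:
    """Pack an mtime into gzip's 4-byte little-endian MTIME field."""
    if not 0 <= unix_seconds <= MTIME_MAX:
        # Reject instead of wrapping modulo 2**32.
        raise ValueError(f"mtime out of range: {unix_seconds}")
    return struct.pack("<I", unix_seconds)


print(pack_mtime(1_700_000_000).hex())
```

Checking at construction time keeps the failure next to the code that supplied the bad value, instead of deferring it to the encode call.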
pub fn with_name(header: Header, name name: String) -> Header
Attach an optional filename. Panics if name contains the NUL
byte gzip uses as the FNAME terminator — see [with_name_checked]
when the value comes from untrusted input.
The unchecked variant guarantees that the value stored in the
header is exactly what the caller passed (lawful round-trip via
name(with_name(h, x)) == Some(x)). Earlier revisions silently
stripped NULs to “be helpful”; that broke the round-trip law and
is now a panic, matching the other unchecked setters in this
module ([with_modified_at] / [with_extra]) and across the package
([packkit/entry.with_mode] etc.).
pub fn with_name_checked(
header: Header,
name name: String,
) -> Result(Header, HeaderError)
Attach an optional filename after validating that it does not contain the NUL byte gzip uses as the FNAME terminator. Use this when the value comes from untrusted input that must round-trip.
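Why a NUL in the name cannot round-trip is easiest to see at the byte level: FNAME is stored as a zero-terminated string, so a reader stops at the first NUL. The following Python sketch builds one gzip member by hand using only the standard library. Both helper names are hypothetical, and read_name assumes the header carries FNAME and no other optional field.

```python
import gzip
import struct
import zlib


def member_with_name(payload: bytes, name: str) -> bytes:
    """Build one gzip member carrying an FNAME field."""
    if "\x00" in name:
        # A NUL inside the name would end FNAME early on the reader's side.
        raise ValueError("name contains NUL")
    # ID1, ID2, CM=8 (deflate), FLG=0x08 (FNAME), MTIME=0, XFL=0, OS=255.
    header = struct.pack("<BBBBIBB", 0x1F, 0x8B, 8, 0x08, 0, 0, 255)
    compressor = zlib.compressobj(wbits=-15)
    body = compressor.compress(payload) + compressor.flush()
    trailer = struct.pack("<II", zlib.crc32(payload), len(payload) % 2**32)
    # RFC 1952 specifies ISO 8859-1 (Latin-1) for FNAME.
    return header + name.encode("latin-1") + b"\x00" + body + trailer


def read_name(member: bytes) -> str:
    """Read FNAME back, assuming FLG is exactly 0x08 as written above."""
    end = member.index(b"\x00", 10)
    return member[10:end].decode("latin-1")


member = member_with_name(b"data", "a.txt")
print(read_name(member), gzip.decompress(member))
```

Rejecting the NUL up front keeps the property that the name read back equals the name written, which silent stripping or truncation would break.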