packkit/gzip

RFC 1952 gzip codec.

gzip wraps a DEFLATE stream in a member header that may carry an original filename, free-text comment, and modification timestamp. A trailing CRC-32 and ISIZE pair lets readers verify the decoded payload.
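The member layout described above can be inspected by hand. The following is a minimal sketch that uses Python's standard library rather than packkit, so every name in it is an assumption made for illustration. It reads the fixed 10-byte header and the 8-byte trailer of a freshly compressed stream and re-checks the CRC-32 and ISIZE values.

```python
import gzip
import struct
import zlib

payload = b"hello, gzip trailer" * 4
stream = gzip.compress(payload, mtime=0)

# Fixed 10-byte member header: magic (2), CM, FLG, MTIME (LE 32), XFL, OS.
magic, method, flags, mtime, _xfl, _os = struct.unpack("<2sBBIBB", stream[:10])

# Last 8 bytes: CRC-32 of the uncompressed data, then ISIZE
# (uncompressed length modulo 2**32), both little-endian.
crc32, isize = struct.unpack("<II", stream[-8:])

decoded = gzip.decompress(stream)
crc_ok = crc32 == (zlib.crc32(decoded) & 0xFFFFFFFF)
size_ok = isize == len(decoded) % 2**32
print(magic.hex(), method, flags, mtime, crc_ok, size_ok)
```

A FLG value of 0 corresponds to a header with no optional fields, which is the shape the default header in this module describes.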

Types

Decoded gzip stream.

pub type Decoded {
  Decoded(header: Header, payload: BitArray)
}

Constructors

  • Decoded(header: Header, payload: BitArray)

Incremental decoder state. It buffers the input chunks and runs the one-shot decoder when finish is called. Streaming is therefore “feed-and-finalize” rather than “produce a partial output per push”, because the underlying DEFLATE decoder is eager. The API still lets callers wire incremental pipelines from sources that deliver data in chunks. See packkit/stream for the codec-neutral version of this surface.

buffered_bytes is tracked so that push can enforce max_input_bytes incrementally: a hostile or buggy producer that streams ever-larger chunks cannot silently overrun the limit by accumulating data in the decoder before finish runs.
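The feed-and-finalize shape with an incrementally enforced input limit can be sketched as follows. This is a hypothetical Python model, not packkit's implementation: the class name, the ValueError, and the use of Python's gzip module at finish time are all assumptions made for illustration.

```python
import gzip


class BufferingDecoder:
    """Hypothetical feed-and-finalize decoder with an input byte limit."""

    def __init__(self, max_input_bytes):
        self.max_input_bytes = max_input_bytes
        self.chunks = []
        self.buffered_bytes = 0

    def push(self, chunk):
        # Check the running total before buffering, so an oversized
        # stream is rejected at push time rather than at finish time.
        total = self.buffered_bytes + len(chunk)
        if total > self.max_input_bytes:
            raise ValueError("input exceeds max_input_bytes")
        self.chunks.append(chunk)
        self.buffered_bytes = total

    def finish(self):
        # No output exists until here: decoding runs once over all input.
        return gzip.decompress(b"".join(self.chunks))


stream = gzip.compress(b"abc" * 100, mtime=0)
decoder = BufferingDecoder(max_input_bytes=1024)
for start in range(0, len(stream), 16):
    decoder.push(stream[start:start + 16])
result = decoder.finish()
```

Because the check happens in push, the rejected chunk is never stored and the buffered total never exceeds the limit.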

pub opaque type Decoder

Gzip header metadata.

pub opaque type Header

Why a checked header constructor rejected an argument.

pub type HeaderError {
  HeaderNameContainsNul
  HeaderCommentContainsNul
  HeaderModifiedAtOutOfRange(value: Int)
  HeaderExtraSubfieldTooLong(actual: Int)
  HeaderExtraTotalTooLong(actual: Int)
  HeaderExtraSubfieldIdOutOfRange(id_1: Int, id_2: Int)
}

Constructors

  • HeaderNameContainsNul
  • HeaderCommentContainsNul
  • HeaderModifiedAtOutOfRange(value: Int)

    modified_at_unix must fit in gzip’s 32-bit MTIME field (0..2^32-1). Surfaced here rather than silently wrapping at encode time.

  • HeaderExtraSubfieldTooLong(actual: Int)

    A single FEXTRA subfield must fit gzip’s 16-bit LEN field; the entire FEXTRA region must also fit gzip’s 16-bit XLEN field. Either overflow surfaces here at with_extra_checked time rather than silently truncating the data inside encode.

  • HeaderExtraTotalTooLong(actual: Int)
  • HeaderExtraSubfieldIdOutOfRange(id_1: Int, id_2: Int)

    FEXTRA subfield IDs are two bytes; values outside 0..255 cannot be packed into a single byte each.
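The range rules behind HeaderModifiedAtOutOfRange and HeaderExtraSubfieldIdOutOfRange can be illustrated outside the library. The sketch below is Python with hypothetical helper names; it only shows why an unchecked value would wrap silently when packed into a fixed-width field.

```python
import struct

MTIME_MAX = 2**32 - 1  # largest value the 32-bit MTIME field can hold


def mtime_in_range(value):
    """Return True when value fits the unsigned 32-bit MTIME field."""
    return 0 <= value <= MTIME_MAX


def id_byte_in_range(value):
    """Return True when value fits one unsigned byte of a subfield ID."""
    return 0 <= value <= 255


def pack_mtime(value):
    """Pack MTIME as little-endian 32 bits, refusing out-of-range input."""
    if not mtime_in_range(value):
        raise ValueError("mtime does not fit the 32-bit MTIME field")
    return struct.pack("<I", value)


# Masking instead of checking would silently store a different timestamp.
wrapped = (MTIME_MAX + 6) & 0xFFFFFFFF
packed = pack_mtime(1_700_000_000)
```

Here wrapped comes out as 5, a timestamp in January 1970, which is the kind of silent corruption the checked constructors are described as preventing.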

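One FEXTRA subfield (RFC 1952 §2.3.1.1). id_1 and id_2 are the two ASCII bytes that name the subfield (per the spec they SHOULD be a recognised registry entry, but the format does not enforce that); data is the subfield body. The 16-bit LEN field allows bodies of up to 65 535 bytes, but XLEN also counts each subfield’s 4-byte prefix, so a body longer than 65 531 bytes cannot fit in the FEXTRA region.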

Each subfield is encoded as <id_1, id_2, LEN(LE 16), data>, and the full FEXTRA region begins with the 16-bit little-endian total length of all subfields concatenated.
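The byte layout above can be sketched in Python. The function name pack_fextra and the tuple representation of a subfield are assumptions for illustration, and the "Ap" ID is just an example value. The sketch also wraps the region in a hand-built member so that an independent gzip reader can confirm the layout is well-formed.

```python
import gzip
import struct
import zlib


def pack_fextra(subfields):
    """Pack (id_1, id_2, data) tuples into an FEXTRA region, XLEN included."""
    body = bytearray()
    for id_1, id_2, data in subfields:
        if not (0 <= id_1 <= 255 and 0 <= id_2 <= 255):
            raise ValueError("subfield ID byte out of range")
        if len(data) > 0xFFFF:
            raise ValueError("subfield body exceeds 16-bit LEN")
        body += bytes([id_1, id_2]) + struct.pack("<H", len(data)) + data
    if len(body) > 0xFFFF:
        raise ValueError("FEXTRA region exceeds 16-bit XLEN")
    return struct.pack("<H", len(body)) + bytes(body)


region = pack_fextra([(ord("A"), ord("p"), b"\x01\x02\x03")])

# Build a full member by hand: header with FLG.FEXTRA (0x04) set,
# the FEXTRA region, a raw DEFLATE body, then CRC-32 and ISIZE.
payload = b"payload behind an FEXTRA field"
raw = zlib.compressobj(wbits=-15)
deflated = raw.compress(payload) + raw.flush()
header = b"\x1f\x8b\x08\x04" + struct.pack("<I", 0) + b"\x00\xff"
trailer = struct.pack("<II", zlib.crc32(payload), len(payload))
member = header + region + deflated + trailer
decoded = gzip.decompress(member)
```

The region for this single 3-byte subfield is 9 bytes long: 2 for XLEN, 4 for the subfield prefix, and 3 for the body.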

pub type Subfield {
  Subfield(id_1: Int, id_2: Int, data: BitArray)
}

Constructors

  • Subfield(id_1: Int, id_2: Int, data: BitArray)

Values

pub fn codec() -> codec.Codec

Gzip codec smart constructor.

pub fn comment(header: Header) -> option.Option(String)

Read the optional comment field.

pub fn decode(
  bytes bytes: BitArray,
) -> Result(Decoded, error.CodecError)

Decode a gzip byte stream using default limits and return the rich [Decoded] record (header + payload).

decode is asymmetric with [encode]: encode takes payload bytes and emits a stream, while decode returns both the payload and the header. The asymmetry is intentional: gzip is the only codec in the package that carries meaningful per-stream metadata (filename, comment, mtime), and surfacing it on the decode side lets callers read the header straight from the result, for example decode(bytes) |> result.map(fn(d) { d.header }). When you only care about the payload and want the shape every other codec uses (BitArray -> Result(BitArray, _)), use [decode_payload].

pub fn decode_payload(
  bytes bytes: BitArray,
) -> Result(BitArray, error.CodecError)

Decode a gzip byte stream and return only the payload bytes. Parallels every other codec’s decode/1, which returns Result(BitArray, _) — use this when you don’t need the gzip header (mtime / filename / comment).

Law: decode_payload(b) == decode(b) |> result.map(fn(d) { d.payload }).

pub fn decode_payload_with_limits(
  bytes bytes: BitArray,
  limits limits: limit.Limits,
) -> Result(BitArray, error.CodecError)

Like [decode_payload] but accepts an explicit Limits value.

pub fn decode_with_limits(
  bytes bytes: BitArray,
  limits limits: limit.Limits,
) -> Result(Decoded, error.CodecError)

Decode a gzip byte stream using explicit limits.

Handles multi-member streams (RFC 1952 §2.2 — concatenated gzip files such as those produced by cat a.gz b.gz). The returned Decoded carries the header from the FIRST member and the concatenated payload of every member that decoded successfully; no other gzip API exposes per-member headers yet.
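Multi-member behaviour is easy to reproduce with any conforming gzip implementation. The sketch below uses Python's gzip module, not packkit, to show that two members concatenated byte-for-byte decode to the concatenation of their payloads.

```python
import gzip

first = gzip.compress(b"alpha\n", mtime=0)
second = gzip.compress(b"beta\n", mtime=0)

# Equivalent to `cat a.gz b.gz`: members are simply laid end to end.
combined = first + second

decoded = gzip.decompress(combined)
print(decoded)
```

Each member keeps its own header and trailer inside combined, which is why only one header can be reported when a single header field is returned.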

pub fn default_header() -> Header

Default gzip header with no optional fields populated.

pub fn encode(
  bytes bytes: BitArray,
) -> Result(BitArray, error.CodecError)

Encode bytes as a gzip stream using the default header (no FNAME / FCOMMENT / FEXTRA / mtime). Use this when you just want “compress these bytes” — the symmetric counterpart of decode_payload, mirroring every other codec’s encode/1 shape. Use [encode_with_header] when you need to attach a filename, comment, or mtime to the stream.

pub fn encode_with_header(
  bytes bytes: BitArray,
  header header: Header,
) -> Result(BitArray, error.CodecError)

Encode bytes as a gzip stream using header. The DEFLATE body uses the dynamic-Huffman encoder, which on typical text and structured-data payloads shrinks ~10–30% more than the fixed-Huffman variant; for pathologically skewed inputs the encoder transparently falls back to fixed Huffman inside deflate.encode_dynamic so the stream is always a valid RFC 1951 BTYPE=01 or BTYPE=10 block.
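The BTYPE distinction can be observed on a raw DEFLATE stream. The sketch below uses Python's zlib, which is a different encoder from packkit's, so the sizes it produces say nothing about packkit's ratios. It only shows where the block type lives and that a fixed-Huffman stream is never smaller than what zlib picks when it is free to choose.

```python
import zlib


def raw_deflate(data, strategy):
    """Compress to a raw DEFLATE stream (no zlib or gzip wrapper)."""
    compressor = zlib.compressobj(level=9, wbits=-15, strategy=strategy)
    return compressor.compress(data) + compressor.flush()


def first_block_type(deflated):
    # Bit 0 of the first byte is BFINAL; bits 1-2 are BTYPE
    # (0 = stored, 1 = fixed Huffman, 2 = dynamic Huffman).
    return (deflated[0] >> 1) & 0b11


text = b"".join(b"line %d: the quick brown fox\n" % i for i in range(200))
fixed = raw_deflate(text, zlib.Z_FIXED)
default = raw_deflate(text, zlib.Z_DEFAULT_STRATEGY)
print(first_block_type(fixed), len(fixed), len(default))
```

Both outputs decode to the same bytes; only the Huffman tables used to code the block differ.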

pub fn extra(header: Header) -> List(Subfield)

Read the FEXTRA subfields. Empty when the gzip header carries no FEXTRA region.

pub fn finish(
  decoder: Decoder,
) -> Result(BitArray, error.CodecError)

Finalize the decoder and return the full decoded payload.

Returns a bare BitArray (not List(BitArray)) so the gzip streaming surface matches packkit/stream exactly.

pub fn modified_at_unix(header: Header) -> option.Option(Int)

Read the optional mtime field.

pub fn name(header: Header) -> option.Option(String)

Read the optional filename field.

pub fn new_decoder() -> Decoder

Create a new incremental decoder state using the default limits.

pub fn new_decoder_with_limits(limits: limit.Limits) -> Decoder

Create a new incremental decoder state with explicit limits.

pub fn push(
  decoder: Decoder,
  chunk: BitArray,
) -> Result(Decoder, error.CodecError)

Append a chunk of input bytes to the decoder, enforcing max_input_bytes incrementally. Returns the updated decoder; no output is produced until [finish] runs (the underlying DEFLATE decoder is eager).

The shape mirrors [packkit/stream] so callers don’t have to remember which streaming module returns which tuple — previously this push returned (Decoder, List(BitArray)) and the equivalent stream.push returned a bare Decoder.

pub fn with_comment(
  header: Header,
  comment comment: String,
) -> Header

Attach an optional comment. Panics if comment contains the NUL byte gzip uses as the FCOMMENT terminator — see [with_comment_checked] when the value comes from untrusted input.

Like [with_name], the stored value is exactly what the caller passed; earlier revisions silently stripped NULs.

pub fn with_comment_checked(
  header: Header,
  comment comment: String,
) -> Result(Header, HeaderError)

Attach an optional comment after validating that it does not contain the NUL byte gzip uses as the FCOMMENT terminator.

pub fn with_extra(
  header: Header,
  subfields subfields: List(Subfield),
) -> Header

Attach a list of FEXTRA subfields. Out-of-range IDs or overlong bodies panic; use [with_extra_checked] when the caller has not pre-validated the values.

pub fn with_extra_checked(
  header: Header,
  subfields subfields: List(Subfield),
) -> Result(Header, HeaderError)

Attach a list of FEXTRA subfields after validating that every subfield ID byte fits the 8-bit slot, every subfield body fits gzip’s 16-bit LEN, and the concatenated total fits the 16-bit XLEN. Returns a typed HeaderError on any of those overflows.

pub fn with_modified_at(
  header: Header,
  unix_seconds unix_seconds: Int,
) -> Header

Attach an optional Unix mtime. Out-of-range values panic at construction time so a Header value cannot quietly carry a timestamp gzip’s 32-bit MTIME field cannot represent. Use [with_modified_at_checked] when the input is untrusted.

pub fn with_modified_at_checked(
  header: Header,
  unix_seconds unix_seconds: Int,
) -> Result(Header, HeaderError)

Attach an optional Unix mtime after validating it fits gzip’s 32-bit MTIME field.

pub fn with_name(header: Header, name name: String) -> Header

Attach an optional filename. Panics if name contains the NUL byte gzip uses as the FNAME terminator — see [with_name_checked] when the value comes from untrusted input.

The unchecked variant guarantees that the value stored in the header is exactly what the caller passed (lawful round-trip via name(with_name(h, x)) == Some(x)). Earlier revisions silently stripped NULs to “be helpful”; that broke the round-trip law and is now a panic, matching the other unchecked setters in this module ([with_modified_at] / [with_extra]) and across the package ([packkit/entry.with_mode] etc.).
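Why an embedded NUL cannot round-trip follows directly from the wire format: FNAME is a zero-terminated string, so a reader stops at the first NUL it sees. The sketch below is Python with a hypothetical read_fname helper that assumes a 10-byte header with only the FNAME flag set.

```python
def read_fname(member_bytes):
    """Read the NUL-terminated FNAME that follows the 10-byte fixed header.

    Returns the name and the offset where the next field would start.
    """
    end = member_bytes.index(b"\x00", 10)
    return member_bytes[10:end].decode("latin-1"), end + 1


# magic, CM=8, FLG=0x08 (FNAME), MTIME=0, XFL=0, OS=255 (unknown)
fixed_header = b"\x1f\x8b\x08\x08" + b"\x00" * 4 + b"\x00\xff"

good = fixed_header + b"notes.txt" + b"\x00"
bad = fixed_header + b"no\x00tes.txt" + b"\x00"

good_name, good_next = read_fname(good)
bad_name, bad_next = read_fname(bad)
```

For bad, the reader recovers only "no" and would treat the remaining name bytes as the start of the DEFLATE body, so the name is lost and the stream is corrupted as well.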

pub fn with_name_checked(
  header: Header,
  name name: String,
) -> Result(Header, HeaderError)

Attach an optional filename after validating that it does not contain the NUL byte gzip uses as the FNAME terminator. Use this when the value comes from untrusted input that must round-trip.
