This page explains the conceptual model PB upholds when it encodes and decodes — the why behind behaviours you will observe, such as when defaults appear and why validation runs where it does. It is independent of any particular runtime detail. For the byte-level type mapping, see the Data representation reference.
Layers PB keeps distinct
- Wire occurrences — raw field records in the byte stream. A field may occur zero, one, or many times.
- Logical message values — the value after occurrences are interpreted and merged.
- Presence — whether a field was actually present, kept separate from the field's default value.
- Presentation values — what callers receive, optionally with materialized defaults.
- PB metadata — reserved dunder keys (:__unknown_fields__, :__extensions__, :__message_name__), never confused with real fields.
The central invariant: defaults must not erase presence. A materialized default is presentation, not a present value, so it never participates in merge, required-field checks, oneof selection, or presence-sensitive validation as if it had been on the wire.
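The invariant can be sketched in a few lines. This is an illustration only: it assumes the logical value is a map holding just the present fields and that defaults live in a separate schema table. The module PresenceSketch, its functions, and the field names are hypothetical and not part of PB's API.

```elixir
defmodule PresenceSketch do
  # Hypothetical schema: per-field defaults and the list of required fields.
  @defaults %{id: 0, name: "", tags: []}
  @required [:id]

  # Presentation layer: fill in defaults for absent fields when asked.
  # The logical value passed in is never modified.
  def present(logical, opts \\ []) do
    if Keyword.get(opts, :defaults, false) do
      Map.merge(@defaults, logical)
    else
      logical
    end
  end

  # Presence-sensitive check: looks at the logical value only,
  # so a materialized default can never satisfy it.
  def missing_required(logical) do
    Enum.reject(@required, &Map.has_key?(logical, &1))
  end
end

# :id never appeared on the wire.
logical = %{name: "a"}

# Callers who asked for defaults see id: 0 ...
IO.inspect(PresenceSketch.present(logical, defaults: true))

# ... but the required-field check still reports :id as missing.
IO.inspect(PresenceSketch.missing_required(logical))
```

The point of the split is that the required-field check is given the logical value, never the presentation value, so the two cannot be confused.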
Decode, conceptually
- Scan — read tags in wire order, tolerating any order, preserving unknown fields as raw bytes, and resolving known fields and extensions.
- Interpret & merge — apply protobuf merge semantics: singular scalars are last-write-wins, singular messages merge field by field, repeated fields append (whether packed or unpacked), oneofs keep only the last member seen, and map entries with the same key collapse to the last one.
- Finalize — collapse map entries, wrap oneofs, check required fields, keep presence distinct from defaults, and apply caller-facing defaults only if defaults: true was requested.
- Validate — runs on the post-merge logical message, never on individual occurrences.
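The merge rules of the second stage can be sketched as a fold over already-decoded occurrences. This is a simplified illustration, not PB's internal model: the module MergeSketch, the {field, value} occurrence shape, and the field names are assumptions, oneofs are left out, and the message merge is shallow.

```elixir
defmodule MergeSketch do
  # Hypothetical schema: field name => merge kind.
  @kinds %{id: :scalar, child: :message, tags: :repeated, labels: :map}

  # Fold occurrences (in wire order) into a single logical value.
  def merge(occurrences) do
    Enum.reduce(occurrences, %{}, fn {field, value}, acc ->
      apply_occurrence(Map.fetch!(@kinds, field), field, value, acc)
    end)
  end

  # Singular scalar: the last occurrence wins.
  defp apply_occurrence(:scalar, field, value, acc),
    do: Map.put(acc, field, value)

  # Singular message: merge with the earlier value.
  # Shallow here; a fuller version would recurse using the nested schema.
  defp apply_occurrence(:message, field, value, acc),
    do: Map.update(acc, field, value, &Map.merge(&1, value))

  # Repeated: append, whether values arrive as a packed list or one at a time.
  defp apply_occurrence(:repeated, field, value, acc),
    do: Map.update(acc, field, List.wrap(value), &(&1 ++ List.wrap(value)))

  # Map: each occurrence is one {key, value} entry; a later entry replaces an earlier one.
  defp apply_occurrence(:map, field, {k, v}, acc),
    do: Map.update(acc, field, %{k => v}, &Map.put(&1, k, v))
end

result =
  MergeSketch.merge([
    {:id, 1},
    {:tags, ["x"]},
    {:child, %{name: "a"}},
    {:labels, {"env", "dev"}},
    {:id, 2},
    {:tags, ["y", "z"]},
    {:child, %{id: 123}},
    {:labels, {"env", "prod"}}
  ])

# id is 2, tags is ["x", "y", "z"], child has both name and id,
# and labels holds only "env" => "prod".
IO.inspect(result)
```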
Validation after merge matters. These two wire occurrences:
    child { name: "a" }
    child { id: 123 }
merge to %{child: %{name: "a", id: 123}}. If Child.id is required, validating the first occurrence alone would wrongly reject the message; PB validates the merged value.
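The same scenario as a small sketch. The validator below is hypothetical (it is not PB's validation API) and stands in for a required-field rule on Child.id; the error shape is likewise an assumption.

```elixir
defmodule ValidateAfterMergeSketch do
  # Hypothetical rule: Child.id is required.
  def validate_child(child) do
    if Map.has_key?(child, :id) do
      :ok
    else
      {:error, {:missing_required, [:child, :id]}}
    end
  end
end

first = %{name: "a"}
second = %{id: 123}
merged = Map.merge(first, second)

# Validating a single occurrence rejects a message that is actually valid.
IO.inspect(ValidateAfterMergeSketch.validate_child(first))

# Validating the merged logical value accepts it.
IO.inspect(ValidateAfterMergeSketch.validate_child(merged))
```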
Encode, conceptually
Encode has no merge stage — the caller supplies an already-logical value.
- Adapt — convert application values to protobuf logical values where adapters are configured (errors keep the logical field path).
- Validate — check map shape, required fields, scalar ranges, enum rules, oneof selection, and run protovalidate against the logical value.
- Elide & handle presence — implicit-presence defaults are omitted; explicit-presence fields encode even when equal to the default; empty repeated/map fields emit nothing; oneofs encode only the selected member. Presence is determined from the caller's input, not by comparing to a default.
- Construct wire bytes — correct tags and wire types, packed/expanded repeated encoding per schema, map entries as repeated messages, extensions as normal fields, unknown fields as preserved raw bytes.
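The elision rules in the third step can be sketched as a filter over the caller's input. This is a rough illustration under stated assumptions: the module ElideSketch, the schema shape, and the field names are hypothetical, and oneofs and maps are omitted.

```elixir
defmodule ElideSketch do
  # Hypothetical schema: field => {presence mode, default}.
  @schema %{
    count: {:implicit, 0},
    nickname: {:explicit, ""},
    tags: {:repeated, []}
  }

  # Returns the {field, value} pairs that would be written, sorted by field name.
  def fields_to_encode(input) do
    input
    |> Enum.filter(fn {field, value} -> emit?(Map.fetch!(@schema, field), value) end)
    |> Enum.sort()
  end

  # Implicit presence: a value equal to the default is omitted.
  defp emit?({:implicit, default}, value), do: value != default

  # Explicit presence: the key being in the caller's input is what counts,
  # so the value is never compared to the default.
  defp emit?({:explicit, _default}, _value), do: true

  # Empty repeated fields emit nothing.
  defp emit?({:repeated, _default}, value), do: value != []
end

IO.inspect(ElideSketch.fields_to_encode(%{count: 0, nickname: "", tags: []}))
# => [nickname: ""]
```

Here nickname is encoded even though it equals its default, because the caller supplied it; count and tags are dropped.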
Field order is not semantically meaningful; stable ordering is only useful for tests and reproducibility.
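To make the ordering point concrete, here is a minimal sketch of the standard protobuf tag and varint layout, restricted to varint-typed fields (wire type 0). WireOrderSketch is a hypothetical module written for this page, not PB's encoder; it only shows that two byte streams with different field order decode to the same logical value.

```elixir
defmodule WireOrderSketch do
  import Bitwise

  # Encode a non-negative integer as a base-128 varint (low 7 bits first).
  def varint(n) when n in 0..127, do: <<n>>
  def varint(n) when n > 127, do: <<1::1, (n &&& 0x7F)::7>> <> varint(n >>> 7)

  # A varint field: tag (field number shifted left by 3, wire type 0), then the value.
  def field(number, value), do: varint((number <<< 3) ||| 0) <> varint(value)

  # Decode a stream of varint fields into %{field_number => value}; last write wins.
  def decode(bytes, acc \\ %{})
  def decode(<<>>, acc), do: acc

  def decode(bytes, acc) do
    {tag, rest} = read_varint(bytes)
    {value, rest} = read_varint(rest)
    decode(rest, Map.put(acc, tag >>> 3, value))
  end

  defp read_varint(bytes), do: read_varint(bytes, 0, 0)

  # Continuation bit set: accumulate 7 bits and keep reading.
  defp read_varint(<<1::1, chunk::7, rest::binary>>, shift, acc),
    do: read_varint(rest, shift + 7, acc ||| (chunk <<< shift))

  # Continuation bit clear: this is the final byte of the varint.
  defp read_varint(<<0::1, chunk::7, rest::binary>>, shift, acc),
    do: {acc ||| (chunk <<< shift), rest}
end

a = WireOrderSketch.field(1, 150) <> WireOrderSketch.field(2, 7)
b = WireOrderSketch.field(2, 7) <> WireOrderSketch.field(1, 150)

# The bytes differ, the decoded values do not.
IO.inspect(a == b)
IO.inspect(WireOrderSketch.decode(a) == WireOrderSketch.decode(b))
```

Because the two streams are not byte-identical, tests that compare encoded output directly need a stable field order, which is the only reason ordering matters.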
Further reading
The full semantic specification, including the internal scan/finalize model and
schema compile/prepare phases, lives in the repository design notes
(docs/semantics.md).