PB is a data-driven library: it interprets a compiled schema at runtime rather
than generating bespoke encode/decode code for each message. The Elixir
protobuf library takes the opposite approach, generating a dedicated module per
message with protoc-gen-elixir.
That design difference has a performance cost, and PB is generally the slower
of the two. In three of the four scenarios below, encoding runs roughly
1.4–1.8× slower than protobuf; the exception is person/sparse, where PB encodes
about 1.4× faster. Decoding runs roughly 2.5–3.2× slower, rising to 4.3× for
person/sparse with defaults applied. PB also allocates more memory per
operation: roughly 1.8–2.8× on encode and 3.6–17× on decode. PB decodes into
plain maps and walks the schema to do it; protobuf materializes a struct it has
generated code for. If raw throughput is the deciding factor, protobuf is
faster. PB trades that speed for its data-driven model: no code generation, no
build step, schemas that travel as data.
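To make the distinction concrete, here is a minimal sketch of the two styles. It is not PB's or protobuf's actual code: the `Sketch.*` modules, the schema shape (a keyword list of field name to field number), and the two-field message are all assumptions made for illustration, and only varint fields are handled.

```elixir
defmodule Sketch.Varint do
  # Protobuf base-128 varint: 7 payload bits per byte, least significant group
  # first, with the high bit set on every byte except the last.
  def encode(n) when is_integer(n) and n >= 0 and n < 128, do: <<n>>

  def encode(n) when is_integer(n) and n >= 128 do
    <<1::1, rem(n, 128)::7, encode(div(n, 128))::binary>>
  end
end

defmodule Sketch.Interpreted do
  # Data-driven style: the schema is an ordinary term consulted at runtime.
  # Hypothetical schema shape: [{field_name, field_number}], all varint fields.
  def encode(schema, message) when is_list(schema) and is_map(message) do
    for {name, number} <- schema, Map.has_key?(message, name) do
      # Tag byte = field number shifted left by 3, plus wire type 0 (varint).
      [Sketch.Varint.encode(number * 8), Sketch.Varint.encode(Map.fetch!(message, name))]
    end
  end
end

defmodule Sketch.Generated do
  # Generated-code style: field numbers are baked into the code as literals,
  # so there is no schema lookup on the hot path.
  def encode(%{id: id, age: age}) do
    [<<8>>, Sketch.Varint.encode(id), <<16>>, Sketch.Varint.encode(age)]
  end
end

schema = [id: 1, age: 2]
message = %{id: 150, age: 30}

interpreted = IO.iodata_to_binary(Sketch.Interpreted.encode(schema, message))
generated = IO.iodata_to_binary(Sketch.Generated.encode(message))
```

Both paths produce the same five bytes. The interpreted one pays for a schema walk and map lookups per field, while the generated one needs a new module for every message type.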
The numbers below compare PB's compile-time path (use PB.Schema, the direct
analogue of generated modules) against protobuf's generated modules.
Wire throughput is only one axis, though. On compile time and runtime memory the data-driven model is the clear winner — see Compile time and runtime footprint below.
Methodology
Both libraries encode and decode the same schema (bench/proto/pb_bench.proto)
and the same payloads, across four scenarios:
- person/full — a nested message with repeated fields, a oneof, and a map.
- person/sparse — the same message with a single field set.
- scalars — one field of every scalar wire type.
- packed — long repeated numeric fields (packed encoding).
Before timing, the suite runs a correctness gate: each library decodes the other's bytes and the results must match, so the comparison is only ever between encoders that agree on the wire.
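The gate can be pictured roughly as follows. This is a hedged sketch, not the suite's actual code: `Sketch.Gate`, the codec-as-map shape, and the return tuple are hypothetical, and the demo uses Erlang's external term format as a stand-in for both libraries so the block runs without dependencies. A real gate would also need to normalize PB's maps and protobuf's structs before comparing them.

```elixir
defmodule Sketch.Gate do
  # Hypothetical cross-check. Each codec is a map with :encode and :decode funs.
  # Raises unless each side decodes the other's bytes to the same result.
  def check!(label, term, codec_a, codec_b) do
    bytes_a = IO.iodata_to_binary(codec_a.encode.(term))
    bytes_b = IO.iodata_to_binary(codec_b.encode.(term))

    # Swap the inputs: A decodes B's bytes and B decodes A's bytes.
    from_b = codec_a.decode.(bytes_b)
    from_a = codec_b.decode.(bytes_a)

    unless from_a == from_b do
      raise "#{label}: decoders disagree on each other's bytes"
    end

    # Also report whether the two encoders produced identical bytes.
    {:ok, label, bytes_a == bytes_b, byte_size(bytes_a)}
  end
end

# Stand-in codec (assumption): Erlang term serialization, used for both sides.
stand_in = %{encode: &:erlang.term_to_binary/1, decode: &:erlang.binary_to_term/1}

result = Sketch.Gate.check!("demo", %{id: 150}, stand_in, stand_in)
```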
PB's native encode output is iodata; the "binary" rows add
IO.iodata_to_binary/1 to match protobuf, which returns a binary.
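The gap between the iodata and binary rows is the cost of that flattening step. A small illustration with hand-built iodata (the bytes are arbitrary, not actual PB output):

```elixir
# iodata is a possibly nested list of binaries and bytes, not yet flattened.
iodata = [<<8>>, [<<150, 1>>, [<<16>>, 30]]]

# Its total size can be computed without building a binary.
size = IO.iodata_length(iodata)

# Flattening copies every fragment into one contiguous binary.
binary = IO.iodata_to_binary(iodata)
```

Functions such as `IO.binwrite/2` and `:gen_tcp.send/2` accept iodata directly, so a caller that writes straight to a socket or file may never need the flattening step.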
Results
Captured by bench/run.sh using Benchee. Absolute timings depend on hardware;
the ratios are the stable part.
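As a worked example of how a ratio follows from the raw timings, the sketch below recomputes the "x slower" figures for the packed encode scenario from the average times reported in the output that follows. `Sketch.Ratio` is a hypothetical helper, not part of the benchmark.

```elixir
defmodule Sketch.Ratio do
  # How many times slower the candidate is than the baseline, given the
  # average time per operation for each (same unit for both).
  def slowdown(baseline_avg, candidate_avg) do
    Float.round(candidate_avg / baseline_avg, 2)
  end
end

# Averages in microseconds from the packed/repeated numerics encode table.
iodata_ratio = Sketch.Ratio.slowdown(49.18, 66.46)
binary_ratio = Sketch.Ratio.slowdown(49.18, 74.29)
```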
ok person/full nested repeated map (byte-identical, 479 B)
ok person/sparse defaults (byte-identical, 2 B)
ok scalars/all scalar wire types (byte-identical, 161 B)
ok packed/repeated numerics (byte-identical, 2884 B)
=== ENCODE (Elixir term -> wire bytes) ===
Operating System: macOS
CPU Information: Apple M1 Max
Number of Available Cores: 10
Available memory: 32 GB
Elixir 1.19.5
Erlang 28.4.2
JIT enabled: true
Benchmark suite executing with the following configuration:
warmup: 2 s
time: 5 s
memory time: 2 s
reduction time: 0 ns
parallel: 1
inputs: packed/repeated numerics, person/full nested repeated map, person/sparse defaults, scalars/all scalar wire types
Estimated total run time: 1 min 48 s
Excluding outliers: false
##### With input packed/repeated numerics #####
Name ips average deviation median 99th %
protobuf encode (binary) 20.34 K 49.18 μs ±12.91% 51.29 μs 65.96 μs
PB encode (iodata) 15.05 K 66.46 μs ±33.73% 61.83 μs 115.21 μs
PB encode (binary) 13.46 K 74.29 μs ±14.60% 70 μs 119.33 μs
Comparison:
protobuf encode (binary) 20.34 K
PB encode (iodata) 15.05 K - 1.35x slower +17.29 μs
PB encode (binary) 13.46 K - 1.51x slower +25.11 μs
Memory usage statistics:
Name Memory usage
protobuf encode (binary) 92.23 KB
PB encode (iodata) 220.85 KB - 2.39x memory usage +128.63 KB
PB encode (binary) 220.91 KB - 2.40x memory usage +128.69 KB
**All measurements for memory usage were the same**
##### With input person/full nested repeated map #####
Name ips average deviation median 99th %
protobuf encode (binary) 62.36 K 16.03 μs ±17.23% 15.71 μs 18.88 μs
PB encode (iodata) 38.37 K 26.06 μs ±46.44% 24.88 μs 37.91 μs
PB encode (binary) 34.24 K 29.20 μs ±46.90% 27.96 μs 43.85 μs
Comparison:
protobuf encode (binary) 62.36 K
PB encode (iodata) 38.37 K - 1.63x slower +10.03 μs
PB encode (binary) 34.24 K - 1.82x slower +13.17 μs
Memory usage statistics:
Name Memory usage
protobuf encode (binary) 24.68 KB
PB encode (iodata) 68.55 KB - 2.78x memory usage +43.87 KB
PB encode (binary) 68.58 KB - 2.78x memory usage +43.90 KB
**All measurements for memory usage were the same**
##### With input person/sparse defaults #####
Name ips average deviation median 99th %
PB encode (iodata) 2.00 M 500.77 ns ±1544.06% 458 ns 625 ns
PB encode (binary) 1.80 M 554.33 ns ±1561.02% 500 ns 709 ns
protobuf encode (binary) 1.39 M 721.96 ns ±853.52% 667 ns 1792 ns
Comparison:
PB encode (iodata) 2.00 M
PB encode (binary) 1.80 M - 1.11x slower +53.57 ns
protobuf encode (binary) 1.39 M - 1.44x slower +221.20 ns
Memory usage statistics:
Name Memory usage
PB encode (iodata) 896 B
PB encode (binary) 920 B - 1.03x memory usage +24 B
protobuf encode (binary) 488 B - 0.54x memory usage -408 B
**All measurements for memory usage were the same**
##### With input scalars/all scalar wire types #####
Name ips average deviation median 99th %
protobuf encode (binary) 359.18 K 2.78 μs ±262.49% 2.67 μs 4.08 μs
PB encode (iodata) 253.14 K 3.95 μs ±216.15% 3.83 μs 4.88 μs
PB encode (binary) 209.79 K 4.77 μs ±530.12% 4.25 μs 8.38 μs
Comparison:
protobuf encode (binary) 359.18 K
PB encode (iodata) 253.14 K - 1.42x slower +1.17 μs
PB encode (binary) 209.79 K - 1.71x slower +1.98 μs
Memory usage statistics:
Name Memory usage
protobuf encode (binary) 3.29 KB
PB encode (iodata) 6.74 KB - 2.05x memory usage +3.45 KB
PB encode (binary) 6.80 KB - 2.07x memory usage +3.52 KB
**All measurements for memory usage were the same**
=== DECODE (wire bytes -> Elixir term) ===
Operating System: macOS
CPU Information: Apple M1 Max
Number of Available Cores: 10
Available memory: 32 GB
Elixir 1.19.5
Erlang 28.4.2
JIT enabled: true
Benchmark suite executing with the following configuration:
warmup: 2 s
time: 5 s
memory time: 2 s
reduction time: 0 ns
parallel: 1
inputs: packed/repeated numerics, person/full nested repeated map, person/sparse defaults, scalars/all scalar wire types
Estimated total run time: 1 min 48 s
Excluding outliers: false
##### With input packed/repeated numerics #####
Name ips average deviation median 99th %
protobuf decode 50.96 K 19.62 μs ±11.21% 18.88 μs 24.13 μs
PB decode (no defaults) 20.43 K 48.96 μs ±28.42% 47.71 μs 63.29 μs
PB decode (defaults) 20.12 K 49.71 μs ±5.51% 48.96 μs 57.25 μs
Comparison:
protobuf decode 50.96 K
PB decode (no defaults) 20.43 K - 2.49x slower +29.34 μs
PB decode (defaults) 20.12 K - 2.53x slower +30.08 μs
Memory usage statistics:
Name Memory usage
protobuf decode 38.91 KB
PB decode (no defaults) 141.15 KB - 3.63x memory usage +102.23 KB
PB decode (defaults) 141.34 KB - 3.63x memory usage +102.42 KB
**All measurements for memory usage were the same**
##### With input person/full nested repeated map #####
Name ips average deviation median 99th %
protobuf decode 66.73 K 14.99 μs ±19.79% 14.58 μs 16.96 μs
PB decode (no defaults) 21.15 K 47.27 μs ±6.24% 46.66 μs 56.08 μs
PB decode (defaults) 20.82 K 48.04 μs ±7.84% 47.54 μs 55.79 μs
Comparison:
protobuf decode 66.73 K
PB decode (no defaults) 21.15 K - 3.15x slower +32.28 μs
PB decode (defaults) 20.82 K - 3.21x slower +33.05 μs
Memory usage statistics:
Name Memory usage
protobuf decode 20.02 KB
PB decode (no defaults) 150.09 KB - 7.50x memory usage +130.07 KB
PB decode (defaults) 152.45 KB - 7.61x memory usage +132.43 KB
**All measurements for memory usage were the same**
##### With input person/sparse defaults #####
Name ips average deviation median 99th %
protobuf decode 5.06 M 197.46 ns ±3893.58% 167 ns 250 ns
PB decode (no defaults) 1.85 M 540.68 ns ±1593.55% 500 ns 708 ns
PB decode (defaults) 1.18 M 848.40 ns ±827.15% 792 ns 1041 ns
Comparison:
protobuf decode 5.06 M
PB decode (no defaults) 1.85 M - 2.74x slower +343.22 ns
PB decode (defaults) 1.18 M - 4.30x slower +650.94 ns
Memory usage statistics:
Name Memory usage
protobuf decode 0.133 KB
PB decode (no defaults) 1.31 KB - 9.88x memory usage +1.18 KB
PB decode (defaults) 2.23 KB - 16.76x memory usage +2.09 KB
**All measurements for memory usage were the same**
##### With input scalars/all scalar wire types #####
Name ips average deviation median 99th %
protobuf decode 505.30 K 1.98 μs ±14.31% 1.96 μs 2.29 μs
PB decode (defaults) 172.74 K 5.79 μs ±111.05% 5.67 μs 7.08 μs
PB decode (no defaults) 172.62 K 5.79 μs ±77.69% 5.63 μs 7.21 μs
Comparison:
protobuf decode 505.30 K
PB decode (defaults) 172.74 K - 2.93x slower +3.81 μs
PB decode (no defaults) 172.62 K - 2.93x slower +3.81 μs
Memory usage statistics:
Name Memory usage
protobuf decode 3.28 KB
PB decode (defaults) 19.16 KB - 5.84x memory usage +15.88 KB
PB decode (no defaults) 18.71 KB - 5.70x memory usage +15.43 KB
**All measurements for memory usage were the same**
Compile time and runtime footprint
The encode/decode numbers above are the axis where protobuf wins. There is
another axis, not captured by a microbenchmark, where the data-driven model wins
decisively: the cost of the schema itself.
protobuf generates one Elixir module per message. A schema with thousands of
messages becomes thousands of modules to compile, and that cost is paid on every
build — in large schemas compilation can stretch into minutes. Those modules
also stay resident: each loaded BEAM module carries runtime overhead (its code,
atoms, and metadata) for the life of the VM, so a large generated schema has a
standing memory cost independent of how much data you actually encode.
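One rough way to observe this standing cost is to ask the VM directly. The sketch below uses only standard Erlang calls: `:code.all_loaded/0` lists resident modules and `:erlang.memory(:code)` reports the bytes held by loaded code. The `Sketch.*` names are hypothetical, and the absolute numbers will vary by VM and release.

```elixir
defmodule Sketch.Footprint do
  # Snapshot of how many modules are resident and how much memory the VM
  # currently attributes to loaded code.
  def snapshot do
    %{modules: length(:code.all_loaded()), code_bytes: :erlang.memory(:code)}
  end
end

before = Sketch.Footprint.snapshot()

# Stand-in for one generated per-message module (hypothetical).
defmodule Sketch.OneMessage do
  def field_numbers, do: %{id: 1, age: 2}
end

later = Sketch.Footprint.snapshot()

# The new module stays resident until it is purged or the VM exits.
resident? = :code.is_loaded(Sketch.OneMessage) != false
```

Comparing `before` and `later` for a real generated schema, rather than this single toy module, would give a per-schema estimate of the cost described above.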
PB compiles a schema into data. With use PB.Schema the whole schema becomes
one module holding a single compiled structure; with the runtime PB.compile/1
path there is no generated module at all. Either way, compile time is roughly
independent of the number of messages, and there is no per-message module loaded
at runtime. For large or rapidly-changing schemas this is often the more
consequential difference in practice, even though it does not show up in the
per-operation timings above.
Reproducing
The benchmark lives in its own project under bench/ so the main pb package
stays dependency-free. To run it:
cd bench
mix run pb_vs_protobuf.exs
Timing knobs (seconds): BENCH_WARMUP, BENCH_TIME, BENCH_MEMORY.
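A sketch of how such knobs could be turned into Benchee options, assuming they are read as environment variables. `Sketch.Knobs` is hypothetical and may differ from what pb_vs_protobuf.exs actually does; the defaults mirror the configuration shown in the results above (warmup 2 s, time 5 s, memory time 2 s).

```elixir
defmodule Sketch.Knobs do
  # Read a duration in seconds from the environment, falling back to the
  # default when the variable is unset or not a plain number.
  def seconds(name, default) do
    case System.get_env(name) do
      nil ->
        default

      value ->
        case Float.parse(value) do
          {seconds, ""} -> seconds
          _ -> default
        end
    end
  end

  # Keyword list in the shape Benchee.run/2 accepts for its timing options.
  def benchee_opts do
    [
      warmup: seconds("BENCH_WARMUP", 2),
      time: seconds("BENCH_TIME", 5),
      memory_time: seconds("BENCH_MEMORY", 2)
    ]
  end
end
```

Under that assumption, setting `BENCH_TIME=10` in the environment before `mix run pb_vs_protobuf.exs` would lengthen the measurement window to ten seconds per job.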
To regenerate the proto artifacts, rerun the suite, and refresh the results block on this page in one step:
bench/run.sh