# RustyJson Benchmarks

Comprehensive benchmarks comparing RustyJson vs Jason across synthetic and real-world datasets.

## Key Findings

1. **Fast across all workloads** — plain data, struct-heavy data, and decoding (including deeply nested and small payloads)
2. **Encoding plain data** shows the largest gains — 3-6x faster, 2-3x less memory
3. **Struct encoding** optimized in v0.3.3 via a single-pass iodata pipeline with compile-time codegen (~2x improvement over v0.3.2)
4. **Deep-nested decode** optimized in v0.3.3 via a single-entry fast path (~27% faster than v0.3.2 for 100-level nested JSON)
5. **Larger payloads = bigger advantage** — real-world 10 MB files show better results than synthetic benchmarks
6. **BEAM scheduler load dramatically reduced** — 100-28,000x fewer reductions

## Test Environment

| Attribute | Value |
|-----------|-------|
| OS | macOS |
| CPU | Apple M1 Pro |
| Cores | 10 |
| Memory | 16 GB |
| Elixir | 1.19.4 |
| Erlang/OTP | 28.2 |

## Real-World Benchmarks: Amazon Settlement Reports

These are production JSON files from Amazon SP-API settlement reports, representing real-world API response patterns with nested objects, arrays of transactions, and mixed data types.
### Encoding Performance (Elixir → JSON)

| File Size | RustyJson | Jason | Speedup | Memory |
|-----------|-----------|-------|---------|--------|
| 10.87 MB | 24 ms | 131 ms | **5.5x faster** | **2.7x less** |
| 9.79 MB | 21 ms | 124 ms | **5.9x faster** | **2-3x less** |
| 9.38 MB | 21 ms | 104 ms | **5.0x faster** | **2-3x less** |

### Decoding Performance (JSON → Elixir)

| File Size | RustyJson | Jason | Speedup | Memory |
|-----------|-----------|-------|---------|--------|
| 10.87 MB | 61 ms | 152 ms | **2.5x faster** | similar |
| 9.79 MB | 55 ms | 134 ms | **2.4x faster** | similar |
| 9.38 MB | 50 ms | 119 ms | **2.4x faster** | similar |

### BEAM Reductions (Scheduler Load)

| File Size | RustyJson | Jason | Reduction |
|-----------|-----------|-------|-----------|
| 10.87 MB encode | 404 | 11,570,847 | **28,641x fewer** |

This is the most dramatic difference: RustyJson offloads virtually all work to native code.

## Synthetic Benchmarks: nativejson-benchmark

Using standard datasets from [nativejson-benchmark](https://github.com/miloyip/nativejson-benchmark):

| Dataset | Size | Description |
|---------|------|-------------|
| canada.json | 2.1 MB | Geographic coordinates (number-heavy) |
| citm_catalog.json | 1.6 MB | Event catalog (mixed types) |
| twitter.json | 617 KB | Social media with CJK (unicode-heavy) |

### Decode Performance (JSON → Elixir)

| Input | RustyJson (ips) | Average time |
|-------|-----------------|--------------|
| canada.json (2.1 MB) | 153 | 6.55 ms |
| citm_catalog.json (1.6 MB) | 323 | 3.09 ms |
| twitter.json (617 KB) | 430 | 2.33 ms |
| large_list (50k items, 2.3 MB) | 62 | 16.0 ms |
| deep_nested (1.1 KB, 100 levels) | 148K | 6.75 µs |
| wide_object (75 KB, 5k keys) | 1,626 | 0.61 ms |

### Roundtrip Performance (Decode + Encode)

| Input | RustyJson | Jason | Speedup |
|-------|-----------|-------|---------|
| canada.json | 14 ms | 48 ms | **3.4x faster** |
| citm_catalog.json | 6 ms | 14 ms | **2.5x faster** |
| twitter.json | 4 ms | 9 ms | **2.3x faster** |

### BEAM Reductions by Dataset

| Dataset | RustyJson | Jason | Ratio |
|---------|-----------|-------|-------|
| canada.json | ~3,500 | ~964,000 | **275x fewer** |
| citm_catalog.json | ~300 | ~621,000 | **2,000x fewer** |
| twitter.json | ~2,000 | ~511,000 | **260x fewer** |

## Struct Encoding Benchmarks (v0.3.3+)

Encoding data that contains Elixir structs (e.g., `@derive RustyJson.Encoder` or a custom `defimpl`) follows a different path than plain maps and lists: structs require the `RustyJson.Encoder` protocol to convert them to JSON-serializable forms.

In v0.3.3, the struct encoding pipeline was rewritten from a three-pass approach (protocol dispatch → fragment resolution → NIF serialization) to a single-pass iodata pipeline with compile-time codegen for derived structs. This closed the last remaining performance gap, making RustyJson faster across all encoding workloads.

### Struct Encoding Performance

| Workload | Speedup (v0.3.3 vs v0.3.2) |
|----------|----------------------------|
| Derived struct (5 fields) | ~2x faster |
| Derived struct (10 fields) | ~2x faster |
| Custom encoder (returning `Encode.map`) | ~2.5x faster |
| List of 1,000 derived structs | ~2x faster |
| Nested structs (3 levels deep) | ~2x faster |

Measured with protocol consolidation enabled (`MIX_ENV=prod`), which is the default for production builds.

### How It Works

RustyJson's struct encoding produces iodata in a single pass:

1. **Derived encoders** (`@derive RustyJson.Encoder`) generate compile-time iodata templates with pre-escaped keys — no runtime `Map.from_struct`, `Map.to_list`, or key escaping.
2. **Map/List impls** detect struct-containing data and route through `Encode.map/2` / `Encode.list/2` to build iodata directly, wrapped in a `Fragment`.
3. **NIF bypass** — when the top-level result is an iodata `Fragment` (no pretty-print or compression), `IO.iodata_to_binary/1` is used directly, avoiding Erlang↔Rust term conversion entirely.
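A derived encoder from step 1 looks like the following sketch. The module and field names are hypothetical, chosen for illustration; only `@derive RustyJson.Encoder` and `RustyJson.encode!/1` come from this document.

```elixir
defmodule Settlement.Transaction do
  # Generates a compile-time iodata template with pre-escaped keys.
  @derive RustyJson.Encoder
  defstruct [:id, :amount, :currency]
end

# Encoding fills the template in a single pass — no Map.from_struct at runtime.
RustyJson.encode!(%Settlement.Transaction{id: 1, amount: 9.99, currency: "USD"})
```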
For plain data (no structs), encoding still uses the fast Rust NIF path unchanged.

## Why Encoding Shows Bigger Gains

### iolist Encoding Pattern (Pure Elixir)

```
encode(data)
  → allocate "{" binary
  → allocate "\"key\"" binary
  → allocate ":" binary
  → allocate "\"value\"" binary
  → allocate list cells to link them
  → return iolist (many BEAM allocations)
```

### RustyJson's Encoding Pattern (NIF)

```
encode(data)
  → [Rust: walk terms, write to single buffer]
  → copy buffer to BEAM binary
  → return binary (one BEAM allocation)
```

Pure-Elixir encoders create many small BEAM allocations. RustyJson creates one.

### Why Decoding Memory is Similar

Both libraries produce identical Elixir data structures when decoding. The resulting maps, lists, and strings take the same space regardless of which library created them.

## Why Benchee Memory Measurements Don't Work for NIFs

**Important**: Benchee's `memory_time` option gives misleading results for NIF-based libraries.

### What Benchee Reports (Incorrect)

| Library | Memory |
|-----------|------------|
| RustyJson | 0.00169 MB |
| Jason | 20.27 MB |

This suggests 12,000x less memory, which is wrong.

### Why This Happens

Benchee measures memory using `:erlang.memory/0`, which only tracks BEAM allocations:

- BEAM process heap
- BEAM binary space
- ETS tables

RustyJson allocates memory in **Rust via mimalloc**, which is completely invisible to BEAM tracking. The 0.00169 MB is just NIF call overhead.

### How We Measure Instead

We use the `:erlang.memory(:total)` delta in isolated spawned processes:

```elixir
spawn(fn ->
  :erlang.garbage_collect()
  before = :erlang.memory(:total)
  _results = for _ <- 1..10, do: RustyJson.encode!(data)
  after_mem = :erlang.memory(:total)
  # Report (after_mem - before) / 10
end)
```

This captures BEAM allocations during the operation. For total system memory (including the NIF), we verified with RSS measurements that Rust adds only ~1-2 MB of temporary overhead.
### Actual Memory Comparison

For a 10 MB settlement report encode:

| Metric | RustyJson | Jason |
|--------|-----------|-------|
| BEAM memory | 6.7 MB | 17.9 MB |
| NIF overhead | ~1-2 MB | N/A |
| **Total** | **~8 MB** | **~18 MB** |
| **Ratio** | **2-3x less** | |

## Running Benchmarks

```bash
# 1. Download synthetic test data
mkdir -p bench/data && cd bench/data
curl -LO https://raw.githubusercontent.com/miloyip/nativejson-benchmark/master/data/canada.json
curl -LO https://raw.githubusercontent.com/miloyip/nativejson-benchmark/master/data/citm_catalog.json
curl -LO https://raw.githubusercontent.com/miloyip/nativejson-benchmark/master/data/twitter.json
cd ../..

# 2. Run memory benchmarks (no extra deps needed)
mix run bench/memory_bench.exs

# 3. (Optional) Run speed benchmarks with Benchee
# Add to mix.exs: {:benchee, "~> 1.0", only: :dev}
mix deps.get
mix run bench/stress_bench.exs
```

## Key Interning Benchmarks

The `keys: :intern` option provides significant speedups when decoding arrays of objects with repeated keys (common in API responses, database results, etc.).

### When Key Interning Helps: Homogeneous Arrays

Arrays where every object has the same keys:

```json
[{"id": 1, "name": "Alice"}, {"id": 2, "name": "Bob"}, ...]
```

| Scenario | Default | `keys: :intern` | Improvement |
|----------|---------|-----------------|-------------|
| 100 objects × 5 keys | 34.2 µs | 23.6 µs | **31% faster** |
| 100 objects × 10 keys | 67.5 µs | 44.8 µs | **34% faster** |
| 1,000 objects × 5 keys | 335 µs | 237 µs | **29% faster** |
| 1,000 objects × 10 keys | 688 µs | 463 µs | **33% faster** |
| 10,000 objects × 5 keys | 3.46 ms | 2.45 ms | **29% faster** |
| 10,000 objects × 10 keys | 6.92 ms | 4.88 ms | **29% faster** |

### When Key Interning Hurts: Unique Keys

Single objects or heterogeneous arrays where keys aren't repeated:

| Scenario | Default | `keys: :intern` | Penalty |
|----------|---------|-----------------|---------|
| Single object, 100 keys | 5.1 µs | 13.6 µs | **2.6x slower** |
| Single object, 1,000 keys | 52 µs | 169 µs | **3.2x slower** |
| Single object, 5,000 keys | 260 µs | 831 µs | **3.2x slower** |
| Heterogeneous 100 objects | 35 µs | 96 µs | **2.7x slower** |
| Heterogeneous 500 objects | 186 µs | 475 µs | **2.5x slower** |

### Scaling: Benefit Increases with Object Count

With 5 keys per object, the benefit grows as more objects reuse the cached keys:

| Objects | Default | `keys: :intern` | Improvement |
|---------|---------|-----------------|-------------|
| 10 | 3.5 µs | 3.0 µs | 13% faster |
| 50 | 17.1 µs | 12.5 µs | 27% faster |
| 100 | 33.8 µs | 23.8 µs | 30% faster |
| 500 | 170 µs | 119 µs | 30% faster |
| 1,000 | 339 µs | 242 µs | 29% faster |
| 5,000 | 1.81 ms | 1.24 ms | 31% faster |
| 10,000 | 3.47 ms | 2.49 ms | 28% faster |

### Usage Recommendation

```elixir
# API responses, database results, bulk data
RustyJson.decode!(json, keys: :intern)

# Config files, single objects, unknown schemas
RustyJson.decode!(json)  # default, no interning
```

**Rule of thumb**: Use `keys: :intern` when you know you're decoding arrays of 10+ objects with the same schema.
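To check the trade-off on your own payloads, a rough one-off timing with `:timer.tc/1` is enough to see the direction of the effect. This is a sketch, not the benchmark harness used above; single-shot timings are noisy, so use Benchee for real numbers.

```elixir
# Build a homogeneous array of 1,000 objects with repeated keys.
json =
  Enum.map(1..1_000, fn i -> %{"id" => i, "name" => "user#{i}"} end)
  |> RustyJson.encode!()

# :timer.tc/1 returns {elapsed_microseconds, result}.
{t_default, _} = :timer.tc(fn -> RustyJson.decode!(json) end)
{t_intern, _} = :timer.tc(fn -> RustyJson.decode!(json, keys: :intern) end)

IO.puts("default: #{t_default} µs, intern: #{t_intern} µs")
```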
**Note**: Keys containing escape sequences (e.g., `"field\nname"`) are not interned because the raw JSON bytes differ from the decoded string. This is rare in practice and has negligible performance impact.

## Summary

| Operation | Speed | Memory | Reductions |
|-----------|-------|--------|------------|
| **Encode plain data (large)** | 5-6x | 2-3x less | 28,000x fewer |
| **Encode plain data (medium)** | 2-3x | 2-3x less | 200-2,000x fewer |
| **Encode structs (v0.3.3+)** | ~2x improvement over v0.3.2 | similar | — |
| **Decode (large)** | 2-4.5x | similar | — |
| **Decode (deep nested, v0.3.3+)** | ~27% improvement over v0.3.2 | similar | — |
| **Decode (`keys: :intern`)** | +30%\* | similar | — |

\*For arrays of objects with repeated keys (API responses, DB results, etc.)

**Bottom line**: As of v0.3.3, RustyJson is fast across all encoding and decoding workloads, including deeply nested and small payloads. Plain data encoding shows the largest gains (5-6x faster, 2-3x less memory, dramatically fewer BEAM reductions). Struct encoding was rewritten in v0.3.3 with a single-pass iodata pipeline. Deep-nested decode was optimized in v0.3.3 with a single-entry fast path that avoids heap allocation for single-element objects and arrays. For decoding bulk data, enable `keys: :intern` for an additional ~30% speedup.