# Merkle Search Tree (MST) `Potable.MST` implements the AT Protocol Merkle Search Tree used for repository record indexing. Keys are strings (typically `collection/rkey`) and values are CIDs of record blocks. ## Purpose The MST provides: - ordered listing by key - efficient point lookup semantics - immutable, content-addressed snapshots - deterministic root hash for equivalent content ## Key Level Derivation Each key gets a level based on SHA-256: - hash the key - read 2-bit pairs from the most significant side - count leading zero pairs This produces fanout-4 style layering used by atproto MST. Use `Potable.MST.key_level/1` to inspect a key's level. ## Node Encoding Nodes are DAG-CBOR maps: ```text %{ "l" => left_subtree_cid, # optional "e" => [ %{ "p" => prefix_len, "k" => key_suffix_bytes, # CBOR byte string "v" => value_cid, "t" => right_subtree_cid # optional } ] } ``` Notes: - `"k"` uses prefix compression (delta against previous key in node) - empty/optional fields are omitted, not encoded as `nil` ## Public API Create/empty: - `empty/0` -> `nil` Mutations: - `put(root, key, value_cid, store)` -> `{:ok, new_root}` - `delete(root, key, store)` -> `{:ok, new_root_or_nil}` Reads: - `get(root, key, store)` -> `{:ok, cid}` or `:not_found` - `list(root, opts, store)` -> `[{key, cid}]` - `walk(root, store)` -> `[{key, cid}]` Diff and export helpers: - `diff(root_a, root_b, store)` -> `%{creates, updates, deletes}` - `collect_blocks(root, store)` -> list of MST node CIDs ## Listing Options `list/3` supports: - `prefix: "app.bsky.feed.post/"` - `limit: 100` - `after: "app.bsky.feed.post/rkey-001"` - `before: "app.bsky.feed.post/rkey-999"` Returned entries are lexicographically ordered by key. ## Determinism Potable tests assert: - same key/value set -> same root CID - insertion order does not change resulting root CID This property is required for reproducible repository snapshots. ## Current Complexity Characteristics Current implementation favors correctness and determinism: - `put/4` and `delete/3` rebuild from flattened pairs - `get/3` is tree traversal-based This keeps behavior predictable and testable, but not asymptotically optimized for very large trees yet. See `docs/08_known_limitations.md` for optimization roadmap notes.