Git LFS (Large File Storage) pointer detection.
Git LFS replaces large binary blobs in the object database with small text "pointer" files of the form:
version https://git-lfs.github.com/spec/v1
oid sha256:4d7a214614ab2935c943f9e0ff69d22eadbb8f32b1258daaa5e2ca24d17e2393
size 12345

The actual content lives on an LFS server and is fetched via a separate protocol (batch API over HTTPS). An agent reading the blob through normal git-object APIs sees the ~130-byte pointer text, not the file contents: a silent correctness cliff if the agent doesn't know to check.
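To make the pointer layout concrete, the following sketch renders the pointer text for a given piece of content. `PointerTextSketch.build/1` is a hypothetical helper written for illustration and is not part of Exgit; it relies only on the fact that the oid is the lowercase hex SHA-256 digest of the real file contents and that size is the content length in bytes.

```elixir
defmodule PointerTextSketch do
  # Hypothetical helper, not part of Exgit: renders the three-line pointer
  # text that would stand in for `content` in the object database.
  def build(content) when is_binary(content) do
    # The oid is the lowercase hex SHA-256 digest of the real file contents.
    digest = :crypto.hash(:sha256, content) |> Base.encode16(case: :lower)

    "version https://git-lfs.github.com/spec/v1\n" <>
      "oid sha256:#{digest}\n" <>
      "size #{byte_size(content)}\n"
  end
end

pointer = PointerTextSketch.build("hello\n")
IO.write(pointer)
IO.puts("pointer is #{byte_size(pointer)} bytes")
```

Whatever the size of the original content, the pointer stays in the same small range, which is why a reader that does not check for it can mistake it for an ordinary short text file.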
Exgit.FS.read_path/4 with detect_lfs_pointers: true uses
parse/1 here to surface detected pointers as a structured
{:lfs_pointer, info} tuple instead of returning the pointer
text as if it were content. Detection is the whole story —
fetching the pointed-to content from the LFS server is left to
the caller.
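A caller might branch on the tuple roughly as follows. This is a sketch under assumptions: `ReadResultSketch.describe/1` is a hypothetical helper, and it assumes the non-pointer case arrives as a plain binary. The exact return shape of Exgit.FS.read_path/4 is not shown in this section, so adapt the patterns to what it actually returns.

```elixir
defmodule ReadResultSketch do
  # Hypothetical helper: classifies the blob-level value a reader produced.
  # Assumption: pointers arrive as {:lfs_pointer, info}; anything else is raw bytes.
  def describe({:lfs_pointer, %{oid: oid, size: size}}) do
    # The real content still has to be fetched from the LFS server by the caller.
    {:needs_lfs_fetch, oid, size}
  end

  def describe(content) when is_binary(content) do
    {:inline, byte_size(content)}
  end
end

# Placeholder info map with the same keys as pointer_info(); values are made up.
info = %{oid: "sha256:" <> String.duplicate("0", 64), size: 12345, raw: ""}
IO.inspect(ReadResultSketch.describe({:lfs_pointer, info}))
IO.inspect(ReadResultSketch.describe("plain file contents"))
```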
Strictness
parse/1 matches git lfs pointer --check behavior:
- Input must be ≤ @max_pointer_bytes (1024 bytes). Real LFS pointers are ~130 bytes; the cap rejects regular blobs that happen to start with "version https://...".
- First line must be exactly version https://git-lfs.github.com/spec/v1\n.
- Subsequent lines must be <key> <value>\n pairs, keys in ASCII-sorted order, values non-empty.
- oid and size are required; oid must match sha256:<64 hex> and size must be a non-negative decimal integer.
- Unknown keys are permitted only if they match ext-N-<name> where N is a decimal number. Unknown non-ext keys cause rejection.
- Input must end with \n.
The tight matching keeps false positives near zero — a regular text blob that coincidentally starts with the version line still has to satisfy every other constraint.
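The rules listed above can be sketched as a small standalone parser. This is an illustration only and not Exgit's implementation: the module name `LfsPointerSketch` and its private helpers are hypothetical, it collapses every failure into a single `:not_lfs_pointer` reason, and it assumes the oid uses lowercase hex digits.

```elixir
defmodule LfsPointerSketch do
  @max_pointer_bytes 1024

  # Illustrative only: mirrors the listed rules, not Exgit's source.
  # The function head enforces the exact version line and the size cap.
  def parse("version https://git-lfs.github.com/spec/v1\n" <> body = data)
      when byte_size(data) <= @max_pointer_bytes do
    with true <- String.ends_with?(body, "\n"),
         {:ok, pairs} <- split_pairs(body),
         keys = Enum.map(pairs, fn {key, _value} -> key end),
         # Binary comparison in Elixir is bytewise, which gives ASCII order.
         true <- keys == Enum.sort(keys),
         true <- Enum.all?(keys, &allowed_key?/1),
         {:ok, oid} <- fetch_oid(pairs),
         {:ok, size} <- fetch_size(pairs) do
      {:ok, %{oid: oid, size: size, raw: data}}
    else
      _ -> {:error, :not_lfs_pointer}
    end
  end

  def parse(_data), do: {:error, :not_lfs_pointer}

  # Splits the body into {key, value} pairs; rejects lines without a
  # non-empty key and a non-empty value separated by a single space.
  defp split_pairs(body) do
    body
    |> String.split("\n")
    |> Enum.drop(-1)
    |> Enum.reduce_while({:ok, []}, fn line, {:ok, acc} ->
      case String.split(line, " ", parts: 2) do
        [key, value] when key != "" and value != "" ->
          {:cont, {:ok, [{key, value} | acc]}}

        _ ->
          {:halt, :error}
      end
    end)
    |> case do
      {:ok, pairs} -> {:ok, Enum.reverse(pairs)}
      :error -> :error
    end
  end

  defp allowed_key?("oid"), do: true
  defp allowed_key?("size"), do: true
  defp allowed_key?(key), do: Regex.match?(~r/\Aext-[0-9]+-.+\z/, key)

  defp fetch_oid(pairs) do
    case List.keyfind(pairs, "oid", 0) do
      {"oid", "sha256:" <> hex = oid} ->
        # Assumption: lowercase hex only.
        if Regex.match?(~r/\A[0-9a-f]{64}\z/, hex), do: {:ok, oid}, else: :error

      _ ->
        :error
    end
  end

  defp fetch_size(pairs) do
    case List.keyfind(pairs, "size", 0) do
      {"size", digits} ->
        if Regex.match?(~r/\A[0-9]+\z/, digits),
          do: {:ok, String.to_integer(digits)},
          else: :error

      _ ->
        :error
    end
  end
end

valid =
  "version https://git-lfs.github.com/spec/v1\n" <>
    "oid sha256:4d7a214614ab2935c943f9e0ff69d22eadbb8f32b1258daaa5e2ca24d17e2393\n" <>
    "size 12345\n"

IO.inspect(LfsPointerSketch.parse(valid))
IO.inspect(LfsPointerSketch.parse("not an lfs pointer"))
```

Putting the version line and the size cap in the function head means most non-pointer blobs are rejected by pattern matching alone, before any line splitting happens.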
Summary
Types
@type pointer_info() :: %{oid: String.t(), size: non_neg_integer(), raw: binary()}
Functions
@spec parse(binary()) :: {:ok, pointer_info()} | {:error, term()}
Parse data as a git-lfs pointer file.
Returns {:ok, %{oid, size, raw}} on a valid pointer, or
{:error, reason} on any rejection.
The raw field is the original bytes unchanged — callers that
want to re-emit the pointer (e.g. to pass through to a layer
that knows how to fetch from the LFS server) can use it
without having to reserialize.
Examples
iex> ptr = """
...> version https://git-lfs.github.com/spec/v1
...> oid sha256:4d7a214614ab2935c943f9e0ff69d22eadbb8f32b1258daaa5e2ca24d17e2393
...> size 12345
...> """
iex> {:ok, info} = Exgit.LFS.parse(ptr)
iex> info.size
12345
iex> info.oid
"sha256:4d7a214614ab2935c943f9e0ff69d22eadbb8f32b1258daaa5e2ca24d17e2393"
iex> Exgit.LFS.parse("not an lfs pointer")
{:error, :not_lfs_pointer}
Predicate form of parse/1. Returns true iff data is a
valid LFS pointer.