Exgit.LFS (exgit v0.1.0)


Git LFS (Large File Storage) pointer detection.

Git LFS replaces large binary blobs in the object database with small text "pointer" files of the form:

version https://git-lfs.github.com/spec/v1
oid sha256:4d7a214614ab2935c943f9e0ff69d22eadbb8f32b1258daaa5e2ca24d17e2393
size 12345

The actual content lives on an LFS server and is fetched via a separate protocol (batch API over HTTPS). An agent reading the blob through normal git-object APIs sees the ~130-byte pointer text, not the file contents. Unless the agent knows to check, it will silently treat the pointer as the file's content.

Exgit.FS.read_path/4 with detect_lfs_pointers: true uses parse/1 here to surface detected pointers as a structured {:lfs_pointer, info} tuple instead of returning the pointer text as if it were content. Detection is the whole story — fetching the pointed-to content from the LFS server is left to the caller.
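Since fetching is left to the caller, the following is a minimal sketch of how a caller might turn a pointer_info() map into the body of a Git LFS batch download request. LfsBatchSketch and download_request/1 are hypothetical names and are not part of Exgit. The sketch assumes the batch API expects the bare hex digest, so it strips the sha256: prefix that parse/1 returns in oid. JSON encoding and the HTTPS call are deliberately left out.

```elixir
defmodule LfsBatchSketch do
  # Hypothetical helper for illustration; not part of Exgit.

  # Builds the (not yet JSON-encoded) body of a batch "download" request
  # from a map shaped like pointer_info(). The "sha256:" prefix is removed
  # because the batch API is assumed to take the bare hex digest.
  def download_request(%{oid: "sha256:" <> hex, size: size})
      when is_integer(size) and size >= 0 do
    %{
      "operation" => "download",
      "transfers" => ["basic"],
      "objects" => [%{"oid" => hex, "size" => size}]
    }
  end
end
```

A caller would encode this map as JSON and POST it to the repository's LFS endpoint with whatever HTTP client it already uses.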

Strictness

parse/1 matches git lfs pointer --check behavior:

  • Input must be ≤ @max_pointer_bytes (1024 bytes). Real LFS pointers are ~130 bytes; the cap rejects regular blobs that happen to start with "version https://...".
  • First line must be exactly version https://git-lfs.github.com/spec/v1\n.
  • Subsequent lines must be <key> <value>\n pairs, keys in ASCII-sorted order, values non-empty.
  • oid and size are required; oid must match sha256:<64 hex> and size must be a non-negative decimal integer.
  • Unknown keys are permitted only if they match ext-N-<name> where N is a decimal. Unknown non-ext keys cause rejection.
  • Input must end with \n.

The tight matching keeps false positives near zero — a regular text blob that coincidentally starts with the version line still has to satisfy every other constraint.
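The rules above can be sketched as a small standalone parser. This is an illustration only, not the Exgit.LFS implementation: LfsPointerSketch is a hypothetical module, every rejection is collapsed into the single :not_lfs_pointer reason, the hex digits of oid are assumed to be lowercase, and the N in ext-N-<name> is read as one or more decimal digits.

```elixir
defmodule LfsPointerSketch do
  # Hypothetical stand-in written for illustration; it is not Exgit.LFS.

  @max_pointer_bytes 1024
  @version_line "version https://git-lfs.github.com/spec/v1"

  def parse(data) when is_binary(data) and byte_size(data) <= @max_pointer_bytes do
    # Splitting on "\n" leaves a trailing "" only when the input ends with "\n".
    with {lines, [""]} <- data |> String.split("\n") |> Enum.split(-1),
         [@version_line | rest] <- lines,
         {:ok, pairs} <- parse_pairs(rest),
         keys = Enum.map(pairs, &elem(&1, 0)),
         # Binary comparison is bytewise, so Enum.sort/1 gives ASCII order.
         true <- (keys == Enum.sort(keys) and keys == Enum.uniq(keys)),
         true <- Enum.all?(keys, &known_key?/1),
         {:ok, oid} <- fetch_oid(pairs),
         {:ok, size} <- fetch_size(pairs) do
      {:ok, %{oid: oid, size: size, raw: data}}
    else
      _ -> {:error, :not_lfs_pointer}
    end
  end

  # Oversized or non-binary input is rejected outright.
  def parse(_), do: {:error, :not_lfs_pointer}

  # Each line must be "<key> <value>" with both parts non-empty.
  defp parse_pairs(lines) do
    pairs =
      Enum.map(lines, fn line ->
        case String.split(line, " ", parts: 2) do
          [key, value] when key != "" and value != "" -> {key, value}
          _ -> :invalid
        end
      end)

    if :invalid in pairs, do: :error, else: {:ok, pairs}
  end

  defp known_key?(key) do
    key in ["oid", "size"] or Regex.match?(~r/\Aext-\d+-.+\z/, key)
  end

  # A missing or malformed oid falls through to the `else` clause in parse/1.
  defp fetch_oid(pairs) do
    with {"oid", oid} <- List.keyfind(pairs, "oid", 0),
         true <- Regex.match?(~r/\Asha256:[0-9a-f]{64}\z/, oid) do
      {:ok, oid}
    end
  end

  defp fetch_size(pairs) do
    with {"size", size} <- List.keyfind(pairs, "size", 0),
         true <- Regex.match?(~r/\A\d+\z/, size) do
      {:ok, String.to_integer(size)}
    end
  end
end
```

The with chain mirrors the bullet list one check per clause, so any single failing rule short-circuits to the same rejection.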

Types

pointer_info()

@type pointer_info() :: %{oid: String.t(), size: non_neg_integer(), raw: binary()}

Functions

parse(data)

@spec parse(binary()) :: {:ok, pointer_info()} | {:error, term()}

Parse data as a git-lfs pointer file.

Returns {:ok, %{oid, size, raw}} on a valid pointer, or {:error, reason} on any rejection.

The raw field is the original bytes unchanged — callers that want to re-emit the pointer (e.g. to pass through to a layer that knows how to fetch from the LFS server) can use it without having to reserialize.

Examples

iex> ptr = """
...> version https://git-lfs.github.com/spec/v1
...> oid sha256:4d7a214614ab2935c943f9e0ff69d22eadbb8f32b1258daaa5e2ca24d17e2393
...> size 12345
...> """
iex> {:ok, info} = Exgit.LFS.parse(ptr)
iex> info.size
12345
iex> info.oid
"sha256:4d7a214614ab2935c943f9e0ff69d22eadbb8f32b1258daaa5e2ca24d17e2393"

iex> Exgit.LFS.parse("not an lfs pointer")
{:error, :not_lfs_pointer}

pointer?(data)

@spec pointer?(binary()) :: boolean()

Predicate form of parse/1. Returns true iff data is a valid LFS pointer.
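
One way a caller might use the predicate is to separate pointer blobs from regular blobs before deciding what to fetch. The sketch below is hypothetical: PointerFilterSketch and split_pointers/2 are not part of Exgit, and the predicate is passed in as a function so the example does not depend on the library being loaded.

```elixir
defmodule PointerFilterSketch do
  # Hypothetical helper for illustration; not part of Exgit.

  # Splits {path, blob} pairs into {pointers, regular} using any
  # one-argument predicate with the same shape as pointer?/1.
  def split_pointers(entries, pointer_fun) when is_function(pointer_fun, 1) do
    Enum.split_with(entries, fn {_path, blob} -> pointer_fun.(blob) end)
  end
end
```

With the library available, the predicate argument would be &Exgit.LFS.pointer?/1.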