API Reference image_ocr v0.2.0


Modules

Idiomatic Elixir interface to the Tesseract OCR engine.

Normalises supported OCR inputs into a Vix.Vips.Image.t() and an associated raw pixel buffer suitable for handing to the Tesseract NIF.

Translation between user-facing language identifiers and Tesseract's trained-data filename codes.

A NimblePool-backed pool of Image.OCR instances for concurrent OCR.

Helpers for resolving and managing Tesseract trained-data (tessdata) files.
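The language-translation and trained-data helpers above fit together as a single lookup. A user-facing identifier becomes a Tesseract code, and that code names a <code>.traineddata file inside a resolved directory. The following plain-Elixir sketch shows that flow. It is not the image_ocr API. The module TessdataSketch, its functions, the :tessdata_dir option, the fallback directory and the small code table are all hypothetical. Only the TESSDATA_PREFIX environment variable belongs to Tesseract itself.

```elixir
defmodule TessdataSketch do
  # Hypothetical module: illustrates the lookup flow, not the image_ocr API.

  # Illustrative subset mapping user-facing identifiers to the codes
  # Tesseract uses in trained-data filenames (eng.traineddata, etc.).
  @codes %{
    "en" => "eng",
    "de" => "deu",
    "fr" => "fra",
    "ja" => "jpn",
    "zh-Hans" => "chi_sim"
  }

  # Accepts atoms or strings; known Tesseract codes pass through unchanged.
  def to_tesseract_code(lang) when is_atom(lang), do: to_tesseract_code(Atom.to_string(lang))

  def to_tesseract_code(lang) when is_binary(lang) do
    case Map.fetch(@codes, lang) do
      {:ok, code} ->
        {:ok, code}

      :error ->
        if lang in Map.values(@codes) do
          {:ok, lang}
        else
          {:error, {:unknown_language, lang}}
        end
    end
  end

  # Hypothetical resolution order: explicit :tessdata_dir option, then the
  # TESSDATA_PREFIX environment variable Tesseract itself reads, then an
  # assumed per-user fallback directory.
  def tessdata_dir(opts \\ []) do
    Keyword.get(opts, :tessdata_dir) ||
      System.get_env("TESSDATA_PREFIX") ||
      Path.expand("~/.tessdata")
  end

  # Joins the resolved directory with "<code>.traineddata".
  def traineddata_path(lang, opts \\ []) do
    with {:ok, code} <- to_tesseract_code(lang) do
      {:ok, Path.join(tessdata_dir(opts), code <> ".traineddata")}
    end
  end
end
```

Because known Tesseract codes pass through unchanged, callers can use either form. A real table would need to cover every language for which trained data exists.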

Mix Tasks

Downloads <language>.traineddata from the upstream tessdata_* GitHub repository into the configured trained-data directory.

Lists every <language>.traineddata file in the resolved trained-data directory along with provenance from the manifest.

Deletes one or more <language>.traineddata files and their manifest entries.

Re-fetches every trained-data file recorded in the manifest, picking up the latest commit on each language's recorded branch.
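Downloading and refreshing both come down to fetching a file from an upstream repository on a recorded branch. The following minimal sketch shows how such a URL could be built. It assumes the tessdata, tessdata_fast and tessdata_best repositories under the tesseract-ocr GitHub organisation, and a manifest stored as a list of maps with language, repo and branch keys. The manifest shape and the ManifestSketch module are hypothetical. The image_ocr tasks may record different fields.

```elixir
defmodule ManifestSketch do
  # Hypothetical module and manifest shape; not the image_ocr implementation.

  # Assumed upstream owner and the three tessdata variant repositories.
  @owner "tesseract-ocr"
  @repos ~w(tessdata tessdata_fast tessdata_best)

  # One manifest entry -> raw-file URL on the entry's recorded branch.
  def download_url(%{language: lang, repo: repo, branch: branch}) when repo in @repos do
    "https://raw.githubusercontent.com/#{@owner}/#{repo}/#{branch}/#{lang}.traineddata"
  end

  # A refresh would fetch one URL per manifest entry.
  def refresh_plan(manifest) when is_list(manifest) do
    Enum.map(manifest, fn entry -> {entry.language, download_url(entry)} end)
  end
end
```

Because the URL names a branch, each refresh resolves to whatever commit the branch currently points at. Putting a commit SHA in the same position would pin a download instead.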